Using an Episodic Memory Module for Pattern Capture and Recognition
Dan Tecuci and Bruce Porter
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712, USA
{tecuci,porter}@cs.utexas.edu
Abstract
We propose an episodic memory-based approach
to the problem of pattern capture and recognition.
We show how a generic episodic memory module
can be enhanced with an incremental retrieval algorithm that can deal with the kind of data available for this application. We evaluate this approach
on a goal schema recognition task on a complex
and noisy dataset. The memory module was able to achieve the same level of performance as statistical approaches, and to do so in a scalable manner.
1 Introduction
A growing class of AI applications rely on capturing and recognizing complex patterns in their data: crime and terrorism
prevention, tax fraud detection, trend assessment, etc.
The two main approaches to building these applications
are memory-based (classify a new situation by using similarity with prior cases) and generalization-based (derive general
classification rules). The purpose of a memory is to store relevant prior experience so that it is available for future use, and to do so in a time- and space-efficient manner. This corresponds to the capture of patterns and their recognition in newly observed data.
However, classical memory-based approaches (like case-based reasoning) cannot be directly applied to this domain.
This is due to important differences in the type of patterns
captured (description of situations/entities in CBR vs. temporal sequences of complex events), the dynamic aspect of patterns (events have applicability conditions and consequences)
and the incremental availability of data (due to limited observation and to the fact that complex events unfold over time).
We propose to use an episodic memory for the problem of
pattern-based analysis. Such a memory has all the required
characteristics: it organizes temporally ordered events, these
events are dynamic (i.e. they change the state of the world)
and they are observed incrementally. Capture and recognition
of past events are the basic processes of an episodic memory.
Prior AI applications exhibit only one of these characteristics: [Kolodner, 1984] designed a memory for organizing
events by when and where they happened, but it does not
do episodic recognition; the Basic Agent ([Vere and Bickmore, 1990]) employed two episodic memories, one made
up of simple recognizers (e.g. repetition) and the other one
for guiding the planner’s backtracking mechanism; [Ram and
Santamaria, 1997] recorded raw data from prior actions to improve navigation; [Nuxoll and Laird, 2004] designed the most
advanced cognitive model of an episodic memory, addressing all the functional stages proposed by [Tulving, 1983]. Its main limitation is its flat episode organization, which results in retrieval time linear in the number of stored episodes.
We have designed a generic episodic memory module
that can be applied to a multitude of tasks in various domains. We do not propose complete solutions for problem solving in these domains; rather, the episodic memory will
have a supporting role in solving such problems. Encapsulating much of the complexity needed to build such a system in
its memory subsystem can pay off by simplifying other components.
We augment this memory module with an incremental retrieval algorithm, suitable for the task of pattern recognition,
and evaluate its performance on a plan recognition task on a
synthetic dataset in the logistics planning domain.
2 Related Work
There is a large body of related work on plan recognition ([Schmidt et al., 1978]) - the act of reasoning from a set of observed actions in order to infer the goal, a possible next action, or a complete plan (a sequence of steps for achieving a goal).
Both symbolic and probabilistic approaches to plan recognition have been proposed ([Kautz, 1997], [Pollack, 1990]).
Criticism focuses mainly on their inability to deal with error recovery and incremental recognition ([Woods, 1990]).
In this paper we show that our generic episodic memory
module can address some of these problems. An incremental retrieval algorithm that works with our memory module is
proposed. Error recovery is dealt with by keeping all plausible hypotheses available and only eliminating those that become inconsistent with the observed data. By storing only
relevant patterns we reduce the size of the search space that
needs to be examined at retrieval time.
3 A Generic Memory Module for Events
Most of the tasks an intelligent system accomplishes can be
represented as events, so, in order to store them, a memory
for events is needed. There have been numerous attempts at
building some form of memory into an intelligent system (the
field of case-based reasoning, Soar), but most suffered from being limited to a particular domain and task.
Our proposal is to develop a generic memory module that could be used in a variety of applications and for a variety of tasks. Such a memory is intended to be used in connection with an application module that would supply the necessary domain-dependent interface functions.
3.1 General Memory Requirements
Remembering the past is necessary but not sufficient: it needs
to be done in a timely manner. Intelligent systems need to efficiently organize and search their memory. They also need
to retrieve items relevant to the situation at hand. This requires a flexible match between previous experience and the new situation. Such a memory needs to be content-addressable, such that an external stimulus can directly index
relevant knowledge in memory. These internal memory requirements have to be combined with the external ones, of
which scalability and robustness are the most important.
Such a generic episodic memory module should provide:
• a representation of generic episodes that supports reuse.
• an organization of such episodes that can accommodate
a large number of them and is able to retrieve them efficiently.
• generic mechanisms for deciding what to store from an
episode and what features should be used in retrieval;
these mechanisms should enforce (partial) reuse of past
memories.
• scalable and generic retrieval mechanisms.
• generic update mechanisms for when new experience
becomes available.
3.2 The Design of the Generic Memory Module
Episode Representation
A generic episodic memory needs to have a representation for
a generic episode. We define an episode as a sequence of actions with a common goal. A semantic memory (i.e. a knowledge base) will provide knowledge about domain-specific actions, their applicability conditions and the effects and goals
they achieve.
We have identified three major classes of systems that can
benefit from using a memory module:
memory-based planning - devise a plan (a sequence of actions) that accomplishes a given goal. Using a memory
of past plans (along with their results), an agent can save work by adapting prior plans to achieve similar
goals and can avoid plan failures either by recognizing
them early on or by recalling a plan repair that worked
previously.
memory-based diagnosis - diagnose a malfunction based
on its symptoms. Memory can help organize diagnoses
by their symptoms, contexts in which they manifest,
necessary treatments, etc.
memory-based recognition - recognize a sequence of
events achieving some goal (usually undesired). Memory organizes plans by their actions, goals they achieve,
and failures they encounter.
Based on the characteristics of these three classes of applications, we propose that a generic episode have three parts:
context - the general setting in which some episode happened; for some applications this might be the goal of
the episode (e.g. planning), representing the desired
state of the world after the episode is executed.
contents - what the episode consists of: the ordered set of
events that make up the episode; in the case of a planner,
this would be the plan itself.
outcome - some evaluation of the impact that the execution of the episode had (e.g. whether a plan was successful or not).
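To make this three-part structure concrete, the following is a minimal Python sketch; the class and field names are our own illustration, not an interface the module actually exposes, and the example episode is Episode7 from the walkthrough in Section 5.3, with an assumed outcome.

    from dataclasses import dataclass, field
    from typing import Any, List

    @dataclass
    class Episode:
        """A generic episode: context, contents, outcome."""
        context: Any = None                                # general setting, e.g. the goal of a plan
        contents: List[str] = field(default_factory=list)  # temporally ordered events
        outcome: Any = None                                # evaluation of the episode's effect

    # Example modeled on Episode7 (Section 5.3); the outcome is assumed.
    episode7 = Episode(
        context="compress-dirs-by-loc-dir",
        contents=["Find-By-Name", "Ls", "Tar-Zip-Create", "Tar-Zip-Create", "Ls"],
        outcome="success",
    )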
Indexing
Episodes are stored in memory unchanged (i.e. not generalized) and are indexed for fast retrieval. We have adopted
a two-layer indexing scheme similar to MAC/FAC ([Forbus
et al., 1995]): a shallow indexing, in which each episode is indexed by all of its features taken in isolation, and a deep indexing, in which episodes are linked together by how they differ structurally from one another.
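As a rough Python sketch, the shallow layer can be thought of as an inverted index from isolated features to the episodes exhibiting them; the deep, structural layer is omitted here and all names are illustrative.

    from collections import defaultdict

    class ShallowIndex:
        """Inverted index: each isolated feature points to the episodes containing it."""
        def __init__(self):
            self.by_feature = defaultdict(set)

        def add(self, episode_id, features):
            for f in features:
                self.by_feature[f].add(episode_id)

        def candidates(self, stimulus_features):
            """All episodes sharing at least one feature with the stimulus."""
            found = set()
            for f in stimulus_features:
                found |= self.by_feature.get(f, set())
            return found

    idx = ShallowIndex()
    idx.add("Episode7", ["Find-By-Name", "Ls", "Tar-Zip-Create"])
    idx.add("Episode16", ["Find-By-Name"])
    print(idx.candidates(["Find-By-Name"]))  # {'Episode7', 'Episode16'}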
During retrieval, shallow indexing will select potentially
relevant episodes, from which a hill-climbing algorithm that uses semantic matching will find the episode(s) that best
match the external stimulus. A robust memory will need to
employ a flexible matching algorithm, such that old situations
are still recognized under new trappings. We use a semantic matcher ([Yeh et al., 2003]) to compute the similarity between a new situation and a prior episode.
This semantic matcher combines subgraph isomorphism
with transformation rules in order to resolve mismatches between two representations. It was successfully applied to assessing battle plans' strengths and weaknesses ([Yeh et al., 2003]), question answering ([Barker et al., 2004]), as well as natural language tasks like building models of user utterances ([Yeh et al., 2005]) and word-sense disambiguation and semantic role labeling ([Yeh et al., 2006]).
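The matcher itself is beyond the scope of a short example; as a deliberately crude stand-in that conveys only the idea of scoring partial matches (and none of the structural alignment or transformation rules of [Yeh et al., 2003]), one could score feature overlap:

    def overlap_similarity(a_features, b_features):
        """Jaccard overlap of feature sets: a crude stand-in for semantic matching."""
        a, b = set(a_features), set(b_features)
        if not (a | b):
            return 1.0
        return len(a & b) / len(a | b)

    print(overlap_similarity(["Find-By-Name", "Ls"], ["Find-By-Name", "Move"]))  # ~0.33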
Memory API
The memory module provides two basic functions: store and
retrieve. Store takes a new Episode represented as a triple [context, contents, outcome] and stores the Episode in memory, indexing it along one or more dimensions; retrieve takes
a partially specified Episode and one or more dimensions and
retrieves the most similar prior episodes along those dimensions. Other information, such as how these episodes differ
from the stimulus, is returned as well. This is intended to help the application that uses the episodic memory module make better use of the returned episodes.
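A sketch of how this two-function API might look from the application's side follows; the signatures, the restriction to the contents dimension, and the overlap-based similarity are our simplifications, not the module's actual interface.

    from dataclasses import dataclass, field
    from typing import Any, List, Tuple

    @dataclass
    class Episode:              # the [context, contents, outcome] triple
        context: Any = None
        contents: List[str] = field(default_factory=list)
        outcome: Any = None

    def _overlap(a: List[str], b: List[str]) -> float:
        sa, sb = set(a), set(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    class EpisodicMemory:
        """Illustrative store/retrieve API; the real module also indexes episodes."""
        def __init__(self) -> None:
            self.episodes: List[Episode] = []

        def store(self, episode: Episode) -> None:
            """Store a new episode in memory, unchanged."""
            self.episodes.append(episode)

        def retrieve(self, stimulus: Episode, k: int = 3) -> List[Tuple[Episode, set]]:
            """Return the k most similar prior episodes along the contents
            dimension, each paired with how it differs from the stimulus
            (here simply its unmatched events)."""
            ranked = sorted(self.episodes,
                            key=lambda e: _overlap(e.contents, stimulus.contents),
                            reverse=True)
            return [(e, set(e.contents) - set(stimulus.contents)) for e in ranked[:k]]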
4 Incremental Retrieval
The general retrieval algorithm works well for memory-based
planning and sentence interpretation applications, in which
memory organizes the patterns of entities and relations, but it
will need to be modified in order to work with sequences of
events that are observed incrementally.
At first glance, the fact that data is presented incrementally
seems to increase retrieval time due to the need to query memory with the presentation of each new stimulus. However, incremental data reduces the size of each query. Humans are
good at dealing with continuous streams of stimuli and employing expectations to focus attention and guide recognition.
The question we address here is: can we devise such an algorithm for an episodic memory?
This idea has been put forth before ([Schmidt et al., 1978],
[Schank, 1982]) and has been applied in areas like dialogue
processing ([Grosz and Sidner, 1986], [Litman and Allen,
1987]) and plan recognition ([Schmidt et al., 1978]). The
sequential structure of events helps constrain the type of expectations a system might form to just the next event(s) (its
type and possibly its description).
To be able to take advantage of this, a memory should have
the ability to ([Schmidt et al., 1978]):
• form hypotheses based on a set of initial observations and background knowledge;
• build expectations about next actions based on the current hypotheses;
• recognize whether expectations were met when new observations become available;
• refine and revise the set of hypotheses when expectations are not met; this includes dropping hypotheses that do not conform to the observed stimuli and building new ones that do.
We have implemented a simple incremental retrieval algorithm:

    initialize hypotheses
    while there are stimuli do
        make a prediction about the next event
        observe the next event
        compare the expectation with the observation
        refine/revise the hypotheses
After a new stimulus is observed, memory will refine or
revise its prior hypotheses. These hypotheses consist of prior
episodes that agree with the data observed so far.
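A runnable Python rendering of this loop is sketched below, under two assumptions of ours: remindings are supplied by a lookup like the shallow index of Section 3.2, and the number of remindings explored per event is capped (the experiment in Section 5.3 uses a cap of 5). The refine step here keeps any episode sharing at least one event with the observations, a coarse approximation of the parameter-level matching described in Section 5.3.

    MAX_REMINDINGS = 5  # cap on explored remindings, as in Section 5.3

    def incremental_retrieve(stimuli, remindings_for):
        """Incremental retrieval sketch.

        stimuli        -- iterable of observed events, one at a time
        remindings_for -- callable: event -> list of (episode_id, event_sequence)
                          pairs, e.g. backed by a shallow index
        """
        hypotheses = {}   # episode_id -> event sequence, consistent so far
        observed = []
        for event in stimuli:
            observed.append(event)
            # Revise: admit newly reminded episodes for the observed event.
            for eid, events in remindings_for(event)[:MAX_REMINDINGS]:
                hypotheses.setdefault(eid, events)
            # Refine: drop hypotheses that no longer agree with the data,
            # tolerating superfluous observed actions.
            hypotheses = {eid: ev for eid, ev in hypotheses.items()
                          if any(e in ev for e in observed)}
        return hypotheses   # the surviving plausible hypotheses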
5 Pattern Capture and Recognition using an
Episodic Memory
The goal of this project is to augment a generic memory module for events with a retrieval algorithm that can deal with incrementally available data. We want the system to be able to make predictions incrementally, in a fast and accurate manner, comparable in performance to the state of the art in statistical recognition. Due to the generic nature of the memory
module and of the retrieval algorithm, this approach should
be easily portable to new domains.
5.1 The Linux Corpus
We chose to evaluate our approach on a goal schema recognition task in the Linux Plan Corpus [Blaylock and Allen, 2004]. In goal schema recognition the purpose is to predict the type of goal an agent has, but not its exact parameters.

    Plan sessions        457
    Goal schemas          19
    Action schemas        48
    Avg. Actions/Plan    6.1

Figure 1: The Linux plan corpus statistics
This corpus is similar to the Unix corpus [Lesh, 1998], but
is an order of magnitude larger in size. It was gathered from
human Linux users from the University of Rochester, Department of Computer Science. Users were given a goal like
find a file with ‘exe’ extension and were instructed to achieve it using simple Linux commands (no pipes, no awk, etc.). All user commands, along with their results, were recorded. For each goal, users were also asked to
assess whether they accomplished it. The users judged 457 sessions to be successful (because some users were unable to judge this correctly, a number of failed sessions remain and the data is therefore noisy), involving 19 goal schemas and 48 action schemas (i.e. Linux commands).
5.2 Experiment
We adapted the generic episodic memory module to work
with the incremental retrieval algorithm and applied it to the
goal schema recognition task on the Linux plan corpus. A 10-fold cross-validation was performed on the 457 plans, measuring the accuracy of the memory recognition algorithm and
the number of events matched per recognition problem.
Plan sessions are presented to the system incrementally.
After each new plan action is seen the system refines or revises its current set of hypotheses.
Due to the noise in the dataset, the recognition algorithm
has to deal with superfluous actions (i.e. actions that do not
influence the outcome of the plan). It does this by allowing
mismatches between a new plan and prior episodes.
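One simple way to allow such mismatches, shown below as a sketch of our own rather than the module's actual matcher, is to score a prior episode by the longest common subsequence between its events and the observed plan, so that superfluous actions weaken a match without voiding it.

    def lcs_len(a, b):
        """Longest common subsequence length; skipped items model superfluous actions."""
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m):
            for j in range(n):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        return dp[m][n]

    observed = ["Find-By-Name", "Ls", "Find-By-Name", "Move"]
    prior    = ["Find-By-Name", "Find-By-Name", "Move", "Move", "Move"]
    print(lcs_len(observed, prior))  # 3: the Ls is treated as superfluous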
5.3 Example
Suppose a new plan is being observed one event at a time. Each Linux command is preceded by sh> and the result of its execution appears below (the results are displayed here only for readability; the system does not see them). The system's response is given
after each such command. Prior episodes are described in
terms of their sequence of events.
After each event is observed, memory will try to recall
prior episodes that had such an event and will compare the
current observation with those recalled. As a result, the set
of plausible hypotheses is revised (e.g. by adding newly reminded episodes that match the observation). To maintain scalability, we limit the number of remindings that the system
explores (in this case to 5).
sh> find -name linux
./code/linux
Given the observed event, memory is reminded of 3
episodes:
Figure 2: Experimental results for the Linux planning corpus: (a) accuracy of goal schema recognition using the top 1, 2, 3 and 4 predictions; (b) number of explored events per recognition; (c) number of stored episodes.
• Episode16 and Episode14, both consisting of a single event Find-By-Name and having the goal find-file-by-attr-name-exact(b.c);
• Episode7, with events Find-By-Name, Ls, Tar-Zip-Create, Tar-Zip-Create, Ls and goal compress-dirs-by-loc-dir.
The observed event is matched against all reminded episodes. Episode14 and Episode16 are removed due to mismatches (their event type was the same, hence the reminding, but due to incorrect command syntax their parameters were missing; in producing a faithful translation of the original dataset, commands with incorrect syntax became events with no parameters). Episode7 is kept as the only plausible hypothesis.
The next event is observed:
sh> find -name *.pl
./code/accessor.pl
./code/constructor.pl
./code/gang/dwarf/aml.pl
Reminded Episodes: Episode10, whose goal schema is move-files-by-attr-name-ext and which has the following events: Find-By-Name, Find-By-Name, Move, Move, Move, Find-By-Name. Since Episode10 has not been compared to previously observed events, this comparison is done now. Although no event in Episode7 matches
the observed one, it is kept in the set of plausible hypotheses.
The last event is observed:
sh> mv ./bin/gang/set/convert.pl
./code/accessor.pl
./code/constructor.pl
./code/gang/dwarf/aml.pl
./code/linux
Reminded Episodes: Episode10, already among the plausible ones, is matched against the observed event.
There are no more observations and the most similar prior
plan is that of Episode10. The system will conclude that its
goal schema move-files-by-attr-name-ext is the
most plausible one for the observed plan, which is correct.
Even this simple example shows some of the variability found in this data: files were moved to the desired locations in both plans, but one used a separate Move command for each file, while the other moved them all together by listing their names. Another way to achieve the same goal is to use a regular expression as an argument for Move.
5.4 Experimental Results
We measured the accuracy of the top N predictions on a
goal schema recognition task (Figure 2(a)). After memory is
trained on 400 plans, accuracy is around 37%. This is not significantly different from the results of previous studies on the same dataset and task ([Blaylock and Allen, 2004]), but still rather low. This is due to several factors: the data is noisy (e.g. wrong commands are used, such as fgrep instead of find, or a user reports success although the given goal was not achieved) and some goal schemas are similar enough that the system confuses them (e.g. find file by extension and find file by name).
Given our goal to build a generic memory module, we are
also interested in the performance of the memory alone, in addition to its performance on the goal recognition task. We measured the number
of events per plan session that memory tried to match during recognition. This should be an indicator of retrieval cost.
We also measured memory growth with the number of observed episodes. This is directly related to the storage policy.
For this experiment we have implemented a simple storage
policy: keep only those plans that could not be correctly recognized at the moment they were observed.
Memory grows linearly in the number of observed plans
(Figure 2(c)). This is due to accuracy being around 40%,
which prompts memory to keep on learning (i.e. storing
plans). However, the number of explored episodes (Figure 2(b)) grows at a different rate: quickly at first, leveling off by the time 200 plans have been observed, and remaining constant after that. In the interval from 200 to 400 observed plans, memory size almost doubles, but the number of explored episodes remains the same.
This is empirical evidence that memory retrieval is scalable.
After 400 plans have been observed, around 41 actions are
examined per recognition. Given that there are an average of
6.1 actions per plan, this means that an average of 6.72 plans
are searched for each recognition.
5.5 Limitations of the Current Approach
Our current shallow indexing scheme treats all features
equally, without assigning any weight to them. As we limit
the number of explored remindings, it is important that remindings generated by more distinctive features be explored
before those generated by less distinctive ones.
One issue that we did not address here is the sensitivity
of memory retrieval performance to noise. Even though the
Linux plan corpus is noisy, we do not have a good measure of
how much noise there is (e.g. how many sessions are misclassified as successes at accomplishing their goal; how many
unnecessary actions were taken by users given a goal). We
would also like to study how memory performance degrades
when noise is introduced.
Another important problem not addressed here is the 'keyhole recognition' problem ([Cohen et al., 1982]), in which the entity executing the plan actively tries to hide its intentions, making the recognition task harder.
6 Future Work
Being able to make accurate predictions early on, during the unfolding of the plan, is very important. These predictions
constitute early warnings of potential outcomes that the user
might act upon. A measure of how good the recognizer is at
predicting/recognizing the plan when only a few actions have
been observed is the convergence point - after how many observed actions the system starts making the correct prediction.
We plan to measure this in a subsequent study.
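For concreteness, the convergence point could be computed from a recognizer's per-step predictions as in the sketch below (our illustration).

    def convergence_point(step_predictions, true_goal):
        """1-based index after which every prediction is correct,
        or None if the recognizer never converges on the true goal."""
        for i in range(len(step_predictions)):
            if all(p == true_goal for p in step_predictions[i:]):
                return i + 1
        return None

    preds = ["find-file", "move-files", "move-files", "move-files"]
    print(convergence_point(preds, "move-files"))  # converges after 2 observed actions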
Goal recognition is just one part of the picture. To be useful, it needs to be paired with parameter recognition. We plan
to address this in the near future.
Our current measure of similarity of two plans is based on
the similarity of their individual events. A more powerful
and accurate measure needs to take into account whether/how
individual actions change the state of the world.
Complex plans happen over longer periods of time and
consist of many low-level events. They are unlikely to be
recognized just by looking at these individual events. A good
recognizer needs to be able to recognize subgoals and use
them in the subsequent recognition process.
The current approach builds implicit expectations in the
form of candidate episodes that match the events observed
so far. A way to improve retrieval speed is for memory to actively look for how these candidate episodes differ from each
other and test for the presence/absence of those differences in
the new stimuli.
Expectations built by a generic episodic memory could be
used to focus processing on confirming/disproving a smaller
set of candidate hypotheses, by giving the application the
choice of specifying the ordering function. For example, in a
domain like crime prevention, one might want to test first the
hypotheses that have the worst outcome, so that preventive
measures could be taken as quickly as possible.
7 Conclusions
In this paper we have proposed an approach to pattern capture and recognition based on a generic memory module for
events. We have shown that such a memory module can
be augmented with a simple incremental retrieval algorithm
in order to handle incrementally available data and to make
predictions at each step. We evaluated this approach on a
goal schema recognition task on a complex and noisy dataset
and showed that it achieved the same level of performance as state-of-the-art statistical approaches. Memory organization and
retrieval proved scalable. Due to the generic nature of the
memory module and of the retrieval algorithm, this approach
should be easily portable to new domains.
References
[Barker et al., 2004] K. Barker, S. Chaw, J. Fan, B. Porter,
D. Tecuci, Peter Z. Yeh, V. Chaudhri, D. Israel, S. Mishra,
P. Romero, and P. Clark. A question-answering system for
AP Chemistry: Assessing KR&R technologies. In Proceedings of the Ninth International Conference on the Principles of Knowledge Representation and Reasoning (KR
2004), pages 488–497, 2004.
[Blaylock and Allen, 2004] N. Blaylock and J. Allen. Statistical goal parameter recognition. In The 14th International Conference on Automated Planning and Scheduling
(ICAPS’04), 2004.
[Cohen et al., 1982] Philip R. Cohen, C. Raymond Perrault,
and James F. Allen. Beyond question answering. In Strategies for Natural Language Processing, pages 245–274.
Lawrence Erlbaum Associates, 1982.
[Forbus et al., 1995] Kenneth Forbus, Dedre Gentner, and
Kenneth Law. MAC/FAC: A model of similarity-based retrieval. Cognitive Science, 19(2):141–205, 1995.
[Grosz and Sidner, 1986] Barbara J. Grosz and Candace L.
Sidner. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175–204, 1986.
[Kautz, 1997] Henry A. Kautz. A theory of plan recognition
and its implementation. In J. F. Allen, H.A. Kautz,
and R.N. Pelavin, editors, Reasoning about Plans,
chapter 2. Morgan Kaufmann, San Mateo, CA, 1997.
http://www.cs.washington.edu/homes/kautz/papers/prbook.ps.
[Kolodner, 1984] Janet L. Kolodner. Retrieval and Organizational Strategies in Conceptual Memory: A Computer
Model. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1984.
[Könik and Laird, 2006] Tolga Könik and John E. Laird.
Learning goal hierarchies from structured observations
and expert annotations. Machine Learning, (in print),
2006.
[Lesh, 1998] N. Lesh. Scalable and Adaptive Goal Recognition. PhD thesis, University of Washington, 1998.
[Litman and Allen, 1987] Diane Litman and James Allen. A
plan recognition model for subdialogues in conversation.
Cognitive Science, 11:163–200, 1987.
[Nuxoll and Laird, 2004] Andrew Nuxoll and John E. Laird.
A cognitive model of episodic memory integrated with a
general cognitive architecture. In Proceedings of the Sixth
International Conference on Cognitive Modeling, pages
220–225, Mahwah, NJ, 2004. Lawrence Erlbaum.
[Pollack, 1990] Martha Pollack. Plans as complex mental
attitudes. In Philip R. Cohen, Jerry Morgan, and Martha
Pollack, editors, Intentions in Communication, pages 77–
103. MIT Press, Cambridge, Massachusetts, 1990.
[Porter et al., 1990] Bruce W. Porter, Ray Bareiss, and
Robert C. Holte. Concept learning and heuristic classification in weak-theory domains. Artificial Intelligence,
45:229–263, 1990.
[Ram and Santamaria, 1997] Ashwin Ram and J. C. Santamaria. Continuous case-based reasoning. Artificial Intelligence, 90(1-2):25–77, 1997.
[Schank, 1982] Roger C. Schank. Dynamic Memory: A Theory of Reminding and Learning in Computers and People.
Cambridge University Press, 1982.
[Schmidt et al., 1978] Charles F. Schmidt, N. S. Sridharan,
and J. L. Goodson. The plan recognition problem: An
intersection of psychology and artificial intelligence. Artificial Intelligence, 11:45–83, 1978.
[Tulving, 1983] Endel Tulving. Elements of Episodic Memory. Clarendon Press, Oxford, 1983.
[Vere and Bickmore, 1990] Steven Vere and Timothy Bickmore. A basic agent. Computational Intelligence, 4:41–
60, 1990.
[Woods, 1990] W. A. Woods. On plans and plan recognition: Comments on pollack and on kautz. In P. R. Cohen,
J. Morgan, and M. E. Pollack, editors, Intentions in Communication, pages 135–140. MIT Press, Cambridge, MA,
1990.
[Yeh et al., 2003] Peter Z. Yeh, Bruce Porter, and Ken
Barker. Using transformations to improve semantic matching. In K-CAP 2003: Proceedings of the Second International Conference on Knowledge Capture, pages 180–189,
2003.
[Yeh et al., 2005] Peter Z. Yeh, Bruce Porter, and Ken
Barker. Matching utterances to rich knowledge structures
to acquire a model of the speaker's goal. In K-CAP 2005: Proceedings of the Third International Conference on Knowledge Capture, pages 129–136, 2005.
[Yeh et al., 2006] Peter Z. Yeh, B. Porter, and K. Barker.
A unified knowledge based approach for sense disambiguation and semantic role labeling. In Proceedings of
the Twenty-First National Conference on Artificial Intelligence (AAAI 2006), 2006.