PatternDiviner: A Pattern Recognition Tool

Gregory S. Hill and Goran Trajkovski
Cognitive Agency and Robotics Laboratory
Towson University, 8000 York Road, Towson MD 21252-0001
E-mail: {ghill1, gtrajkovski}@towson.edu
Phone 410-704-6310, Fax 410-704-3868
Overview of Original Software
Activity is central to thought and cognition. Through interaction,
autonomous agents build working representations of the environment
they inhabit (Trajkovski 2007).

TamTam is a software demo built on interactionist principles
(Bickhard 1980) and unsupervised learning (Buisson 2006). The applet
was developed to recognize and anticipate rhythmic patterns entered
via a computer keyboard. TamTam always starts with a basic set of
rhythms (e.g., four full notes), and then uses a sophisticated
algorithm to generate increasingly complicated child patterns, based
on the previously recognized patterns input by the user. Figure 1
gives an example of the TamTam output for a pattern. Patterns that
are successfully recognized are the ‘winners’ and are used as the
parents for the next generation of more sophisticated patterns, in an
approach reminiscent of genetic algorithms.

Figure 1: TamTam Example

We viewed TamTam not only as a demonstrative simulation, but also as
a possible learning paradigm in pattern recognition and anticipation.
We studied the efficacy of converting this approach into a
generalized pattern recognition tool for pattern mining in large
datasets, including bioinformatics data such as gene sequences.
Methods
One of the issues when working with the code was that it was written
as a Java applet (see [Buisson 2003] for details), with all of the
classes residing in one file and a consequent reliance on global
variables. Based on the TamTam code, we developed PatternDiviner, a
software suite for pattern mining in gene sequences. The core of the
product is the pattern recognition engine (PRE), which relies on a
number of other independent classes and interfaces, including
Note.java, PatternBuilder.java, PatternDisplay.java, Sequence.java,
SequenceCanvas.java, Stroke.java, TamTamPanel.java, and
TouchPanel.java. PatternDisplay is an interface implemented by
TamTamPanel; it declares the method updatePatternDisplay(Sequence),
which is used to explicitly display the results of the PRE. In the
original TamTam applet, TamTamPanel displays a graphical sequence of
notes as output, based on the pattern fed to it by the user. The core
pattern recognition code was placed in PatternBuilder, which is the
PRE of this system. See Figure 2 for the UML diagram of the essential
classes.

Figure 2: UML Diagram

A separate class, DNAPatterns.java, implements PatternDisplay; it
loads the sample dataset and feeds it to the Sequence class. We used
the following conversion between the note representation in TamTam
and the base nucleotides: adenine (abbreviated A, equivalent to a
full note), cytosine (C, half note), guanine (G, 1/4 note) and
thymine (T, 1/8 note). We then iterated through the gene sequence in
window sizes of 4 through 22 nucleotides, as is customary in
bioinformatics data mining when studying nucleotide sequence
interaction.

For example, given a partial gene sequence of
‘AGGGTGCGCA AATTGGCGCA …’, the first round of sequences would be
‘AGGG’, ‘GGGT’, ‘GGTG’, ‘GTGC’, etc. Any patterns recognized by the
PRE are stored internally by the engine.
Work in Progress
The results of running the Salmonella gene sequence (NCBI GenBank
2006) through the engine were negative. The key sequence lengths we
were interested in were between 18 and 22 nucleotides. The PRE was
able to recognize patterns of up to five nucleotides in length, but
nothing longer.

There are several possible reasons for the negative result. The
first, and most obvious, is that there may simply be no patterns to
be recognized. A further issue might be the particular algorithm for
generating the child patterns from the winning parent patterns. A
winning pattern of, for example, ‘atat’ might generate a child
pattern of ‘atgat’, which may very well not occur as a larger pattern
within the sequence. Finally, larger gene sequence datasets should be
tested with the tool, as should different types of data
(meteorological data, traffic-flow patterns in various urban areas,
etc.).

This is a work in progress. More research into alternative
pattern generation schemes might be worthwhile. Also, it
would be interesting to further develop the PRE into a
more abstract, and extensible, class. With a little work, a
base class could be developed that managed some of the
basic pattern recognition, and then let any sub-classes
implement the specific pattern recognition algorithms
needed by the developer. Using a basic Factory pattern,
TamTam could be used in a variety of environments,
easily modifiable and testable.
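The extensible design suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the PatternDiviner or TamTam source: the names PatternEngine, RepeatCountEngine, PatternEngineFactory and PatternEngineDemo are hypothetical, and the "recognition" rule used here (a window counts as a pattern if it occurs more than once) is a simple stand-in for the PatternBuilder algorithm, which is not reproduced in this paper. The base class handles the shared windowing over sizes 4 through 22 described in Methods, and subclasses supply the specific recognition rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class (not part of TamTam): provides the shared
// windowing step and leaves the recognition rule to subclasses.
abstract class PatternEngine {
    // Window sizes taken from the text: 4 through 22 symbols.
    static final int MIN_SIZE = 4;
    static final int MAX_SIZE = 22;

    // Returns every overlapping window of the given size, left to right.
    static List<String> windows(String sequence, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= sequence.length(); i++) {
            out.add(sequence.substring(i, i + size));
        }
        return out;
    }

    // Subclasses decide which windows count as recognized patterns.
    abstract Map<String, Integer> recognize(String sequence);
}

// Hypothetical stand-in rule: a window is "recognized" if it occurs at
// least twice. This is NOT the PatternBuilder algorithm.
class RepeatCountEngine extends PatternEngine {
    @Override
    Map<String, Integer> recognize(String sequence) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int size = MIN_SIZE; size <= MAX_SIZE; size++) {
            for (String w : windows(sequence, size)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        // Keep only windows seen more than once.
        counts.values().removeIf(c -> c < 2);
        return counts;
    }
}

// Simple factory that selects an engine by name.
class PatternEngineFactory {
    static PatternEngine create(String kind) {
        if ("repeat".equals(kind)) {
            return new RepeatCountEngine();
        }
        throw new IllegalArgumentException("Unknown engine: " + kind);
    }
}

class PatternEngineDemo {
    public static void main(String[] args) {
        // The sample sequence from Methods, with the space removed.
        String dna = "AGGGTGCGCAAATTGGCGCA";
        System.out.println(PatternEngine.windows(dna, 4).subList(0, 4));
        System.out.println(PatternEngineFactory.create("repeat").recognize(dna));
    }
}
```

Run on the sample sequence from Methods, the first line printed is [AGGG, GGGT, GGTG, GTGC], matching the first round of sequences given there, and the second is {GCGC=2, CGCA=2, GCGCA=2}. Under this sketch, a different recognition scheme would be added as another subclass and registered in the factory, leaving the windowing code untouched.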
Acknowledgements
This work was assisted generously by Dr Jean-Christophe Buisson, and
was partially funded by the Faculty Research and Development
Committee of Towson University.
References
1) Bickhard, M. “Interactivist Manifesto”. Retrieved online on
June 10, 2006 at
http://www.lehigh.edu/~mhb0/InteractivismManifesto.pdf.
2) TamTam. Retrieved online on June 5, 2006 at
http://diabeto.enseeiht.fr/tamtam/, Dr. Jean-Christophe Buisson of
L’Ecole Nationale Supérieure d'Electrotechnique, d'Electronique,
d'Informatique, d'Hydraulique et des Télécommunications
(http://enseeiht.fr/).
3) Jean-Christophe Buisson: “A rhythm recognition computer program to
advocate interactivist perception”, Cognitive Science, Volume 28,
Issue 1, January-February 2004, Pages 75-87.
4) NCBI GenBank:
http://www.ncbi.nlm.nih.gov/Genbank/GenBankFtp.html
5) Trajkovski, G., “An Imitation-Based Approach to Modeling
Homogenous Agents Societies”, IDEA Publishing, 2007.
6) Collins, S, and Trajkovski, G (2006) “Attack of the Rainbow Bots:
Generating Diversity through Multi-Agent Systems”. In Trajkovski, G.
(ed) Diversity in Information Technology Education. Hershey, PA:
InfoSys Press, pp 196-241.
7) Trajkovski, G., Collins, S.: “Autochthony Through
Self-Organization: Interactivism and Emergence in a Virtual
Environment”, New Ideas, Elsevier, in press.
8) Stojanov, Bozinovski, S, Trajkovski, G, "Interactionist-Expectative
View on Agency and Learning", in: IMACS Journal of Mathematics and
Computers in Simulation, North-Holland, Amsterdam, vol 44 (1997)
295-310.