Recognition of On-Line Handwritten Mathematical Expressions
in the E-Chalk System - An Extension
Ernesto Tapia and Raúl Rojas
Freie Universität Berlin, Institut für Informatik
Takustr. 9, D-14195 Berlin, Germany
{tapia,rojas}@inf.fu-berlin.de
Abstract
This article introduces the new version of our system for the recognition of on-line handwritten mathematical expressions. The previous version was able to recognize the most common mathematical expressions, such as expressions of calculus, following the usual mathematical conventions. This version improves and extends our technique, allowing the recognition of structures described by tabular and multiline arrangements, such as matrices and systems of equations. The recognition results are stored in a composite of symbols, which is used by computer algebra systems for algebraic computations and function plotting. The system is embedded as an intelligent assistant in the Electronic Chalkboard (E-Chalk), a system for the live transmission and storage of lectures.
1. Introduction
Traditionally, there has been a lack of tools for the development of systems specialized in the recognition of on-line handwritten mathematics, due to the complexity of two-dimensional mathematical notation, which includes formulas, arrays of symbols, diagrams, etc. Programs running on personal digital assistants (PDAs) and Tablet PCs are able to recognize digital ink as text or as simple one-dimensional arithmetical expressions, e.g. using the Graffiti alphabet for the former, or libraries such as PenReader¹ for the latter. More recently, Microsoft has offered a special version of the Windows operating system for the Tablet PC, which includes libraries for processing, storing, and recognizing digital ink. However, that system is also specialized in the recognition of Latin letters, and cannot be easily extended to recognize other symbols.
In the meantime, a few prototypes of on-line formula recognizers have appeared. The Freehand Formula Entry System [11] is a pen-based equation editor distributed under the GNU General Public License. The Natural Log system [8] is a user-dependent system written in Java, available on-line as an applet. The recognition result of the former is a LaTeX representation of the mathematical expression, whereas the latter also offers a MathML representation. The Infty Editor [3] is a commercial system specialized in creating mathematical documents. It uses an interesting approach for the recognition of isolated symbols, based on the classification of extended and non-extended characters. MathJournal [14] is another commercial system, developed for the Tablet PC version of the Windows platform. Its “solution engines” process the recognized expressions in numeric, graphic, or symbolic formats. Its recognition of symbols is limited, because it uses the recognizer provided by Microsoft.
¹ Developed by Paragon Software, http://www.penreader.com
The recognition system described in this article is a tool embedded in the Electronic Chalkboard (E-Chalk), a system for the live transmission and storage of lectures [7]. E-Chalk uses a contact-sensitive screen to capture and store the lecturer’s handwriting, which is processed by intelligent agents working in the background (see Fig. 1). The aim of our system is to give lecturers the possibility to interpret and process handwritten mathematics.
The purpose of this article is to introduce the new version of our recognition system. It is organized as follows. Section 2 describes the components of the recognition system. Section 3 shows how the system is used as an intelligent tool in E-Chalk. We conclude this article with some comments in Sec. 4.
2. System description
We follow a two-step approach for the recognition of on-line handwritten mathematical expressions. The first step is the recognition of isolated handwritten symbols, and the second step is the layout analysis of the symbols in the expression. In the next sections we describe the methods we developed to carry out these main steps. A detailed description of our techniques can be found in [12].
Figure 1. The E-Chalk system. Four contact-sensitive screens are used with four rear projectors.
Figure 2. DAG for four classes.
Proceedings of the 2005 Eight International Conference on Document Analysis and Recognition (ICDAR’05)
1520-5263/05 $20.00 © 2005
IEEE
2.1. Symbol recognition
Symbols are divided into three groups: symbols consisting of one stroke, of two strokes, and of three or more strokes. Strokes are processed using known techniques to reduce irregularities, variability, and the amount of data: point clustering, dehooking, resampling and interpolation, grouping and reordering of strokes, and scaling. After the preprocessing step, features are extracted to form the input vector used for classification. These include local features, such as the coordinates of stroke points and the sine and cosine of turning angles, and global features related to topological properties of the stroke.
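To make the local features concrete, the following is a minimal Python sketch, not the implementation used in E-Chalk. It covers only three of the steps named above (resampling with linear interpolation, scaling, and the sine and cosine of turning angles); the function names, the choice of equal arc-length resampling, and the default of 16 points are assumptions made for illustration.

```python
import math


def resample(points, n):
    """Return n points equally spaced along the arc length of a stroke.

    Assumes a non-empty stroke and n >= 2.
    """
    if len(points) < 2:
        return [points[0]] * n
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while j < len(points) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0.0 else (target - cum[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def scale_to_unit_box(points):
    """Translate and scale the stroke so that it fits into the unit square."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) / size, (y - min(ys)) / size) for x, y in points]


def turning_angles(points):
    """Return (sine, cosine) of the turning angle at each interior point."""
    feats = []
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        ax, ay = p1[0] - p0[0], p1[1] - p0[1]
        bx, by = p2[0] - p1[0], p2[1] - p1[1]
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm == 0.0:
            feats.append((0.0, 1.0))  # degenerate segment: treat as no turn
        else:
            feats.append(((ax * by - ay * bx) / norm, (ax * bx + ay * by) / norm))
    return feats


def feature_vector(points, n=16):
    """Concatenate point coordinates with sine/cosine of turning angles."""
    pts = scale_to_unit_box(resample(points, n))
    feats = [c for p in pts for c in p]
    for s, c in turning_angles(pts):
        feats += [s, c]
    return feats
```

With n resampled points, this sketch yields 2n coordinates plus 2(n − 2) angle features; the global topological features mentioned in the text are not modelled here.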
We use Support Vector Machines (SVM) as the default classifier of on-line isolated symbols. In our experiments, it was the best classifier when compared against other popular classification techniques. The results of our experiments with the 1a subset (digits) of the UNIPEN database [4] are shown in Table 1. For support vector classification we used the SVM-Directed Acyclic Graph (SVM-DAG) technique [10], with a radial basis kernel on each of the classification nodes of the graph (see Fig. 2). We used libsvm version 2.71 [2] to train the classification nodes. The parameter values C and γ of our support vector classifier were tuned using the tool included in libsvm. The parameter values used by the other classifiers shown in Table 1 are the default parameters given in Weka version 3.4.3 [15].
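The evaluation of a DAG of pairwise classifiers can be sketched as follows. This is an illustrative Python sketch and not the libsvm-based code of the system: the class PairwiseNode, the helper toy_model, and the hand-set coefficients are hypothetical stand-ins for trained nodes. Each node evaluates a decision function with a radial basis kernel, and the graph is traversed by repeatedly testing the first candidate class against the last and discarding the loser, as in [10].

```python
import math
from itertools import combinations


def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


class PairwiseNode:
    """Binary node 'pos vs. neg' holding (coefficient, support vector) pairs."""

    def __init__(self, pos, neg, support, bias=0.0, gamma=1.0):
        self.pos, self.neg = pos, neg
        self.support, self.bias, self.gamma = support, bias, gamma

    def winner(self, x):
        score = self.bias + sum(
            coef * rbf_kernel(sv, x, self.gamma) for coef, sv in self.support
        )
        return self.pos if score >= 0 else self.neg


def dag_classify(nodes, classes, x):
    """Walk the DAG: each node evaluation eliminates exactly one class."""
    candidates = sorted(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        if nodes[(a, b)].winner(x) == a:
            candidates.pop()       # b is eliminated
        else:
            candidates.pop(0)      # a is eliminated
    return candidates[0]


def toy_model(centres, gamma=1.0):
    """Hypothetical model: one node per class pair, one 'support vector' per class."""
    return {
        (a, b): PairwiseNode(a, b, [(1.0, centres[a]), (-1.0, centres[b])], gamma=gamma)
        for a, b in combinations(sorted(centres), 2)
    }


centres = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}
nodes = toy_model(centres)
label = dag_classify(nodes, centres, (1.9, 0.1))
```

For k classes the graph contains k(k − 1)/2 nodes, but a single classification evaluates only k − 1 of them.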
Approach                   Error
Support Vector             1.3480 %
Nearest neighbor           1.4799 %
Neural network             2.5021 %
Perceptron [9]             3.0000 %
Hidden Markov Model [6]    3.2000 %
Fuzzy-Geometric [5]        3.7000 %
Decision tree              4.0070 %
Naive Bayes                8.3841 %
Table 1. Comparison of different classification approaches using the UNIPEN 1a (digits) subset. The error rates were estimated using ten-fold cross validation.
2.2. Structural analysis of mathematical expressions
A mathematical formula is described as a composite whose main components are symbols and baselines. Baselines are symbol composites representing symbols that are arranged horizontally in the expression [16]. Symbols are parents of baselines. Baselines have labels which give information about their position relative to their parent symbol. These labels are used to establish mathematical relationships between parent symbols and child baselines (see Fig. 3). Such relative positions lie in regions around symbols: the above-left, above, superscript, right, subscript, below, below-left, and subexpression regions (see Fig. 4).
Figure 3. Above: a mathematical expression. Below: composite of four baselines representing the expression.
Figure 4. Regions around symbols. Circles represent the attractor points of the symbol.
The main idea of our method is to use the semantic information given by symbols to group them and then construct the baseline tree which best describes the expression. For this purpose, symbols are considered as nodes of a weighted graph, and a Minimum Spanning Tree (MST) for this graph is constructed using an optimized version of Prim’s algorithm. The weight of an edge joining two symbols is a distance-based value: if the symbols satisfy some mathematical relationship, as operators or operands in terms of semantic information, the weight is the minimum distance between the attractor points of the symbols (see Fig. 4). If no mathematical relationship is satisfied, the weight is the distance between the centers of the symbols’ bounding boxes. Details about the mathematical relationships satisfied by symbols, as well as the location of attractor points, can be found in [13]. Our method allows:
1. Fraction Localization. This MST construction locates fractions in order to handle irregular horizontal structures. Consider the argument of the exponential function in Fig. 5(a). In such a structure, the σ in the fraction could be interpreted as lying to the right of the e, while it actually belongs to the denominator of the fraction. Fraction localization can also be used to locate symbols such as ‘overbrace’ and ‘underbrace’ and to determine their attributes.
2. Symbol Clustering. The MST construction for symbol clustering associates groups of symbols with the regions, in order to define mathematical relationships between symbols and groups of symbols. This MST construction can easily be extended to recognize expressions that contain, for example, structures such as ${}_nC_k$, or other operators with similar layouts. It can also be extended to recognize arguments of roots, such as the n in the structure $\sqrt[n]{\ }$.
3. Matrix Recognition. The MST matrix construction is activated for symbols enclosed by square brackets, see Fig. 5(a). This MST construction is also used to recognize structures which describe equation systems, as well as functions defined by cases. For this purpose, we use the symbol ‘{’ to start the recognition of the tabular arrangement, see Fig. 6(b). This construction can also be used to recognize stacked arguments of sum-like operators (i.e. sum, integral, and product operators). In contrast to the previous cases, no symbol is used to start the tabular analysis; instead, the matrix mode is set by default when processing the symbols located in the above and below regions of sum-like operators.
Figure 5. (a) Example of a MST construction for symbol clustering. The MST is initialized with the main baseline (−, , e, d, x). Boxes are used to locate regions defining fractions. (b) MST construction for tabular layouts.
The composite representing the expression is constructed by recursively applying the MST constructions explained above: the MST for symbol clustering is initialized with the main baseline. After that, the MST construction is applied to the groups found in the previous construction. Figure 5(a) shows the first step of the recursion.
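The edge weighting and the spanning tree construction of Sec. 2.2 can be illustrated with a short Python sketch. It uses the plain O(n²) form of Prim’s algorithm rather than the optimized version of the system, and it does not model the initialization with the main baseline or the recursion. The symbol records, the attractor coordinates, and the predicate `related` are hypothetical; the actual relationships and attractor points are those defined in [13].

```python
import math


def centre(box):
    """Centre of a bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def weight(a, b, related):
    """Distance-based edge weight between two symbols."""
    if related(a, b):
        # Related symbols: minimum distance between attractor points.
        return min(math.dist(p, q) for p in a["attractors"] for q in b["attractors"])
    # Unrelated symbols: distance between bounding-box centres.
    return math.dist(centre(a["box"]), centre(b["box"]))


def prim_mst(symbols, related, root=0):
    """Plain Prim's algorithm on the complete graph over the symbols."""
    n = len(symbols)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [None] * n
    best[root] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = weight(symbols[u], symbols[v], related)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


# Hypothetical layout of the expression "x^2 +".
symbols = [
    {"name": "x", "box": (0, 0, 10, 10), "attractors": [(10, 5)]},
    {"name": "2", "box": (11, -6, 15, 0), "attractors": [(11, -3)]},
    {"name": "+", "box": (20, 2, 28, 8), "attractors": [(20, 5), (28, 5)]},
]


def related(a, b):
    """Illustrative predicate: only the operand x and the operator + are related."""
    return {a["name"], b["name"]} == {"x", "+"}


edges = prim_mst(symbols, related)
```

In this toy layout both the superscript 2 and the operator + are attached to x, which is the kind of grouping that the baseline tree is then built from.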
3. Implementation in the E-Chalk System
3.1. Java implementation
Using the techniques described in the previous section, we wrote a Java library for the recognition of on-line mathematical expressions in E-Chalk. We use the composite pattern to describe the baseline tree structure. Symbols are stroke composites, whereas baselines are symbol composites. Both symbols and baselines accept visitors. A Java class implementing a general interface, the FormatVisitor data type, converts the composite structure into a string representation that depends on which computer algebra system (Mathematica or Maple) is used to compute the operations indicated in the formula.
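The combination of composite and visitor described above can be sketched as follows. The sketch is written in Python for brevity, whereas the actual library is written in Java, and apart from the name FormatVisitor every class and method name here is an assumption. Strokes are omitted, and only superscripts and square roots are handled.

```python
class Symbol:
    """A recognized symbol; its children are baselines labelled by region."""

    def __init__(self, name, **children):
        self.name = name
        self.children = children  # region label -> Baseline

    def accept(self, visitor):
        return visitor.visit_symbol(self)


class Baseline:
    """A horizontal arrangement of symbols."""

    def __init__(self, *symbols):
        self.symbols = symbols

    def accept(self, visitor):
        return visitor.visit_baseline(self)


class FormatVisitor:
    """Converts the composite into a string; subclasses fix the target syntax."""

    def visit_baseline(self, baseline):
        return "".join(symbol.accept(self) for symbol in baseline.symbols)

    def visit_symbol(self, symbol):
        if symbol.name == "sqrt":
            return self.sqrt(symbol.children["subexpression"].accept(self))
        text = symbol.name
        if "superscript" in symbol.children:
            text += "^(" + symbol.children["superscript"].accept(self) + ")"
        return text

    def sqrt(self, argument):
        raise NotImplementedError


class MathematicaVisitor(FormatVisitor):
    def sqrt(self, argument):
        return "Sqrt[" + argument + "]"


class MapleVisitor(FormatVisitor):
    def sqrt(self, argument):
        return "sqrt(" + argument + ")"


# Composite for the square root of x^2 + 1.
expression = Baseline(
    Symbol(
        "sqrt",
        subexpression=Baseline(
            Symbol("x", superscript=Baseline(Symbol("2"))),
            Symbol("+"),
            Symbol("1"),
        ),
    )
)

mathematica_text = expression.accept(MathematicaVisitor())
maple_text = expression.accept(MapleVisitor())
```

Keeping the output syntax in the visitor means that supporting another computer algebra system requires a new visitor class but no change to the composite.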
3.2. Handwriting processing
The recognition mode in E-Chalk is activated by selecting a reserved stroke color from the palette. In recognition mode, strokes are processed and grouped as described in Sec. 2.1. If the classification yields the ‘end’ symbol (a lower-right corner), the layout analysis is activated, and the recognized structure is interpreted depending on the computation defined by the gesture encountered in the handwriting.
The handwriting processing that allows the recognition of matrices can also be extended to recognize systems of equations and to solve them. In this case, the activation gesture is the symbol ‘{’ at the beginning of the expression. The variables used to find the solution of the equation system are enclosed with the end symbol, see Fig. 6(b). The system is also able to recognize structures defining parametric plots, as shown in Fig. 6(c). Intervals for the values of parameters are established using an “equals-colon” notation, such as x = −1 : 3. Parameters without intervals take values in the interval [−1, 1].
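The “equals-colon” convention and its default can be illustrated with a small Python sketch. It works on an already linearized string, which is an assumption made for illustration: the system itself interprets the composite, and the function name parse_parameter is hypothetical.

```python
DEFAULT_INTERVAL = (-1.0, 1.0)


def parse_parameter(text):
    """Parse 'name = low : high'; a bare name gets the default interval."""
    text = text.replace("\u2212", "-").strip()  # accept the typographic minus
    if "=" not in text:
        return text, DEFAULT_INTERVAL
    name, rest = text.split("=", 1)
    low, high = rest.split(":", 1)
    return name.strip(), (float(low), float(high))


with_interval = parse_parameter("x = \u22121 : 3")
without_interval = parse_parameter("t")
```

Here the first call yields the interval from −1 to 3 for x, and the second falls back to [−1, 1] for t.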
3.3. Classification wizard
The introduction of new symbols and the training with new data are done using the interface shown in Fig. 7. A classification model is stored in a compressed file, with the tree structure shown in Fig. 7. A directory called “data” stores the files containing the symbols drawn in the right panel. Once the user has finished writing the new symbols, pressing the “Next” button opens a new page, where the classifiers are trained using the parameters for support vector classification obtained in [12].
Figure 2 shows a DAG originally used for three classes, where the shaded nodes are the new nodes introduced in the structure. The structure of the directed acyclic graph of the classifier allows an easy introduction of new classes and new data. When introducing a new class of symbols, one only has to train a subset of the classification nodes. If the classification model originally has n classes, we only have to add n classification nodes to the graph for training. If new examples of classes already included in the model are introduced, only the corresponding nodes are trained with the new data. The data used to train the nodes are the support vectors stored in the classification models. The use of support vector machines allows fast training of the new nodes, since the support vectors constitute a small fraction of representative examples from the original database. For example, consider Fig. 2. There, the data used to train the node labeled ‘1 vs. 4’ are the new data belonging to class ‘4’ and the support vectors with label ‘1’ stored in the nodes containing the label ‘1’, i.e. the nodes labeled ‘1 vs. 3’ and ‘1 vs. 2’.
Figure 6. Examples of expressions recognized by the system. (a) Multiplication of matrices. (b) An equation system and its solution. (c) Function plotting.
Figure 7. The interface used for storing new data and training.
4. Comments and further work
In this article we introduced a new version of our recognition system for on-line handwritten mathematical expressions. The method is able to recognize matrices and other tabular arrangements. It allows the processing of on-line handwriting for matrix computations, for solving systems of equations, and for function plotting.
The symbol recognition component is able to grow “on-line”, using the support vectors stored in the original model to train new classification models. Some authors argue that using only the support vectors for training new classifiers is not the best choice, because classification rates can get worse. To overcome such difficulties, we plan to implement some of the algorithms for incremental learning in support vector classification [1].
We also plan to include in the wizard an automatic method for calculating the parameters that delimit the regions around symbols, which are used for the structural analysis of the mathematical expressions. We think that such parameters are strongly user-dependent, but a learning procedure included in the wizard could help to find the optimal parameters for each user.
We do not provide a comparison with other methods, because a benchmark for the recognition of mathematical expressions does not exist. We will work towards defining such a benchmark.
References
[1] G. Cauwenberghs and T. Poggio. Incremental and Decremental Support Vector Machine Learning. In Advances in
Neural Information Processing Systems (NIPS*2000), volume 13, 2001.
[2] C. Chang and C. Lin. LIBSVM: A Library for Support Vector
Machines, 2001. http://www.csie.ntu.edu.tw/∼cjlin/libsvm.
[3] M. Fujimoto, T. Kanahori, and M. Suzuki. Infty Editor - A Mathematics Typesetting Tool with a Handwriting Interface and a Graphical Front-End to OpenXM Servers. In Proceedings of the RIMS Workshop “Computer Algebra - Algorithms, Implementations and Applications”, volume 1335,
pages 217–226, 2003.
[4] I. Guyon, L. Schomaker, R. Plamondon, M. Liberman, and
S. Janet. UNIPEN Project for On-Line Data Exchange and
Recognizer Benchmarks. In Proc. of the 12th Conference on
Pattern Recognition (ICPR94), pages 29–33, 1994.
[5] J. Hebert, M. Parizeau, and N. Ghazzali. A New Fuzzy
Geometric Representation for On-Line Isolated Character
Recognition. In Proceedings of the 14th International Conference on Pattern Recognition, pages 33–40, 1998.
[6] J. Hu, S. Lim, and M. Brown. HMM Based Writer Independent On-Line Handwritten Character and Word Recognition.
In Proceedings of IWFHR-6, pages 143–155, 1998.
[7] L. Knipping. An Electronic Chalkboard for Classroom and
Distance Teaching. PhD thesis, Freie Universität Berlin, Institut für Informatik, February 2005.
[8] N. Matsakis. Recognition of Handwritten Mathematical Expressions. Master’s thesis, Massachusetts Institute of Technology, Cambridge, MA, May 1999.
[9] M. Parizeau, A. Lemieux, and C. Gagné. Character Recognition Experiments using Unipen Data. In Proceedings of
the 6th International Conference on Document Analysis and
Recognition (ICDAR), pages 481–485, 2001.
[10] J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large Margin DAGs for Multiclass Classification. Advances in Neural
Information Processing Systems, 12:547–553, 2000.
[11] S. Smithies, K. Novins, and J. Arvo. A Handwriting-Based
Equation Editor. In Proceedings of Graphics Interface, 1999.
[12] E. Tapia. Understanding Mathematics: A System for the
Recognition of On-Line Handwritten Mathematical Expressions. PhD thesis, Freie Universität Berlin, Institut für Informatik, December 2004.
[13] E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Expressions using a Minimum Spanning
Tree Construction and Symbol Dominance. In J. Lladós and
Y.-B. Kwon, editors, Graphics Recognition, volume 3088 of
LNCS. Springer, July 2004.
[14] L. Wenzel and H. Dillner. MathJournal – An Interactive Tool
for the Tablet PC. http://www.xthink.com.
[15] I. H. Witten and E. Frank. Data Mining: Practical Machine
Learning Tools with Java Implementations. Morgan Kaufmann, San Francisco, 2000.
[16] R. Zanibbi, D. Blostein, and J. Cordy. Recognizing Mathematical Expressions Using Tree Transformation. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
24(11), November 2002.