Observations and a partial prescription for recognition of fractions and
other non-linear written mathematics
Kevin Lin
Richard Fateman
Background
Our goal for SKEME[] is to develop a multimodal application that incorporates speech
and handwriting for the input of mathematical symbols. To simplify the implementation,
we are using packaged programs for the lower-level recognition, namely the Microsoft
Speech SDK and the Microsoft Tablet PC SDK. Here we consider only the handwriting
component, and in particular how handwriting can serve as the sole source for
describing mathematics.
The Microsoft Tablet PC SDK is reasonably accurate for input of regular horizontal
characters when they are written linearly. In some modes it also recognizes, by layout
analysis, expressions written on several lines. This means that a fraction written as A
above a horizontal bar above B becomes an expression written on three lines: A, -, B.
(As an aside, the restriction to one line does not solve all problems. A handwritten
“a+b” is most likely to be recognized as “at b” because the default recognizer prefers
words. We hope we can teach the system to prefer + over t.) A possible approach to
solving this problem is to collect the ink strokes of a handwritten expression and then,
based on the bounding boxes, separate, by function, the ink strokes that constitute the
parts of a fraction. After separating the ink strokes into horizontal components, we can
pass the results (perhaps more than once) to the handwriting recognizer, which then need
only deal with mathematical expressions whose symbols are arranged horizontally.
We also note that the handwriting recognizer is, in its usual applications, a substitute for
keyboard input. There is no obvious notation on the keyboard to indicate vertical
concatenation of characters, and so we have left the realm of “handwriting to text” unless
we allow for results like \frac{A}{B}, a text version in “TeX” of a built-up fraction.
What we have so far…
So far, K. Lin has a demonstration program to show this approach is feasible. Given a
collection of strokes (a data structure in the context of the Microsoft system), we can look
for horizontal bars. If one is detected, the algorithm can search the remaining elements
for objects whose bounding boxes fall within the left and right boundaries of the
horizontal line (alleged to be a divide bar). If the bounding boxes do not fall (mostly)
within the left and right boundaries, then they are not candidates for numerator or
denominator. Furthermore, the objects need to be in close proximity to the fraction bar
(see the diagram below).
[Diagram: objects around a horizontal line, labeled “Part of a Fraction”, “Probably Part
of a Fraction (borderline case)”, “Not Part of a Fraction (too far left)”, and “Not Part of
a Fraction (too far from horizontal bar)”; the last label appears twice.]
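The left/right test described above can be sketched in a few lines of Python. This is only an illustration, not the demonstration program itself: the `Box` type, the function names, and the 0.5 threshold used to interpret "mostly within" are all assumptions, since the text does not quantify "mostly".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in screen coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def horizontal_overlap(box: Box, bar: Box) -> float:
    """Fraction of the box's width lying between the bar's left and right edges."""
    width = box.right - box.left
    if width <= 0:
        return 0.0
    shared = min(box.right, bar.right) - max(box.left, bar.left)
    return max(0.0, shared) / width


def mostly_within(box: Box, bar: Box, min_overlap: float = 0.5) -> bool:
    """True if at least `min_overlap` of the box falls under or over the bar.

    The 0.5 default is a hypothetical choice, not a value from the prototype.
    """
    return horizontal_overlap(box, bar) >= min_overlap
```

With a bar spanning x = 100 to 300, a box spanning 150 to 250 passes, a box spanning 0 to 90 fails as "too far left", and a box spanning 60 to 160 is a borderline case that passes only because 60% of its width lies over the bar.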
Tolerance
It is very unlikely that a user will draw a perfectly straight horizontal line. Therefore we
introduce a tolerance level to give leeway for imperfections in the user’s handwriting.
For experimentation we have set the default tolerance level to 30 (display) pixels, but a
more general approach should use a proportional test, say that any box whose
height-to-width ratio is below some tolerance (say 0.1) is likely to be a horizontal bar.
(Note that stylus/pad resolution is much higher than display resolution.)
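The two bar tests mentioned here, the absolute 30-pixel limit and the proportional one, might look as follows. This is a sketch under assumptions: the function names are invented for illustration, and the ratio test is read as "height divided by width is at most 0.1", since a horizontal bar is much wider than it is tall.

```python
def is_bar_by_height(height: float, tolerance: float = 30.0) -> bool:
    """Absolute test: the bounding box is at most `tolerance` display pixels tall."""
    return 0 <= height <= tolerance


def is_bar_by_ratio(width: float, height: float, max_ratio: float = 0.1) -> bool:
    """Proportional test: the box is flat relative to its own width.

    Reads the ratio criterion as height / width <= max_ratio (an assumption).
    """
    if width <= 0 or height < 0:
        return False
    return height / width <= max_ratio
```

The proportional form has the advantage of being independent of display versus stylus resolution: a stroke 200 units wide and 10 units tall passes at any scale, while the absolute test also accepts small marks such as dots, which have little height but no appreciable width.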
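[Diagram: bounding boxes of a numerator, a horizontal line, and a denominator, annotated
with Numerator.Left, Numerator.Right, Numerator.Top, Numerator.Bottom, HLine.Top,
HLine.Bottom, HLine.Height, Denominator.Left, Denominator.Right, Denominator.Top, and
Denominator.Bottom.]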
For our prototype we overloaded the same tolerance parameter, default 30 pixels, to
define how close the objects need to be to the top or the bottom of the horizontal line.
Coordinates are display pixels, with values increasing down the screen. A tolerance of 30
means that the bottom of the numerator can lie no more than 30 pixels below the top of
the horizontal bar and no more than 60 pixels above it. Similarly, the top of the
denominator can rise no more than 30 pixels above the bottom of the horizontal bar and
must lie within 60 pixels below it. Example:
t = tolerance (example: t = 30)
Horizontal line: Maxheight = t. The horizontal line can have a maximum height of 30.
Numerator: HLine.Top > (Numerator.Bottom - t) AND HLine.Top < (Numerator.Bottom + 2*t).
If HLine.Top = 300, find the numerator (its bottom edge) between 240 and 330.
Denominator: HLine.Bottom < (Denominator.Top + t) AND HLine.Bottom > (Denominator.Top - 2*t).
If HLine.Bottom = 300, find the denominator (its top edge) between 270 and 360.
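The proximity conditions can be checked against the worked numbers with a short Python sketch. The function names are hypothetical. One assumption is made explicit: both denominator comparisons are taken against HLine.Bottom, which is what the worked range of 270 to 360 implies.

```python
def numerator_ok(numerator_bottom: float, hline_top: float, t: float = 30.0) -> bool:
    """HLine.Top > Numerator.Bottom - t  AND  HLine.Top < Numerator.Bottom + 2*t."""
    return hline_top > numerator_bottom - t and hline_top < numerator_bottom + 2 * t


def denominator_ok(denominator_top: float, hline_bottom: float, t: float = 30.0) -> bool:
    """HLine.Bottom < Denominator.Top + t  AND  HLine.Bottom > Denominator.Top - 2*t.

    Uses HLine.Bottom in both comparisons (an assumption that matches the
    worked example, where HLine.Bottom = 300 gives the range 270 to 360).
    """
    return hline_bottom < denominator_top + t and hline_bottom > denominator_top - 2 * t
```

Solving the inequalities for t = 30 and HLine.Top = 300 gives 240 < Numerator.Bottom < 330, and for HLine.Bottom = 300 gives 270 < Denominator.Top < 360. The comparisons are strict, so the endpoints themselves are rejected.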
Where we go from here…
This characterization is far too simplistic. First, the horizontal bar may not be a fraction
bar at all, but one of the (many) other uses of horizontal lines: “+” has a horizontal
stroke, and so do E, T, =, ~, ≤, ≠, ±, etc.
The handwriting recognizer should probably be used first, and if it produces an answer
with high confidence, that should be our result. In the case of low confidence, we can
embark on a segmentation procedure.
The numerator and denominator have horizontal extent as well: we hypothesize that a
reasonably careful writer will try to make the bar wider than the numerator or
denominator, but this may not actually be possible if the bar is drawn, as is commonly
done, before the denominator (which may be large). Parts of the fraction may also lie
farther from the bar, as would be the case for exponents in the numerator or denominator.
A more thorough examination of the possibilities in the context of OCR (static pages) has
been given by Fateman [].
In the case of handwriting we can use other information not available to the OCR
program. In particular, if we assume that the normal way of writing a fraction is to write,
in sequence, the numerator, divide bar, denominator, we can use temporal information for
grouping. That is, consider a sequence of symbols a1, a2, a3, divide bar, a4, a5. Assume
also that a1 and a5 are within the expected boxes for the numerator and denominator
respectively. Then we would likely place a2 and a3 also in the numerator, and a4 in the
denominator, because of their temporal order. We might be looking at a fraction such as
[Diagram: a built-up fraction with a1, a2, and a3 above the bar and a4 and a5 below it.]
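The temporal grouping argument can be illustrated with a small Python sketch. It assumes each symbol arrives in writing order with an optional spatial hint, "num" or "den", saying that its bounding box already falls in the expected numerator or denominator region. The function name, the hint values, and the tuple representation are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def group_by_time(
    strokes: Sequence[Tuple[str, Optional[str]]], bar_name: str = "bar"
) -> Tuple[List[str], List[str]]:
    """Assign symbols to numerator/denominator using writing order.

    `strokes` is in temporal order; each entry is (name, hint), where hint is
    "num", "den", or None. Raises ValueError if the bar is not present.
    """
    names = [name for name, _ in strokes]
    bar = names.index(bar_name)
    before, after = strokes[:bar], strokes[bar + 1:]

    # Earliest symbol before the bar that is spatially confirmed as numerator.
    first_num = next((i for i, (_, h) in enumerate(before) if h == "num"), None)
    # Latest symbol after the bar that is spatially confirmed as denominator.
    last_den = next(
        (i for i in reversed(range(len(after))) if after[i][1] == "den"), None
    )

    numerator = [n for n, _ in before[first_num:]] if first_num is not None else []
    denominator = [n for n, _ in after[: last_den + 1]] if last_den is not None else []
    return numerator, denominator
```

For the sequence a1, a2, a3, divide bar, a4, a5, with only a1 and a5 spatially confirmed, everything written between a1 and the bar joins the numerator and everything written between the bar and a5 joins the denominator. Symbols written before the first confirmed numerator symbol, or after the last confirmed denominator symbol, are left ungrouped.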
Another advantage for handwriting recognition (vs. OCR) is that recognition of densely
typeset formulas requires, in general, separating lines of multi-line formulas. It is
implausible for a handwriting input program to have such multi-line inputs. It is also
possible (and advisable) to make some efforts to assure the writer that the computer has
understood the symbols as they are written: feedback on recognition, followed by
correction, is not part of the usual static OCR process (although this is supported by a
subsequent editing pass in some programs).
An all-encompassing procedure to accurately identify fractions and other markers that are
horizontal lines must have a substantial heuristic component.
Interlocking heuristics for detection of superscript and subscript for OCR have also been
previously described by Fateman []. The additional temporal factors are once again
available for handwriting: we expect handwriting of the base expression followed by
subscript and then superscript. If a bounding (base) box has a smaller bounding box to
the upper right, written after the base box, this can be tentatively identified as a
superscript. Unfortunately, bounding boxes are inadequate unless baseline calculations
are included (this information is available from the handwriting recognizer!). The
example we cite is to look at the bounding boxes for dp versus pd: which has the
subscript? And pd versus dp: which has the superscript? Knowledge of font, baseline,
and upper/lower case is significant. On the positive side, it may be possible to determine
the termination of a superscript (temporally) by a return to the baseline.
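A baseline-aware version of the superscript/subscript heuristic might be sketched as follows. Everything here is an assumption made for illustration: the `Glyph` type, its `baseline` field (standing in for whatever baseline information the recognizer reports), and the threshold of one quarter of the base height.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    """A recognized symbol; screen coordinates, y grows downward."""
    left: float
    top: float
    right: float
    bottom: float
    baseline: float  # y of the writing baseline (hypothetical field)
    time: int        # order in which the symbol was written


def tentative_script(base: Glyph, cand: Glyph, min_shift: float = 0.25) -> Optional[str]:
    """Tentatively label `cand` as a superscript or subscript of `base`."""
    base_height = base.bottom - base.top
    cand_height = cand.bottom - cand.top
    # The candidate must be written after the base, be smaller, and sit to its right.
    if base_height <= 0 or cand.time <= base.time:
        return None
    if cand_height >= base_height or cand.left < (base.left + base.right) / 2:
        return None
    shift = base.baseline - cand.baseline  # positive: candidate baseline is higher
    if shift >= min_shift * base_height:
        return "superscript"
    if shift <= -min_shift * base_height:
        return "subscript"
    return None  # same baseline: ordinary horizontal concatenation
```

Comparing baselines rather than box edges is what separates cases like d next to p, where ascenders and descenders make the raw bounding boxes misleading. A result of None for a later symbol is also the "return to the baseline" signal that could mark the end of a superscript.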
Conclusion
Conventional unconstrained handwriting recognition, even if given hints about
mathematical symbols and typical linear constructions, won’t do 2-D math, but can be
used as a component for recognition within bounding boxes derived from 2-D
arrangements.
PSEUDOCODE / Fraction detection
For all bounding boxes of strokes:
    Collect P = all plausible horizontal lines.
    Collect from P a subset Q = all plausible divide bars.
    For each item I in Q:
        Examine expressions (possibly multiple strokes) in bounding boxes mostly above I
        for possible recognition as individual characters or symbols in numerators;
        similarly, mostly below I, for denominators.
        Order by stroke time; collect plausible numerators and denominators.
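The pseudocode can be turned into a minimal runnable sketch in Python. It is not the demonstration program: the `Stroke` type, the 0.5 overlap threshold, the use of vertical centers to decide "above" versus "below", and the rule that a bar belongs to Q only when it has both a numerator and a denominator candidate are assumptions. The recognition step itself is omitted; the sketch only groups bounding boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stroke:
    """A stroke group with its bounding box (screen coordinates, y grows downward)."""
    name: str
    time: int  # order in which the stroke was written
    left: float
    top: float
    right: float
    bottom: float


def is_horizontal_line(s: Stroke, t: float, max_ratio: float = 0.1) -> bool:
    width, height = s.right - s.left, s.bottom - s.top
    return width > 0 and height <= t and height / width <= max_ratio


def mostly_within(s: Stroke, bar: Stroke, min_overlap: float = 0.5) -> bool:
    width = s.right - s.left
    if width <= 0:
        return False
    shared = min(s.right, bar.right) - max(s.left, bar.left)
    return shared / width >= min_overlap


def detect_fractions(
    strokes: List[Stroke], t: float = 30.0
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (bar, numerator names, denominator names) for each plausible divide bar."""
    lines = [s for s in strokes if is_horizontal_line(s, t)]  # P
    found = []
    for bar in lines:
        mid_y = (bar.top + bar.bottom) / 2
        near = [s for s in strokes if s is not bar and mostly_within(s, bar)]
        num = [s for s in near
               if (s.top + s.bottom) / 2 < mid_y
               and bar.top - 2 * t < s.bottom < bar.top + t]
        den = [s for s in near
               if (s.top + s.bottom) / 2 > mid_y
               and bar.bottom - t < s.top < bar.bottom + 2 * t]
        if num and den:  # the bar is a plausible divide bar (member of Q)
            num.sort(key=lambda s: s.time)
            den.sort(key=lambda s: s.time)
            found.append((bar.name, [s.name for s in num], [s.name for s in den]))
    return found
```

In a fuller system each numerator and denominator group would then be handed to the handwriting recognizer as an ordinary horizontal expression, and a lone horizontal stroke with nothing above or below it would be left for interpretation as a minus sign or part of another symbol.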
References
R. Fateman, Taku Tokuyasu, Benjamin P. Berman, and Nicholas Mitchell,
“Optical Character Recognition and Parsing of Typeset Mathematics,”
Journal of Visual Communication and Image Representation, vol. 7, no. 1
(March 1996), 2-15.