Observations and a partial prescription for recognition of fractions and other non-linear written mathematics

Kevin Lin
Richard Fateman

Background

Our goal for SKEME[] is to develop a multimodal application that incorporates speech and handwriting for the input of mathematical symbols. To simplify the implementation, we are using packaged programs for the lower-level recognition, namely Microsoft Speech SDK and Microsoft Tablet PC SDK. Here we consider only the handwriting component, and how handwriting can serve as the sole source for describing mathematics.

Microsoft Tablet PC SDK is reasonably accurate for input of regular horizontal characters when they are written linearly. In some modes it also recognizes, by layout analysis, expressions written on several lines. This means that a built-up fraction with A written above B becomes an expression written on three lines, A-B. (As an aside, the restriction to one line does not solve all problems. A handwritten a+b is most likely to be recognized as “at b” because the default recognizer prefers words. We hope we can teach the system to prefer + over t.)

A possible approach to solve this problem is to collect the ink strokes of such a fraction and then, based on the bounding boxes, separate by function the ink strokes that constitute the parts of the fraction. After separating the ink strokes into horizontal components, we can pass the results (perhaps more than once) into the handwriting recognizer, which need only deal with mathematical expressions whose symbols are arranged horizontally.

We also note that the handwriting recognizer is, in its usual applications, a substitute for keyboard input. There is no obvious notation on the keyboard to indicate vertical concatenation of characters, and so we have left the realm of “handwriting to text” unless we allow for results like \frac{A}{B}, a text version in “TeX” of a built-up fraction.

What we have so far…

So far, K. Lin has a demonstration program to show this approach is feasible.
Given a collection of strokes (a data structure in the context of the Microsoft system), we can look for horizontal bars. If one is detected, the algorithm can search the remaining elements for objects whose bounding boxes fall within the left and right boundaries of the horizontal line (alleged to be a divide bar). If the bounding boxes do not fall (mostly) within the left and right boundaries, then they are not candidates for numerator or denominator. Furthermore, the objects need to be in close proximity to the fraction bar (see diagram below).

[Diagram: a horizontal line surrounded by objects labeled “Part of a Fraction”, “Probably Part of a Fraction (borderline case)”, “Not Part of a Fraction (too far left)”, and “Not Part of a Fraction (too far from horizontal bar)”.]

Tolerance

It is very unlikely that a user will draw a perfectly straight horizontal line. Therefore we introduce a tolerance level to give leeway for imperfections in the user’s handwriting. For experimentation we have set the default tolerance level to 30 (display) pixels, but in a more general approach we should use a proportional calculation: say, that any box whose height-to-width ratio falls below some threshold (say 0.1) is likely to be a horizontal bar. (Note that stylus/pad resolution is much higher than display resolution.)

[Diagram: bounding boxes of the numerator, the horizontal line, and the denominator, labeled Numerator.Left, Numerator.Right, Numerator.Top, Numerator.Bottom, HLine.Top, HLine.Bottom, HLine.Height, Denominator.Left, Denominator.Right, Denominator.Top, Denominator.Bottom.]

For our prototype we overloaded the same tolerance parameter, default 30 pixels, to define how close the objects need to be to the bottom or the top of the horizontal line. A tolerance of 30 indicates that the numerator can extend no more than 30 pixels lower than the top of the horizontal bar, and that its bottom edge can lie no more than 60 pixels above the top of the horizontal bar.
Similarly, a tolerance of 30 pixels means that the denominator of the fraction can rise no further than 30 pixels above the bottom of the horizontal bar, and that the top of the denominator needs to be within 60 pixels below the bottom of the horizontal bar.

Example: t = tolerance (example t = 30)

[Diagram: horizontal line (Maxheight = t) with the numerator above it and the denominator below it.]

Numerator: HLine.Top > (Numerator.Bottom - t) AND HLine.Top < Numerator.Bottom + 2*t
Denominator: HLine.Bottom < (Denominator.Top + t) AND HLine.Bottom > Denominator.Top - 2*t

The horizontal line can have a maximum height of 30. If HLine.Top = 300, find a numerator whose bottom lies between 240 and 330. If HLine.Bottom = 300, find a denominator whose top lies between 270 and 360.

Where we go from here…

This characterization is far too simplistic. First, the horizontal bar may not be a fraction bar at all, but one of the (many) other uses of horizontal strokes. “+” has a horizontal stroke. So also do E, T, =, ~, ≤, ≠, ±, etc. The handwriting recognizer should probably be used first, and if it produces an answer with high confidence, that should be our result. In the case of low confidence, we can embark on a segmentation procedure.

The numerator and denominator have horizontal extent as well: we hypothesize that a reasonably careful writer will try to make the bar wider than the numerator or denominator, but this may not actually be possible if the bar is drawn, as is commonly done, before the denominator (which may be large). Parts of the fraction may also lie further away, as would be the case for exponents in the numerator or denominator. A more thorough examination of the possibilities in the context of OCR (static pages) has been given by Fateman [].

In the case of handwriting we can use other information not available to the OCR program. In particular, if we assume that the normal way of writing a fraction is to write, in sequence, the numerator, divide bar, denominator, we can use temporal information for grouping.
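The bar test and the tolerance windows from the example above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the prototype's code: the names Box, is_horizontal_bar, numerator_ok and denominator_ok are hypothetical and are not part of the Tablet PC SDK, coordinates are assumed to be display pixels with y growing downward, and both denominator tests are taken relative to HLine.Bottom, following the prose and the numeric example (270 to 360).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in display pixels; y grows downward."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def is_horizontal_bar(box: Box, t: float = 30.0,
                      max_ratio: Optional[float] = None) -> bool:
    """Prototype rule: height at most t pixels.

    If max_ratio is given (e.g. 0.1), also require height/width <= max_ratio,
    which is the proportional test suggested in the text.
    """
    if box.width <= 0 or box.height > t:
        return False
    if max_ratio is not None:
        return box.height / box.width <= max_ratio
    return True


def numerator_ok(bar: Box, num: Box, t: float = 30.0) -> bool:
    """Numerator bottom lies strictly between bar.top - 2t and bar.top + t."""
    return bar.top > num.bottom - t and bar.top < num.bottom + 2 * t


def denominator_ok(bar: Box, den: Box, t: float = 30.0) -> bool:
    """Denominator top lies strictly between bar.bottom - t and bar.bottom + 2t."""
    return bar.bottom < den.top + t and bar.bottom > den.top - 2 * t


if __name__ == "__main__":
    bar = Box(left=100, top=300, right=300, bottom=304)
    print(numerator_ok(bar, Box(150, 200, 250, 290)))  # bottom 290 is in (240, 330)
    print(numerator_ok(bar, Box(150, 100, 250, 239)))  # bottom 239 is too far above
```

With the height-only rule a 20 by 25 pixel letter would also pass as a "bar", which is one reason the proportional test (max_ratio) seems worth adding.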
Concretely, consider a sequence of symbols written in the order a1, a2, a3, divide bar, a4, a5. Assume also that a1 and a5 are within the expected boxes for the numerator and denominator respectively. Then we would likely place a2 and a3 also in the numerator, and a4 in the denominator, because of their temporal order. We might be looking at a layout with a1, a2, a3 above the bar and a5, a4 below it.

Another advantage of handwriting recognition (vs. OCR) is that OCR of densely typeset formulas requires, in general, separating the lines of multi-line formulas, whereas a handwriting input program is unlikely to receive such multi-line inputs. It is also possible (and advisable) to make some effort to assure the writer that the computer has understood the symbols as they are written: feedback on recognition, followed by correction, is not part of the usual static OCR process (although this is supported by a subsequent editing pass in some programs).

An all-encompassing procedure to accurately identify fractions and other markers that are horizontal lines must have a substantial heuristic component. Interlocking heuristics for detection of superscript and subscript for OCR have also been previously described by Fateman []. The additional temporal factors are once again available for handwriting: we expect handwriting of the base expression followed by subscript and then superscript. If a bounding (base) box has a smaller bounding box to the upper right, written after the base box, this can be tentatively identified as a superscript.

Unfortunately, bounding boxes are inadequate unless baseline calculations are included (this information is available from the handwriting recognizer!). The example we cite is to look at the bounding boxes for dp versus pd: which has the subscript? And pd versus dp: which has the superscript? Because d has an ascender and p has a descender, the vertical offset between their boxes alone can mimic a subscript or superscript. Knowledge of font, baseline and upper/lower case is significant. On the positive side it may be possible to determine the termination of a superscript (temporally) by a return to the baseline.
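The temporal grouping argument for the sequence a1, a2, a3, divide bar, a4, a5 can be sketched in Python as follows. This is a minimal illustration under assumptions, not a description of the demonstration program: the function name group_by_time and its parameters are hypothetical, symbols are assumed to arrive already segmented and in stroke order, and the two predicates stand in for whatever spatial test decides that a symbol lies in the expected numerator or denominator box.

```python
from typing import Callable, List, Sequence, Tuple


def group_by_time(
    symbols: Sequence[str],
    bar_index: int,
    in_numerator_box: Callable[[str], bool],
    in_denominator_box: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Assign symbols to numerator/denominator using stroke order.

    symbols is in temporal order and symbols[bar_index] is the divide bar.
    A symbol that passes the spatial test acts as an anchor:
      * everything written from the first numerator anchor up to the bar
        is placed in the numerator;
      * everything written from the bar up to the last denominator anchor
        is placed in the denominator.
    Symbols outside those temporal spans are left ungrouped.
    """
    before = list(symbols[:bar_index])
    after = list(symbols[bar_index + 1:])

    num_anchors = [i for i, s in enumerate(before) if in_numerator_box(s)]
    den_anchors = [i for i, s in enumerate(after) if in_denominator_box(s)]

    numerator = before[num_anchors[0]:] if num_anchors else []
    denominator = after[:den_anchors[-1] + 1] if den_anchors else []
    return numerator, denominator


if __name__ == "__main__":
    order = ["a1", "a2", "a3", "bar", "a4", "a5"]
    num, den = group_by_time(
        order,
        bar_index=3,
        in_numerator_box=lambda s: s == "a1",    # only a1 passes the spatial test
        in_denominator_box=lambda s: s == "a5",  # only a5 passes the spatial test
    )
    print(num, den)  # ['a1', 'a2', 'a3'] ['a4', 'a5']
```

The sketch deliberately trusts stroke order over position for the symbols between an anchor and the bar; a fuller heuristic would presumably weigh both, since a writer may go back and add to the numerator after drawing the bar.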
Conclusion

Conventional unconstrained handwriting recognition, even if given hints about mathematical symbols and typical linear constructions, won’t do 2-D math, but can be used as a component for recognition within bounding boxes derived from 2-D arrangements.

PSEUDOCODE / Fraction detection

For all bounding boxes of strokes
    Collect P = all plausible horizontal lines
    Collect from P a subset Q = all plausible divide bars
For each item I in Q
    Examine expressions (possibly multiple strokes) in bounding boxes mostly above I for possible recognition as individual characters or symbols in numerators; similarly for denominators
    Order by stroke time; collect plausible numerators and denominators

References

R. Fateman, Taku Tokuyasu, Benjamin P. Berman, and Nicholas Mitchell: “Optical Character Recognition and Parsing of Typeset Mathematics.” Journal of Visual Communication and Image Representation, vol. 7, no. 1 (March 1996), 2-15.