InftyReader Group, Inc.
2809 Bohlen Drive
Hilliard, Ohio 43026
United States
Phone: (614) 777-0660
Fax: (614) 259-0013
TTY: (800) 750-0750
http://apps4android.org
InftyReader: An OCR System for Math Documents
by:
Masakazu Suzuki and Katsuhito Yamaguchi
The Infty Project and Science Accessibility Net
and
John Gardner
ViewPlus Technologies, Inc.
Updated on June 25, 2011
Table of Contents
Overview
InftyReader Features
Output Formats
System Requirements
Installation Procedure
License
Optimizing the Quality of OCR Recognition
Step by Step Instructions for Using InftyReader
Feedback
Overview
Optical character recognition (OCR) technologies are invaluable for improving access to
printed materials by people with print disabilities. Most OCR applications, however, do not
work well with scientific content. Scientific documents typically include mathematical and
other special symbols that standard OCR does not recognize. In addition, no standard
commercial OCR application can recognize a two-dimensionally structured math equation
and convert it to a standard software format.
The InftyReader OCR application can properly recognize scientific documents scanned
from paper or in PDF format. InftyReader recognizes complicated math expressions,
tables, graphs and other technical notations and converts them to accessible formats.
InftyReader can be used by people with print disabilities in combination with the
ChattyInfty accessible scientific editor application. ChattyInfty provides speech access
for reading and writing math and editing the output of InftyReader. Sighted people can
use InftyReader with the free Infty Editor to edit InftyReader output and produce
accessible scientific content.
InftyReader and ChattyInfty are sold by InftyReader Group, Inc. Their website is:
http://inftyreader.org
InftyReader Features

• It uses the "ExpressReaderPro" OCR engine from Toshiba Corporation and the
"WinReader" OCR application from MediaDrive Corporation simultaneously to
recognize characters in regular text.
• It uses an OCR engine developed by the Infty Project to recognize math and scientific
formulas.
• It can recognize tables containing math expressions.
• It can convert black-and-white scanned documents and PDF files.
• It can recognize individual pages of a PDF file.
• It is licensed for business/commercial use.
Output Formats
InftyReader can output a recognition result in any of the following formats:
• IML
• LaTeX
• HR-TeX
• XHTML+MathML
• Microsoft Word 2007 (XML)
IML is an XML file format developed expressly for Infty Editor and ChattyInfty. By default,
InftyReader saves results in IML. The original image is retained and can be displayed
with either Infty Editor or ChattyInfty. In ChattyInfty, the image can be accessed
tactually through an on-line tactile graphics display (available from KGS, Japan) or by
embossing it on a ViewPlus embosser. Infty Editor and ChattyInfty users can therefore
compare results with the original image and make corrections as necessary. These editors
can also convert the result into any of the formats listed above.
Copyright © 2008-2011 by InftyReader Group, Inc. All rights Reserved.
Apart from IML and HR-TeX, all InftyReader output formats are standard mainstream formats.
HR-TeX (Human-Readable TeX) is an abbreviated LaTeX-like notation developed to be more
easily readable than standard LaTeX.
System Requirements
InftyReader and ChattyInfty require Windows XP, Windows Vista, or Windows 7 (32-bit or
64-bit). Microsoft Internet Explorer 7 or later must be installed. To correct OCR errors
and edit documents, we strongly recommend installing the free Infty Editor for sighted
users or the ChattyInfty editor for users with print disabilities.
Installation Procedure
The InftyReader or ChattyInfty download is a zip archive. Extract its contents into any
convenient folder. One of the extracted files is named "setup.exe"; run it to install
InftyReader or ChattyInfty. Note that administrator privileges are required to install
applications.
License
The InftyReader license is included in the download archive as "License_E.txt". Please
read it!
Optimizing the Quality of OCR Recognition
InftyReader can recognize only high-quality black-and-white (binary) images. For paper
documents, it is very important to scan in black-and-white (binary) mode at a resolution
of at least 400 dpi; 600 dpi is recommended for best results. The paper should be flat
and carefully aligned in the scanner to avoid fuzzy, skewed, or slanted images. If
possible, cut pages from the book binding so they lie flat on the scanner. Save the
scanned files in TIFF, GIF, or PNG format.
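Before running OCR, it can help to confirm that a saved scan actually meets these requirements. The following Python sketch is not part of InftyReader; it is a hypothetical helper (`check_png_scan` is an illustrative name) that reads only the PNG header chunk (IHDR) and the optional resolution chunk (pHYs) with the standard library, and reports whether the image is 1-bit grayscale and at least 400 dpi. TIFF and GIF files would need a different reader.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


def check_png_scan(data: bytes, min_dpi: int = 400) -> dict:
    """Hypothetical pre-OCR check: is this PNG binary and scanned at >= min_dpi?"""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    bit_depth = color_type = dpi = None
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            # width, height, bit depth, color type, compression, filter, interlace
            _, _, bit_depth, color_type, _, _, _ = struct.unpack(">IIBBBBB", body)
        elif ctype == b"pHYs":
            ppu_x, ppu_y, unit = struct.unpack(">IIB", body)
            if unit == 1:  # unit 1 means pixels per metre
                dpi = round(min(ppu_x, ppu_y) * METERS_PER_INCH)
        elif ctype == b"IEND":
            break
        pos += 12 + length
    # Color type 0 with bit depth 1 is black-and-white grayscale. Two-color
    # palette images (color type 3) are treated as non-binary by this sketch.
    is_binary = bit_depth == 1 and color_type == 0
    return {
        "binary": is_binary,
        "dpi": dpi,  # None if the file records no physical resolution
        "ok": is_binary and dpi is not None and dpi >= min_dpi,
    }
```

A 600 dpi scan is stored as about 23622 pixels per metre, which the sketch converts back to 600 dpi.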
Recognizing math characters requires much higher quality images than does standard
OCR, and poor quality images will give correspondingly poor results. Heavy users of
InftyReader can improve the quality of recognition by editing scanned images to remove
small extraneous scan defects and artifacts. Recognition can be improved by optimizing
the scanner threshold so that fewer than 1% of characters are broken or touch other
characters.
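Scanner drivers differ in how they expose the black-and-white threshold, so the following is only an illustration under stated assumptions. If you have a grayscale scan, Otsu's method (a standard global-thresholding technique, not something this guide says InftyReader uses) gives a reasonable starting threshold, which you can then raise or lower while checking that fewer than 1% of characters come out broken or touching. The function names are hypothetical, and the broken/touching counts are assumed to come from your own inspection of the output.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level t (0-255) that best separates dark ink from light
    paper by maximizing between-class variance (Otsu's method).
    Pixels <= t are treated as black, pixels > t as white."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    sum_dark = 0
    weight_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already on the dark side
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: list[int], t: int) -> list[int]:
    """Map gray levels to binary: 0 = black ink, 1 = white paper."""
    return [0 if p <= t else 1 for p in pixels]


def defect_rate_ok(broken_or_touching: int, total_chars: int, limit: float = 0.01) -> bool:
    """True if fewer than `limit` (default 1%) of characters are defective."""
    return total_chars > 0 and broken_or_touching / total_chars < limit
```

For example, 3 broken or touching characters out of 1000 passes the 1% target, while 15 out of 1000 does not.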
InftyReader subdivides the document into text, math, tables, and figures and then uses
different procedures to recognize each. Users can improve the quality of recognition by
hand-editing images to ensure that the content flows properly. For example, cutting
Copyright © 2008-2011 by InftyReader Group, Inc. All rights Reserved.
Page 3 of 6
columns apart and arranging them in the proper reading order is recommended when only
part of a page is set in columns.
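Cutting columns apart can be done in any image editor. For readers who preprocess many pages, here is a hypothetical Python sketch of one common approach (a vertical projection profile, not an InftyReader feature): find the widest run of all-white pixel columns in the middle half of a binary image, split there, and stack the left column above the right one so the text reads in sequence. The image is assumed to be a list of pixel rows with 0 = black and 1 = white; loading and saving files is left out. Apply it only to the part of the page that is actually in columns, since a full-width heading would hide the gap between columns.

```python
from typing import Optional

Page = list[list[int]]  # rows of pixels: 0 = black ink, 1 = white paper


def find_gutter(page: Page) -> Optional[tuple[int, int]]:
    """Return (start, end) of the widest all-white run of pixel columns inside
    the middle half of the page (end exclusive), or None if there is none."""
    width = len(page[0])
    lo, hi = width // 4, 3 * width // 4
    best = None
    run_start = None
    for x in range(lo, hi + 1):
        blank = x < hi and all(row[x] == 1 for row in page)
        if blank and run_start is None:
            run_start = x  # a white run begins
        elif not blank and run_start is not None:
            # The run ends here; keep it if it is the widest so far.
            if best is None or x - run_start > best[1] - best[0]:
                best = (run_start, x)
            run_start = None
    return best


def split_columns(page: Page) -> Page:
    """Cut a two-column image at its gutter and stack left column above right."""
    gutter = find_gutter(page)
    if gutter is None:
        return [list(row) for row in page]  # no clear gap: leave as is
    start, end = gutter
    left = [row[:start] for row in page]
    right = [row[end:] for row in page]
    width = max(start, len(page[0]) - end)

    def pad(rows: Page) -> Page:
        # Pad with white so both halves have the same width.
        return [r + [1] * (width - len(r)) for r in rows]

    return pad(left) + pad(right)
```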
One common problem to avoid with scanned images is a dark frame around the page. It is
caused by a non-white area above the page during scanning and can be avoided by placing a
large sheet of white paper over the page being scanned. The paper should be large enough
to cover the entire scanner surface. Images with this problem can be repaired by removing
the dark frame in a good image editor. Be careful not to reduce the image resolution (dpi)
while doing so.
InftyReader can also recognize PDF files. Normal (non-scanned) PDF files represent
characters as fonts and are subject to fewer OCR problems than scanned images, so always
obtain a PDF if possible. Articles in most scientific journals are available as PDF.
Step by Step Instructions for Using InftyReader
InftyReader is a GUI application, but it can also be used in command mode from a Console
window by running Infty.exe in the Infty program folder. This tutorial covers only the GUI
version; command-mode use is described in the InftyHelpE.txt file included in the archive.

Step 1. Start InftyReader. In Windows 7, open the Start menu (press the Windows key or
CTRL-ESC), type InftyReader in the Search box, which has focus, and press Enter. In older
versions of Windows, find InftyReader in the Programs menu and press Enter. When you TAB
around the initial InftyReader screen, you will find a number of buttons and edit boxes.
Use the space bar, not Enter, to press buttons. The screen has these elements:
o Three buttons that let you choose to recognize a file, recognize a folder, or scan in a
document.
o A read-only box showing the current input file or folder name. It is initially empty.
o A radio selection box for choosing the input format. Choices are TIFF, BMP, GIF, PNG,
and PDF.
o A read-only box showing the output file or folder name. It is initially empty.
o A radio selection box for choosing the output format. Choices are IML, LaTeX, HR-TeX,
XHTML(MathML), and Microsoft Word 2007 (XML).
o A yes/no choice of whether to open the output file in the appropriate application.
Default is "no".
o A choice of whether to put a NewLine indicator at the end of each paragraph. Default is
to do so.
o The "Start OCR" button, which you press once all selections are made.
o An Exit button that closes InftyReader.
o A choice of math level: "all math symbols" or "High school math symbols". Select the
latter if the math is simple, because recognition will be better.
o A choice of using InftyReader in English or Japanese.
Copyright © 2008-2011 by InftyReader Group, Inc. All rights Reserved.
Page 4 of 6

Step 2. Set the various options, and then click the file, folder, or scan button to choose
the input.
o If you choose to convert a single file, press the "file" button with the space bar or
left-click it with the mouse. This opens a standard Windows "Open" dialog in which you can
type the file name or browse for it. The dialog also offers a "read-only" option and an
"Open" button. After selecting the file, press the "Open" button. After a short processing
delay, the initial screen reappears with the input and default output file names filled
in. You may edit the output file name or browse to another folder if you wish.
o If you click "folder", a selection tree opens in which you can browse for the desired
input folder. Once you find the folder, press "OK". The initial screen reappears with the
input folder name filled in. The output file name box is empty, and you cannot edit the
output folder name. An additional box appears, giving you the choice of whether to convert
files in sub-folders. This "Search sub-folders" option is off by default.

Step 3. Set the desired resolution in the dpi box, which offers 600 or 400 dpi. Then press
the "Start OCR" button with the space bar or the mouse.

If you have selected LaTeX as the output format, a long list of LaTeX options appears
after you press the "Start OCR" button. If you are familiar with LaTeX, you will recognize
these options. If not, we recommend choosing your paper size, accepting all other default
options, and pressing "OK" to continue.

If you have chosen to convert a single file:
o The output file is placed in the same folder as the input file and, by default, has the
same name except for the extension, which matches the selected output format. For example,
File0.tiff produces File0.iml if IML is the output format, File0.tex if LaTeX is the output
format, and so on.
o InftyReader also produces other files in the same folder as the input and output files.
At a minimum there will be a log file, sometimes several. Other files may be produced,
depending on the output format.
If you have chosen to convert a folder:
o InftyReader converts all image files in that folder that have the specified input format.
All output is put into a single file named after the folder, with the extension of the
output format. For example, all files in the folder Hi are gathered into the single file
Hi.iml for IML output, Hi.tex for LaTeX output, and so on.
o If the "Search sub-folders" box is checked, InftyReader also converts files in all
sub-folders. The output for the sub-folder HiThere is placed in HiThere and gathered into
the single file HiThere.iml, and so on.
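The output naming rules for single-file and folder conversion can be sketched in Python with pathlib. This is a hypothetical model of the behavior described in this step, not InftyReader code, and the function names are illustrative. Only the .iml and .tex extensions come from this guide; the extensions for HR-TeX, XHTML, and Word output are not stated here, so they are left out. Treating both .tif and .tiff as TIFF input and processing files in alphabetical order are assumptions, and because the guide does not say exactly where the folder-level output file is written, the sketch returns only the output file name alongside the folder it belongs to.

```python
from pathlib import Path

# Output extensions stated in this guide; other formats are intentionally omitted.
OUTPUT_EXTENSIONS = {"IML": ".iml", "LaTeX": ".tex"}

# Assumed file extensions for each input-format choice.
INPUT_EXTENSIONS = {
    "TIFF": {".tif", ".tiff"},
    "BMP": {".bmp"},
    "GIF": {".gif"},
    "PNG": {".png"},
    "PDF": {".pdf"},
}


def single_file_output(input_file: str, output_format: str) -> Path:
    """Same folder and base name as the input; only the extension changes."""
    return Path(input_file).with_suffix(OUTPUT_EXTENSIONS[output_format])


def folder_outputs(folder: str, input_format: str, output_format: str,
                   search_subfolders: bool = False) -> list[tuple[Path, str, list[str]]]:
    """For each folder with matching images, return (folder, output file name,
    input files gathered into it). Alphabetical order is an assumption."""
    wanted = INPUT_EXTENSIONS[input_format]
    ext = OUTPUT_EXTENSIONS[output_format]
    root = Path(folder)
    folders = [root]
    if search_subfolders:
        folders += sorted(p for p in root.rglob("*") if p.is_dir())
    plan = []
    for d in folders:
        files = sorted(f.name for f in d.iterdir()
                       if f.is_file() and f.suffix.lower() in wanted)
        if files:
            plan.append((d, d.name + ext, files))
    return plan
```

For instance, `single_file_output("scans/File0.tiff", "LaTeX")` gives `scans/File0.tex`, and a folder Hi containing TIFF pages maps to the output name Hi.iml for IML output.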
Copyright © 2008-2011 by InftyReader Group, Inc. All rights Reserved.
Page 5 of 6
Feedback
Steve.jacobs @ inftyreader.org
This work is licensed under a Creative Commons Attribution 3.0 Unported License
Copyright © 2008-2011 by InftyReader Group, Inc. All rights Reserved.
Page 6 of 6