
Dr. Raid Saabni
Historical Document Image Analysis of Arabic Manuscripts
and Script Recognition
Document image analysis (DIA) is the process of converting a raster image
of a document page (a matrix of pixels) into a symbolic form consisting of
textual objects (characters, digits, punctuation, words) and graphical
objects (lines, geometric shapes, etc.). Descriptions of a document in terms
of these higher-level objects are significantly more compact than their image
counterparts. More importantly, the rich semantic content of such descriptions
makes it possible to manipulate documents for a variety of uses, such as
searching them for specific patterns or classifying and combining them
according to some criteria. Most DIA systems consist of the following stages:
image enhancement, binarization, page layout analysis and segmentation,
preprocessing, feature extraction, classification, clustering, and
post-processing.
In this talk I will give a brief introduction to the field of historical
document image analysis (HDIA), to the Arabic script, and to some DIA-related
fields we are researching, followed by our results on: 1) language-independent
line extraction using the seam carving technique and 2) spotting Arabic words
in historical documents using Dynamic Time Warping and Chamfer distance.
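As a rough illustration of the word-spotting idea, the sketch below computes a textbook Dynamic Time Warping cost between two one-dimensional feature sequences. It is not the method presented in the talk: the function name dtw_distance, the use of absolute difference as the local cost, and the idea that each word image is summarized by one number per pixel column are all assumptions made for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (hypothetical helper).

    a and b are assumed to be per-column features of two word images,
    e.g. the ink-pixel count of each column. This is an illustration only.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier element of b
                cost[i][j - 1],      # b[j-1] also matches an earlier element of a
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


if __name__ == "__main__":
    query = [1, 2, 3]
    stretched = [1, 2, 2, 3]  # same shape, written wider
    print(dtw_distance(query, stretched))
```

Because the warping path may match one column to several, a query and a horizontally stretched copy of it align at zero cost, which is why DTW is attractive for handwriting whose width varies from one writer to another.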