Historical Document Analysis Mini-Project

Processing a historical document image follows these procedures:

1. Image De-noising and Binarization
2. Page Layout Analysis
3. Feature Extraction
4. Classification
5. Applications

The input dataset is a set of images in one folder. For each manuscript, the system should record its main properties (title, author, writer, period, region, etc.) in XML files. Applying a procedure to a manuscript or a set of images should also be recorded, in an XML file or a log file. This record determines whether a procedure can be applied, because some procedures are prerequisites for others; e.g., features must be extracted/computed before classification can be applied. The GUI controls the processing flow and enforces these prerequisites.

Each of the procedures above can be carried out by several algorithms that differ in accuracy and speed. We therefore plan to implement multiple algorithms for each procedure and let the user select one of them via the GUI.

The project list below is derived from the procedure list:

1. Graphical User Interface (GUI)
2. Image De-noising and Binarization: to be done by the GUI group, as it is just a call to an OpenCV function.
3. Page Layout Analysis
   a. Manual Page Layout Analysis: should be done in close collaboration with the GUI group.
   b. Automatic Page Layout Analysis: requires some research; suitable for those who would like to experiment with algorithms.
4. Feature Extraction
   a. Define, in an XML file, the set of features and their parameters (pixel-based and contour-based features will be handled), and determine the applicable classifiers.
   b. Compute the features and set the appropriate fields in the XML file from (a).
5. Classification
   a. Integrate classification libraries into the system.
   b. Pass the features to the appropriate classifier.
6. Applications
   a. Search for a word in a document
   b. Script recognition
   c. Compare two scripts
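The per-manuscript XML record could be written with Python's standard `xml.etree.ElementTree` module. What follows is a minimal sketch, not a fixed schema. Only the property names come from the description above. The `<manuscript>` root element, its `id` attribute, the one-element-per-property layout and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Properties named in the project description ("etc." means more may be added).
# The element layout itself is hypothetical.
FIELDS = ("title", "author", "writer", "period", "region")

def manuscript_to_xml(manuscript_id: str, props: dict[str, str]) -> str:
    """Serialize one manuscript's properties as an XML string."""
    root = ET.Element("manuscript", id=manuscript_id)
    for field in FIELDS:
        # Unknown properties become empty elements, so every record has the same fields.
        ET.SubElement(root, field).text = props.get(field, "")
    return ET.tostring(root, encoding="unicode")

def xml_to_manuscript(xml_text: str) -> dict[str, str]:
    """Parse a record produced by manuscript_to_xml back into a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}
```

Writing every field, even when it is empty, keeps the records uniform. This makes it simpler for the GUI to display and edit them.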
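The prerequisite check could be driven by the processing log itself: a procedure is enabled only when all of its prerequisites already appear in the manuscript's log. The project description supplies only one rule, that features must be computed before classification. Everything else in this minimal sketch is hypothetical: the procedure names, the `<log>`/`<step>` element layout, the `ProcedureLog` class and the remaining prerequisite entries. A GUI could call `can_apply` to decide which menu entries to enable.

```python
import xml.etree.ElementTree as ET

# Hypothetical prerequisite table. Only "feature_extraction before
# classification" comes from the project description; the rest are assumptions.
PREREQUISITES = {
    "binarization": set(),
    "layout_analysis": {"binarization"},
    "feature_extraction": {"binarization"},
    "classification": {"feature_extraction"},
}

class ProcedureLog:
    """Records which procedures (and which algorithms) were applied to a manuscript."""

    def __init__(self, manuscript_id: str):
        self.root = ET.Element("log", manuscript=manuscript_id)

    def applied(self) -> set[str]:
        return {step.get("procedure") for step in self.root.iter("step")}

    def can_apply(self, procedure: str) -> bool:
        # Allowed only when every prerequisite has already been logged.
        return PREREQUISITES[procedure] <= self.applied()

    def record(self, procedure: str, algorithm: str) -> None:
        if not self.can_apply(procedure):
            missing = PREREQUISITES[procedure] - self.applied()
            raise ValueError(f"{procedure} requires {sorted(missing)} first")
        ET.SubElement(self.root, "step", procedure=procedure, algorithm=algorithm)

    def to_xml(self) -> str:
        return ET.tostring(self.root, encoding="unicode")
```

Storing the chosen algorithm in each step also records which of the several algorithms the user selected.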
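For binarization, OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag computes a global threshold automatically. The sketch below illustrates what that call computes and how "several algorithms per procedure" could be organized. It reimplements Otsu's method in plain Python and registers it next to a fixed threshold in a dictionary the GUI could list. Several details are assumptions: the image representation (rows of integers from 0 to 255), the "1 = ink" output convention, and the names `BINARIZATION_ALGORITHMS` and `binarize`.

```python
from typing import Callable

Image = list[list[int]]  # grayscale rows, values 0 to 255 (assumed representation)

def otsu_threshold(img: Image) -> int:
    """Otsu's method: pick t maximizing the between-class variance of
    the classes {v <= t} and {v > t}."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the class {v <= t}
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        # Unnormalized between-class variance (same argmax as the normalized form).
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def fixed_threshold(img: Image, t: int = 128) -> int:
    """Baseline: ignore the image and use a constant threshold."""
    return t

# Hypothetical registry: the GUI would list the keys and call the chosen entry.
BINARIZATION_ALGORITHMS: dict[str, Callable[[Image], int]] = {
    "fixed": fixed_threshold,
    "otsu": otsu_threshold,
}

def binarize(img: Image, algorithm: str = "otsu") -> Image:
    """Return a binary image: 1 for dark (ink) pixels, 0 for background."""
    t = BINARIZATION_ALGORITHMS[algorithm](img)
    return [[1 if v <= t else 0 for v in row] for row in img]
```

Another global-threshold method would only need a new registry entry. A local (adaptive) method would need the registry to return a binary image rather than a single threshold.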
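One classic starting point for automatic page layout analysis is the horizontal projection profile. The method counts the ink pixels in each row of the binarized page and treats each run of non-empty rows as a text line. This sketch assumes a binary image with 1 for ink and a roughly horizontal, deskewed page. Manuscripts with skew, touching lines or marginal notes would need the further research the project description mentions. The function name and the `min_ink` parameter are illustrative.

```python
Binary = list[list[int]]  # 1 = ink, 0 = background (assumed convention)

def text_line_bands(img: Binary, min_ink: int = 1) -> list[tuple[int, int]]:
    """Return (first_row, last_row) bands of consecutive rows containing ink.

    Uses the horizontal projection profile (ink pixels per row); rows with at
    least `min_ink` ink pixels are treated as part of a text line.
    """
    profile = [sum(row) for row in img]
    bands: list[tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # entering a text line
        elif count < min_ink and start is not None:
            bands.append((start, y - 1))   # leaving a text line
            start = None
    if start is not None:                  # last line touches the bottom edge
        bands.append((start, len(profile) - 1))
    return bands
```

Raising `min_ink` makes the split less sensitive to noise specks between lines, at the risk of cutting off sparse strokes such as descenders.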
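Items 4a and 4b could share one XML file. The definition step lists each feature with its type and its applicable classifiers, and the computation step fills in a value for each one. In the sketch below, the `<feature name=... type=... classifiers=...>` schema is hypothetical. Ink density stands in for a pixel-based feature and a boundary-pixel count stands in for a contour-based feature. Both are simple illustrative choices, not the project's chosen feature set.

```python
import xml.etree.ElementTree as ET

Binary = list[list[int]]  # 1 = ink, 0 = background (assumed convention)

# Hypothetical feature-definition file for item 4a.
FEATURE_DEFS = """
<features>
  <feature name="ink_density" type="pixel" classifiers="knn"/>
  <feature name="contour_length" type="contour" classifiers="knn"/>
</features>
"""

def ink_density(img: Binary) -> float:
    """Pixel-based feature: fraction of pixels that are ink."""
    total = sum(len(row) for row in img)
    return sum(map(sum, img)) / total if total else 0.0

def contour_length(img: Binary) -> int:
    """Contour-based feature: number of ink pixels that touch the background
    (or the image border) through one of their 4 neighbours."""
    h = len(img)
    w = len(img[0]) if img else 0
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or img[ny][nx] == 0:
                    count += 1
                    break
    return count

FEATURE_FUNCTIONS = {"ink_density": ink_density, "contour_length": contour_length}

def compute_features(defs_xml: str, img: Binary) -> ET.Element:
    """Item 4b: compute each defined feature and store it as a `value` attribute."""
    root = ET.fromstring(defs_xml.strip())
    for feature in root.iter("feature"):
        func = FEATURE_FUNCTIONS.get(feature.get("name"))
        if func is not None:  # skip features that have no implementation yet
            feature.set("value", str(func(img)))
    return root
```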
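Passing features to a classifier can be sketched with a 1-nearest-neighbour rule on feature vectors, written in plain Python so that it runs without dependencies. This is a swapped-in example technique, not a classifier the project specifies. In the integrated system, a library classifier such as scikit-learn's `KNeighborsClassifier` could take its place. The `(vector, label)` training layout, the `"knn"` registry key and the example labels are hypothetical.

```python
import math

# Hypothetical training layout: (feature vector, class label).
Sample = tuple[list[float], str]

def nearest_neighbour(train: list[Sample], features: list[float]) -> str:
    """1-NN: return the label of the training vector closest to `features`."""
    if not train:
        raise ValueError("no training samples")
    # math.dist computes the Euclidean distance between two equal-length vectors.
    _, label = min(train, key=lambda sample: math.dist(sample[0], features))
    return label

# Registry keyed by the classifier names used in the feature XML (assumed "knn").
CLASSIFIERS = {"knn": nearest_neighbour}
```

Euclidean distance is scale-sensitive. A feature with a large range, such as a contour length in pixels, would therefore dominate one in [0, 1], such as ink density. Normalizing each feature, for example to zero mean and unit variance over the training set, is a common remedy.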