UNIVERSITY OF MACEDONIA
ECONOMIC AND SOCIAL SCIENCES

Support and Inclusion of students with disabilities at higher education institutions in Montenegro

Konstantinos Charitakis, PhD
kcharitakis@uom.gr
28 / 8 / 2012

Overview
Accessible Digital Books
• Scanning of the book
• Optical Character Recognition – OCR
• Text editing and formatting

Step 1 – Book Scanning
Let us assume that we have a printed book or a hard-copy handout.
• Scanning equipment - use a scanner to scan the book and produce images of its pages.
  – Fast scanner
  – Document feeder
• Scanning resolution - set the scanning resolution to 300 dpi.
  – Important notice: a scanning resolution higher than 300 dpi will result in a lot of “garbage” during the OCR process.
• Output format - save the scanned images in PDF format.
• PDF editing - combine the pages into a single PDF file using PDF editing software.

Step 2 – Optical Character Recognition (OCR)
In this phase we import the scanned pages (images of pages) into OCR software and convert the printed text to machine-encoded text.
• Optical Character Recognition (OCR) is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text.
• OCR software - FineReader 11. It is among the best on the market in terms of performance with the Greek language and the functionality it provides.
• Software training – when needed
  – When the book has unusual fonts, e.g. handwriting or slim and compressed font styles
  – Most OCR software packages have training functionality
  – Training rules can be saved as templates and reused
  – This saves time

Step 2 – OCR Output
At this stage it is very important to consider for WHOM we want to make the digital book accessible.
• Two choices - there are two choices at this stage that lead to two different outputs (Matrices).
  – 1st Matrix (plain text ONLY)
    OCR will strip all formatting and non-text elements (bold, italics, underlining, font size, headers, images, tables, lists etc.) and keep only PLAIN TEXT.
    • This is very useful when we want to create a digital audio book only for blind people.
  – 2nd Matrix (keep the content’s structure and formatting)
    OCR will keep headings, headers, footers, references, page numbering, images, columns, shapes, captions etc. and create a WORD document with the same text structure as the PDF.
    • This is very useful for large print, e.g. A3-size pages, suitable for individuals with other disabilities.
• OCR output format
  – Save the OCR output in a text file format, e.g. .txt, .doc

Step 3 – Text Editing Process
1st Matrix - Plain Text Editing
This is the text after we have stripped off all text formatting.
• Text editor software – e.g. Microsoft Office Word.
• Text restructuring - to make the book accessible for blind people we need to recover some of its structure.
  – OCR software can keep some structure, but we have no control over exactly how it does so, even for the smallest element, and in the end we have to clean up the result anyway.
  – Experience has shown that recreating the structure is much faster than cleaning the errors/garbage from structure kept by the OCR.
  – Either we clean everything or we keep everything.

Step 3 – Text Editing Process
1st Matrix - Plain Text Editing (2)
• Text formatting – start from the table of contents.
  – We look at the table of contents of the original book and apply the appropriate styles (Heading 1, Heading 2 etc.) to the text.
  – WORD can create the table of contents automatically. We use this functionality and then delete the old one.
  – If the document has no table of contents, we just format its original headings in the resulting text.
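
The heading step above is done by hand in WORD, but the matching logic can be sketched in code. The following is only an illustration under assumptions: the function names, the (title, level) representation of the table of contents and the sample lines are all hypothetical, and the output is a list of style labels rather than a real WORD document.

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so OCR spacing differences do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def tag_headings(lines, toc):
    """Return (style, text) pairs: lines that match a table-of-contents
    entry get 'Heading N', every other line gets 'Normal'."""
    levels = {normalize(title): level for title, level in toc}
    tagged = []
    for line in lines:
        level = levels.get(normalize(line))
        style = f"Heading {level}" if level else "Normal"
        tagged.append((style, line.strip()))
    return tagged

# Hypothetical table of contents copied from the original book: (title, level)
toc = [("Chapter 1  Introduction", 1), ("1.1 Background", 2)]

# Hypothetical plain-text lines produced by the OCR step
text = ["Chapter 1 Introduction", "Some body text.", "1.1  Background", "More body text."]

result = tag_headings(text, toc)
for style, line in result:
    print(f"{style:10} | {line}")
```

Matching on normalized text is a deliberate choice here, since OCR output often differs from the printed table of contents only in spacing or letter case. Headings that contain recognition errors would still need to be styled by hand.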

Step 3 – Text Editing Process
1st Matrix - Plain Text Editing (3)
• Page numbering – enter page numbering only in special cases (e.g. in academic documents or handouts)
  – Page numbering is done strictly with WORD’s Insert/ New page functionality where needed.
  – Not by pressing ENTER repeatedly, inserting sections etc.

Step 3 – Text Editing Process
OCR - Common Errors
Then we start the actual text editing by checking and correcting possible errors from the OCR process.
Common OCR errors, listed by frequency of occurrence:
• Syllabification (words hyphenated at the end of a line) – annoying for blind users.
  – Solution
    • Use the OCR functionality that handles this during the OCR process, before the conversion to PLAIN TEXT.
    • FIND a hyphen followed by a line change, OR a hyphen followed by a space, and DELETE.

Step 3 – Text Editing Process
OCR - Common Errors (2)
• Wrong letter recognition - OCR recognises one letter as another.
  – Solution
    • FIND the letter and REPLACE occurrences one by one, so that the correct ones are not replaced.
    • Check for errors in relation to the neighbouring letters.
• Line changing - OCR inserts line breaks where it shouldn’t.
  – Solution
    • FIND a paragraph character without a full stop before it and DELETE.

Step 3 – Text Editing Process
OCR - Common Errors (3)
• Continuous ENTER
  – Solution
    • FIND two contiguous ENTER characters and DELETE the extra one.
• Continuous spaces - are forbidden.
  – Solution
    • FIND two contiguous spaces and DELETE the extra one.
• Tabs - usually inserted where lists or tables were.
  – Solution
    • FIND Tab and DELETE.
    • Wherever we want indentation, we add it with the indentation functionality in WORD (Format/ Paragraph/ Indentation).

Step 3 – Text Editing Process
OCR - Common Errors (4)
• Spell checking - the most time-consuming process.
  – Solution
    • Perform spell checking with the WORD functionality for the whole document.
    • However, in special cases we need to keep the mistakes as they are.

Step 3 – Text Editing Process
Special Characters
• Three continuous full stops: …
• Three continuous full stops with spaces in between: . . .
• Two continuous full stops: ..
• Two continuous full stops with a space in between: . .
• Opening bracket: (
• Closing bracket: )
• Left apostrophe: ‘
• Right apostrophe: ’
• Double intonation (diaeresis): ¨
• Opening quotation marks: «
• Closing quotation marks: »
• Asterisk: *
• Paragraph: ^p OR ^13
• Tab: ^t OR ^9

Step 3 – Text Editing Process
Special Characters (2)
• Long dash ( — ): ^+
• Dash ( – ): ^= (dash with space)
• Exponent ( ^ ): ^^
• Greater than: >
• Less than: <
• Semicolon: ·
• Line change: ^l OR ^11
• Column change: ^n OR ^14
• Page change: ^m
• Non-breaking space ( ° ): ^s
• Hyphen ( — ): ^+
• Optional hyphen ( ¬ ): ^-

Other Issues
Other issues that we need to consider, depending on the use of the book.
• Footnotes - are placed at the end of each page, at the end of each chapter or at the end of the book.
  – Solution
    • Follow the convention of putting them all at the end of the text.
    • The WORD functionality Insert/ Reference/ Footnote adds a number in the text and in the footnote, with continuous numbering.
    • When the screen reader software comes to a footnote, it follows the hyperlink to the footnote, reads it and then returns.
    • Place footnotes at the end of the book where needed, so it is the user’s choice whether to read them or not.
• Page numbering - delete it OR include it only in special document cases as mentioned earlier, e.g. academic or law books.

Other Issues (2)
• Bibliography - in academic writing we have to keep it.
  – We have to follow a convention when we include it.
  – Using a link might be difficult, because the screen reader will go to the end of the book to read it but will not be able to return and continue reading.
  – Citations and references are included in the same way as in the original book.
  – If there are no footnotes in the original book, the footnote functionality may be used to add references.

Other Issues (3)
• Index
  – There is no need to keep the index, since the FIND functionality of WORD can be used instead.
  – However, in some cases it is necessary to include the index, for example in law books.
• Images
  – There are no images in the Plain Text output, and they carry no meaning for blind people.
  – Instead, use a description of each image:
    • Beginning of Image 1 – insert caption – End of Image 1
      Followed by a text description of the image.
    • Beginning of description of Image 1 – insert description – End of description of Image 1.
    • Describing an image/picture for blind people is a big topic that we are not going to analyse at this stage.

Other Issues (4)
• Conventions - it is good practice to include a list of the conventions used in the text at the beginning of the book (e.g. on the back of the cover page), e.g. for footnotes, table of contents, images etc.
• Versioning - it is good practice to keep a versioning scheme.
  – For the Plain Text matrix we create many versions of the same text.
• At the end we have one WORD file with an automated table of contents, page numbering (where applicable), footnotes, bibliography and image descriptions, which is already accessible.

File Formats
• Text format
  – Save the file in different text file formats, e.g. .doc, .txt, .html, .xml
• Audio file format
  – With Text to Speech software the text can be recorded to an audio file format, e.g. .wav, .mp3 etc.
• Braille-ready file format
  – A Braille Ready format (.brl file type) from Word.
  – Each Braille printer has its own software that converts the text and saves it as a Braille Ready file.

Step 3 – Text Editing Process
2nd Matrix – Structured Text Editing
This is the case where we edit text that has preserved its formatting and structure after the OCR process.
• Target group - the result of this procedure can be used by individuals with visual impairment or other disabilities.
• Additional options and functionalities for users - e.g. search, navigation, enlarged text; a user can also run a screen reader to listen to the text while reading it.
• Contrast and color management - it might be necessary to adjust the contrast or the color of the fonts and background, e.g. white letters on a black background, depending on each user’s needs.
• Error handling - we need to check for errors and correct them in the same way as described earlier for the 1st Matrix.

Thank you!
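
Supplementary sketch: the FIND and DELETE rules listed under OCR - Common Errors are applied by hand in WORD, but for plain-text (.txt) output they could also be approximated with regular expressions. The following is a minimal sketch under assumptions, not the procedure used in the slides: the function name and the sample text are hypothetical, the order of the rules is one reasonable choice, and line breaks and tabs are replaced by a space instead of being deleted outright so that neighbouring words do not run together.

```python
import re

def clean_ocr_text(text):
    """Approximate the find-and-delete rules for common OCR errors on plain text."""
    # Syllabification: a hyphen between letters followed by a line change
    # or a space is removed, joining the two halves of the word.
    text = re.sub(r"(?<=\w)-[ \n](?=\w)", "", text)
    # Continuous ENTER: runs of two or more line breaks become one.
    text = re.sub(r"\n{2,}", "\n", text)
    # Line changing: a line break with no full stop before it is treated
    # as an OCR artifact and replaced by a space (assumption: space, not delete).
    text = re.sub(r"(?<!\.)\n", " ", text)
    # Tabs: replaced by a space; indentation is added later in the editor.
    text = text.replace("\t", " ")
    # Continuous spaces: runs of two or more spaces become one.
    text = re.sub(r" {2,}", " ", text)
    return text

# Hypothetical OCR output showing each error type once
sample = "The scan-\nner reads the  page\nand saves it.\n\n\nNext\tparagraph."
cleaned = clean_ocr_text(sample)
print(cleaned)
```

In Python 3, `\w` matches Unicode letters, so the same pattern also covers Greek text. The hyphen rule cannot tell a word split at the end of a line from a genuine compound such as "well-known" that happened to break there, so the result still needs the manual check and spell checking described in the slides.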