Presentation: Procedure of accessible books

UNIVERSITY OF MACEDONIA
ECONOMIC AND SOCIAL SCIENCES
Support and Inclusion of students with disabilities at higher
education institutions in Montenegro
Konstantinos Charitakis, PhD
kcharitakis@uom.gr
28 / 8 / 2012
Overview
Accessible Digital Books
• Scanning of book
• Optical Character Recognition – OCR
• Text editing and formatting
Step 1 – Book Scanning
Let us assume that we have a printed book or a hard copy handout.
• Scanning equipment - use a scanner device in order to scan the
book and produce images of its pages.
– Fast scanner
– Document feeder
• Scanning resolution - set the scanning resolution to 300 dpi.
– Important notice: a scanning resolution higher than 300 dpi will
result in a lot of “garbage” during the OCR process.
• Output format - save the scanned images in PDF format.
• PDF editing - create a single PDF file using a PDF editing software.
Step 2 – Optical Character Recognition (OCR)
At this phase we import the scanned pages (images of pages) into the
OCR software and convert the printed text to machine-encoded text.
• Optical Character Recognition (OCR) is the mechanical or
electronic conversion of scanned images of handwritten, typewritten
or printed text into machine-encoded text.
• OCR software - FineReader 11. It is among the best on the market in
terms of performance with the Greek language and the functionality
it provides.
• Software training – when needed
– When the book has unusual fonts, e.g. handwriting or slim and compressed
font styles
– Most OCR software packages have the training functionality
– Training rules can be saved as Templates and be reused
– Time saving process
Step 2 – OCR Output
At this stage it is very important to consider for WHOM we want to
make the digital book accessible.
• Two choices - There are two choices at this stage that lead to two different
outputs (Matrices).
– 1st Matrix (Plain Text ONLY)
OCR will strip all text formatting and structure (bold, italics, underlining, size, headers, images,
tables, lists etc.) and keep only PLAIN TEXT.
• This is very useful when we want to create a digital audio book only for blind
people.
– 2nd Matrix (keep the content’s structure and formatting)
OCR will keep headings, headers, footers, references, page numbering, images,
columns, shapes, captions, etc. and create a WORD document with the same text
structure as the PDF.
• This is very useful for large print, e.g. A3 size pages, suitable for
individuals with other disabilities.
• OCR Output Format – Save the OCR output in a text file format, e.g. .txt,
.doc
Step 3 - Text Editing Process
1st Matrix - Plain Text Editing
This is the text after we have stripped off all text formatting.
• Text Editor software – e.g. Microsoft Office Word.
• Text restructuring - In order to make the book accessible for blind
people we need to recover some of its structure.
– OCR software can keep some of the structure, but we have no control
over exactly how it does so, even for the smallest details, and in the
end we have to clean up the result anyway.
– Experience has shown that recreating the structure is much
faster than cleaning the errors/garbage from structure kept by OCR.
– Either we clean all or keep all.
Step 3 - Text Editing Process
1st Matrix - Plain Text Editing (2)
• Text Formatting – start from the table of contents.
– We look at the table of contents of our original book and apply the
appropriate styles and formatting (Heading 1, Heading 2 etc.) to
the text.
– WORD has functionality that creates the table of contents
automatically. We use this functionality and then delete the
old one.
– If the document has no table of contents, we just
format its original headings in the resulting text.
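The heading step above can also be sketched outside WORD. The following is a minimal illustration only, assuming the plain text is available as a list of lines and that the headings of the printed table of contents have been typed into a dictionary by hand; the names `tag_headings`, `build_toc` and `toc_levels` are hypothetical and are not part of the procedure described here.

```python
def tag_headings(lines, toc_levels):
    """Return (level, text) pairs for each non-empty line.

    toc_levels maps a heading exactly as printed in the original table of
    contents to its level (1 = Heading 1, 2 = Heading 2, ...).
    Level 0 means ordinary body text.
    """
    tagged = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip empty lines
        tagged.append((toc_levels.get(text, 0), text))
    return tagged


def build_toc(tagged):
    """Rebuild a table of contents from the tagged headings, indented by level."""
    return ["  " * (level - 1) + text for level, text in tagged if level > 0]


toc_levels = {"Chapter 1": 1, "1.1 Scanning": 2}
tagged = tag_headings(["Chapter 1", "", "1.1 Scanning", "Use a scanner."], toc_levels)
for entry in build_toc(tagged):
    print(entry)
```

The lookup is an exact match, so a heading that OCR has misread will be treated as body text and has to be corrected first.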
Step 3 - Text Editing Process
1st Matrix - Plain Text Editing (3)
• Page numbering – enter page numbering only in special cases
(e.g. in academic documents or handouts)
– Page numbering is done strictly with WORD’s Insert/
New page functionality where needed.
– Not by pressing ENTER repeatedly, inserting sections, etc.
Step 3 - Text Editing Process
OCR - Common Errors
Then we start the actual text editing by checking and correcting possible
errors from the OCR process.
Common OCR errors - based on their frequency of occurrence
• Syllabification (words hyphenated at line ends) – annoying for blind users.
– Solution
• Use the OCR functionality that handles this during the OCR process,
before the conversion to PLAIN TEXT.
• FIND hyphens followed by a line change OR hyphens followed by a
space and DELETE.
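For plain text that has already been exported, the FIND and DELETE rule above can be approximated with two regular expressions. This is a minimal sketch, not the procedure used in WORD; the function name `remove_syllabification` is hypothetical, and the patterns assume that a hyphen directly between two word characters and a line change (or a single space) is always a syllabification hyphen.

```python
import re

# Hyphen at the end of a line (optionally followed by spaces), then the line change.
_LINE_END_HYPHEN = re.compile(r"(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)")
# Hyphen followed by a single space in the middle of a word, e.g. "for- mat".
_HYPHEN_SPACE = re.compile(r"(?<=\w)- (?=\w)")


def remove_syllabification(text: str) -> str:
    """Rejoin words that were split by end-of-line hyphenation."""
    text = _LINE_END_HYPHEN.sub("", text)
    return _HYPHEN_SPACE.sub("", text)


print(remove_syllabification("acces-\nsible books"))
```

Genuine compounds that happen to break at a line end (for example "well-" followed by "known") would also be joined, so the result still needs the manual check described in this step.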
Step 3 - Text Editing Process
OCR - Common Errors (2)
• Wrong letter recognition - OCR recognises one letter as another.
– Solution
• FIND the letter and REPLACE occurrences one by one, so that correct ones
are not replaced.
• Check for errors in relation to the neighbouring letters.
• Line changing - OCR inserts line breaks where it shouldn’t.
– Solution
• FIND paragraph characters not preceded by a full stop and
DELETE.
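The line changing rule can be sketched for exported plain text as follows. This is an illustration under stated assumptions: the name `join_broken_lines` is hypothetical, the deleted line break is replaced by a single space so that words do not fuse, and an empty line is treated as a real paragraph boundary. Only the full stop counts as a paragraph end by default, as in the rule above; other marks can be passed in `sentence_end`.

```python
def join_broken_lines(text: str, sentence_end: str = ".") -> str:
    """Join lines that do not end with a full stop to the line that follows."""
    paragraphs = []
    current = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # An empty line closes the paragraph collected so far.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        current = f"{current} {line}" if current else line
        if line[-1] in sentence_end:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)


print(join_broken_lines("The book is\nscanned first.\nThen OCR runs."))
```

Headings usually do not end with a full stop, so a heading that is not followed by an empty line would be merged into the next paragraph; such cases have to be checked by hand.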
Step 3 - Text Editing Process
OCR - Common Errors (3)
• Continuous ENTER
– Solution
• FIND two contiguous ENTER characters and DELETE one of them.
• Continuous spaces - are not allowed.
– Solution
• FIND two contiguous spaces and DELETE one of them.
• Tabs - usually inserted where lists or tables were included.
– Solution
• FIND Tab and DELETE.
• Wherever we want indentation we add it with the
indentation functionality in WORD (Format/ Paragraph/
Indentation).
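The three clean-up rules above can be combined in one pass over exported plain text. This is a minimal sketch with a hypothetical function name, `normalise_whitespace`; as an assumption, tabs are replaced by a space rather than simply deleted, so that the words on either side of a tab do not fuse.

```python
import re


def normalise_whitespace(text: str) -> str:
    """Collapse repeated ENTERs and spaces, and remove tabs."""
    text = text.replace("\r\n", "\n")        # normalise Windows line endings
    text = text.replace("\t", " ")           # tabs become a space (assumption)
    text = re.sub(r" {2,}", " ", text)       # runs of spaces -> one space
    text = re.sub(r" *\n *", "\n", text)     # no spaces around a line change
    text = re.sub(r"\n{2,}", "\n", text)     # consecutive ENTERs -> one
    return text.strip()


print(normalise_whitespace("One  two\t three\n\n\nFour   five \n"))
```

In WORD the same effect is reached by repeating FIND and REPLACE until no double ENTER or double space is found.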
Step 3 - Text Editing Process
OCR - Common Errors (4)
• Spell checking - the most time-consuming process.
– Solution
• Perform spell checking on the whole document with WORD’s
spell checker.
• However, in special cases we need to keep the
mistakes as they are.
Step 3 - Text Editing Process
Special Characters
• Three continuous full stops: …
• Three continuous full stops with spaces in between: . . .
• Two continuous full stops: ..
• Two continuous full stops with a space in between: . .
• Opening bracket: (
• Closing bracket: )
• Left apostrophe: ‘
• Right apostrophe: ’
• Diaeresis (double intonation mark): ¨
• Opening quotation marks: «
• Closing quotation marks: »
• Asterisk: *
• Paragraph: ^p OR ^13
• Tab: ^t OR ^9
Step 3 - Text Editing Process
Special Characters (2)
• Long dash ( — ): ^+
• Dash ( – ): ^= (dash with space)
• Caret ( ^ ): ^^
• Greater than: >
• Smaller than: <
• Greek semicolon (ano teleia): ·
• Line change: ^l OR ^11
• Column change: ^n OR ^14
• Page change: ^m
• Non-breaking space ( ° ): ^s
• Non-breaking hyphen ( — ): ^~
• Optional hyphen ( ¬ ): ^-
Other Issues
Other issues that we need to consider, depending on the use of the
book.
• Footnotes - are placed at the end of each page, at the end of each
chapter or at the end of the book.
– Solution
• Follow the convention of putting them all at the end of the text.
• The WORD functionality Insert/ Reference/ Footnote adds a number in
the text and in the footnote, with continuous numbering.
• When the screen reader software reaches a footnote, it follows the
hyperlink to the footnote, reads it and then returns.
• Place footnotes at the end of the book where needed, so that it is
the user’s choice whether to read them or not.
• Page numbering - delete it OR include it only in special document
cases as mentioned earlier, e.g. academic or law books.
Other Issues (2)
• Bibliography - In academic writing we have to keep it.
– We have to follow a convention when we include it.
– Using a link might be difficult because the screen reader will go
to the end of the book to read it but will not be able to go back
and continue reading.
– Citations and references are included in the same way as they
are included in the original book.
– If there are no footnotes in the original book, the footnote
functionality may be used to add references.
Other Issues (3)
• Index
– There is no need to keep the index since the reader can use the
FIND functionality of WORD.
– However, in some cases it is necessary to include the index, for
example in law books.
• Images
– There are no images in the Plain Text output and they have no
meaning for blind people.
– Instead use a description of the images
• Beginning of Image 1 – insert caption – End of image 1
Followed by a text description of the image.
• Beginning of description of Image 1 – insert description –
End of description of image 1.
• Describing an image/picture for blind people is a big topic
that we are not going to analyse at this phase.
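To keep the image markers identical throughout a book, the convention above can be generated rather than typed. This is a small sketch only; the function name `image_block` and the example caption and description are hypothetical.

```python
def image_block(number: int, caption: str, description: str) -> str:
    """Return the caption line and the description line for one image."""
    return "\n".join([
        f"Beginning of image {number} – {caption} – End of image {number}",
        f"Beginning of description of image {number} – {description} – "
        f"End of description of image {number}",
    ])


print(image_block(1, "Scanner with document feeder",
                  "A flatbed scanner with a tray of loose pages on top."))
```

Because the wording of the markers is fixed in one place, the list of conventions at the beginning of the book can quote it exactly.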
Other Issues (4)
• Conventions - It is good practice to include a list of the conventions
used in the text at the beginning of the book (e.g. on the back of the cover
page), e.g. for footnotes, table of contents, images etc.
• Versioning - It is good practice to keep a versioning scheme.
– With the Plain Text matrix we create many versions of the same text.
• We end up with one WORD file with an automated table of
contents, page numbering (where applicable), footnotes,
bibliography and image descriptions that is already accessible.
File Formats
• Text format
– Save the file in different text file formats e.g. .doc, .txt, .html, .xml
• Audio file format
– With Text to Speech software the text can be recorded to an audio file
format e.g. .wav, .mp3 etc.
• Braille ready file format
– A Braille Ready file (.brl file type) can be produced from the Word file.
– Each Braille printer has its own software that converts and saves
the text as a Braille Ready file.
Step 3 - Text Editing Process
2nd Matrix – Structured Text Editing
This is the case where we edit text that has preserved its formatting and
structure after the OCR process.
• Target group - The result of this procedure can be used by individuals with
visual impairment or other disabilities.
• Additional options and functionalities for users - e.g. search,
navigation and enlarged text; users can also
use a screen reader to listen to the text while reading it.
• Contrast and color management - It might be necessary to adjust the
contrast or the color of the fonts and background, e.g. white letters on a black
background, depending on each user’s needs.
• Error handling - We need to check for errors and correct them in
the same way as described earlier for the 1st Matrix.
Thank you!