DEVELOPMENT OF IMAGE TO SPEECH/PDF CONVERSION USING DIGITAL IMAGE PROCESSING

©M. S. Ramaiah University of Applied Sciences

Outline
• Introduction
• Motivation (Project concept and its relevance)
• Aims and Objectives
• Block diagram
• Methods and Methodology
• Project Concept
• Results
• Conclusions
• References

Title
Development of Image to Speech/PDF Conversion Using Digital Image Processing

Aim
To develop an algorithm that converts an image into speech by means of digital image processing, for use by blind people.

Objectives
1. To conduct a literature survey of existing image-to-speech conversion methodologies
2. To develop the image recognition algorithm
3. To develop the speech conversion algorithm for the input image
4. To implement algorithms for extracting the desired parameters
5. To integrate the subsystems, then test and evaluate the system for effective functionality

Block Diagram
Overview of the system
INPUT IMAGE → IMAGE PREPROCESSING → IMAGE TO TEXT CONVERTER → TEXT TO AUDIO → AUDIO OUTPUT
[Figure: block diagram]

Methods and Methodology

Objective 1
• Statement of the objective: To conduct a literature review of existing image-to-speech conversion modules
• Methods/methodology: Literature review of the hardware and software systems available for image-to-speech conversion for blind people
• Resources required: Articles, journals, technical papers, and patents

Objective 2
• Statement of the objective: To arrive at a functional block diagram and flow chart with subsystems
• Methods/methodology: Developing a software-based design for the proposed system and its functionality

Objective 3
• Statement of the objective: Extraction of the region of interest (the text region) from the input image
• Methods/methodology:
  Maximally Stable Extremal Region (MSER): algorithm used to extract the text regions from the input image by varying the threshold value of the image.
  Stroke Width Transform (SWT): algorithm implemented to increase the efficiency and reliability of the text extraction performed by the MSER algorithm.
• Resources required: MATLAB R2021a

Objective 4
• Statement of the objective: Character extraction from the extracted text image
• Methods/methodology: Optical Character Recognition (OCR): implemented to extract the features of each character image and classify the characters according to their patterns.
• Resources required: MATLAB R2021a

Objective 5
• Statement of the objective: Conversion of text to speech
• Methods/methodology: Speech synthesizer: conversion of e-text to speech is incorporated using an interface known as the Speech API (Win32 SAPI).
• Resources required: Microsoft SAPI, MATLAB R2021a

Project Concept

1. Image pre-processing
• This step removes the noise present in the image, which reduces errors in the later stages.
• RGB images are converted to greyscale or black-and-white images.

2. Maximally Stable Extremal Region
• MSER works by varying an intensity threshold over the image: for a given threshold value, the pixels below it are white and those at or above it are black.
• The MSER feature detector works well for finding text regions because the consistent colour and high contrast of text lead to stable intensity profiles.
• The first step in implementing MSER is to sweep the intensity threshold from black to white, performing a simple luminance threshold of the image at each step. The connected components, or extremal regions, are then extracted.
• After that, the threshold at which an extremal region is maximally stable is found.
  In this project, 3 is taken as the threshold.
• Non-text regions are removed based on basic geometric properties.
• Finally, the region descriptors are obtained as MSER features.
• Although the MSER algorithm detects most of the text, it also detects several other stable regions in the image that are not text. The stroke width algorithm helps to solve this.

3. Stroke Width Algorithm
• The first step is the stroke width transform, an operator that determines, for every pixel, the width of the most likely stroke containing that pixel.
• The output produced by the SWT is an image of the same size as the input image, in which each element contains the width of the stroke associated with that pixel.
• This gives a map of the most likely stroke widths for each pixel in the original image.
• The next step is to group these pixels into letter candidates. This is done by joining neighbouring pixels that have similar stroke widths and then applying several rules to distinguish the letter candidates.

4. Optical Character Recognition
• Optical Character Recognition is a process that allows the application to recognize characters automatically through an optical technique. OCR translates acquired images of typewritten or printed text into digitally editable information.

5. Text To Speech Conversion
• Win32 SAPI, which converts text into speech, needs to be loaded on the computer. The desired voice and pace are set, which initializes the wave player for converting the text into speech. Finally, the speech for the given image is obtained.

Conclusions
• An approach to image-to-speech conversion using optical character recognition and speech synthesis has been attempted.
• The application developed is simple to use, very cost-effective, portable, and applicable in real time.
• Tests have been conducted to check the conversion, and good results have been achieved.
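
Supplementary sketch: threshold sweep for region stability

The threshold sweep described under Maximally Stable Extremal Region can be illustrated with a short Python sketch. This is not the project's MATLAB implementation. It is a simplified illustration that follows the region around a single seed pixel rather than building the full tree of extremal regions, and it treats dark pixels (intensity at or below the threshold) as the region of interest. All function names are hypothetical, and the step `delta=3` is an assumption: the slides state only that 3 was taken as the threshold.

```python
from collections import deque


def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to 0-255 luminance (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def region_area(gray, seed, t):
    """Area of the 4-connected region around `seed` with intensity <= t.

    Returns 0 when the seed pixel itself is brighter than the threshold.
    """
    h, w = len(gray), len(gray[0])
    sy, sx = seed
    if gray[sy][sx] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:  # breadth-first flood fill
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in seen and gray[ny][nx] <= t):
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen)


def most_stable_threshold(gray, seed, delta=3):
    """Sweep the threshold and return (threshold, variation) for the sweep
    position where the seed's region changes least over +/- delta.

    variation = (area(t + delta) - area(t - delta)) / area(t)
    Simplification: picks the first global minimum for one seed, whereas
    MSER proper looks for local minima over every extremal region.
    """
    best = None
    for t in range(delta, 256 - delta):
        area = region_area(gray, seed, t)
        if area == 0:
            continue  # the seed is not yet part of any region
        grow = region_area(gray, seed, t + delta)
        shrink = region_area(gray, seed, t - delta)
        variation = (grow - shrink) / area
        if best is None or variation < best[1]:
            best = (t, variation)
    return best


# Synthetic example: a dark 3x3 "character" on a bright 7x7 background.
image = [[200] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 20

threshold, variation = most_stable_threshold(image, seed=(3, 3))
stable_area = region_area(image, (3, 3), threshold)
```

A high-contrast region such as the dark block above keeps the same area over a wide range of thresholds, which is the property the slides rely on when they say that text leads to stable intensity profiles.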