International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 9 – March 2015

A Framework for Visually Impaired Web Users with Keyboard as User Interface

Shanmugapriyan K, S. K. V. Jayakumar, Venkatesh A
Department of Computer Science and Engineering, Pondicherry University, Pondicherry 605014

Abstract— A speech-interface-integrated browser is a web browser that assists users through an interactive voice user interface, which is useful to those who have difficulty reading or seeing a web page. Such a browser presents information aurally, using text-to-speech conversion to render textual content. It is designed for visually impaired users who wish to access the Internet in a non-visual way, and it also serves people with reading problems, users with partial sight, and people with other vision-related difficulties. A broad range of users, not only those with disabilities, will gain enhanced choice, convenience and control in accessing the information age.

Keywords— TTS, speech interface, navigability.

I. INTRODUCTION

We propose a framework by which a dedicated speech-based interface can be provided on an existing website of public interest by its owner, exposing the site's important services to blind users. A blind user can thus perform the most important tasks on such a website independently, without any assistive tool such as a screen reader. The aim is to plan the technology around the user's needs and desires, rather than around the novelty and functionality of the technology as perceived by its developers. A secondary objective of this work is to define a high-level structure for a speech interface for end users and to establish what a voice response framework ought to be in order to address the needs of blind users. A better understanding of existing browsing techniques can inform the design of accessible sites, guide the development of new tools that make experienced users more effective, and help overcome the initial learning curve.

VoiceXML is a standard for voice-based communication and a language for creating voice user interfaces. A DTMF keypad is used for keyed input, speech recognition handles voice input, and pre-recorded audio and text-to-speech synthesis (TTS) provide the output. Text-to-speech synthesis is the core component of the system, producing the audio output. Server-side technologies such as Java, servlets, ASP and many others can be employed in this approach.

AT&T, IBM, Lucent, and Motorola formed the VoiceXML Forum in March 1999 in order to produce a standard markup language for defining voice dialogs. In March 2000 they published VoiceXML 1.0. The W3C's version, VoiceXML 2.0, reached the final recommendation stage in March 2004. VoiceXML allows web developers to design and implement IVR (interactive voice response) services. Because VoiceXML is used for building telecommunication services, we simply inherit its facilities, such as voice prompts and voice responses, to build our speech interface. Here we use a speech interface to develop an application that recognizes keyboard input from 0 to 9, with which visually impaired users can browse the web. In this paper, some of these issues and difficulties are taken up.

II. SURVEY

A. VoiceXML
A voice browser is a web browser that presents an interactive voice user interface to the user. Where a visual web browser works with HTML pages, a voice browser acts on pages that specify voice dialogs. Commonly these pages are written in VoiceXML, the W3C's standard voice dialog markup language, though other voice dialog languages remain in use. A speech-interface-integrated browser presents information aurally, using text-to-speech conversion [9].
B. Speech Recognition Grammar Specification (SRGS)
The voice-driven interface essentially accepts spoken words as input. The input voice signals are compared against a set of pre-defined commands; if there is an appropriate match, the corresponding command is passed to the command processor [7]. SRGS is a document language that developers can use to specify the words, and arrangements of words, to be listened for by a speech recognizer or other grammar processor [4].

C. Semantic Interpretation for Speech Recognition (SISR)
SISR is the companion standard that defines the semantic interpretation tags which can be embedded in an SRGS grammar, so that an utterance matched by a speech recognizer or other grammar processor is returned to the application as structured data rather than raw text.

D. Speech Synthesis Markup Language (SSML)
SSML is a markup language for rendering a combination of pre-recorded speech, synthetic speech and music.

E. Java Speech API
The Java Speech API defines an easy-to-use, cross-platform software interface that lets developers take advantage of speech technology for personal and enterprise computing. It provides features such as command-and-control of applications, and it inherits the robustness and portability of the Java platform. It supports the two main speech technologies: speech recognition and speech synthesis. With speech recognition, computers can pick up a user's spoken language and match the spoken words; in other words, audio containing speech is taken as input and transformed into text [8]. Speech synthesis performs the opposite process, producing synthetic speech from text generated by an application, a user or an applet; this is generally known as text-to-speech technology.

F. Speech Synthesis
Speech synthesis is the artificial production of human speech. A speech synthesizer is a computer system that converts normal text into speech, and it serves as the main component producing output in the form needed by blind users. A text-to-speech (TTS) system, therefore, converts normal text into speech [5].
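To ground the Java Speech API description above, the following is a minimal sketch of speech synthesis through JSAPI. JSAPI only defines interfaces, so the sketch assumes a compliant engine (for example FreeTTS) is installed and registered; the class name SpeakText is ours, not part of any library.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

// Minimal JSAPI text-to-speech sketch. Assumes a JSAPI engine such as
// FreeTTS is installed and registered (e.g. via a speech.properties file);
// without a registered engine, createSynthesizer() returns null.
public class SpeakText {
    public static void main(String[] args) throws Exception {
        // Ask the engine registry for a synthesizer for US English.
        Synthesizer synth = Central.createSynthesizer(
                new SynthesizerModeDesc(Locale.US));
        synth.allocate();                 // acquire engine resources
        synth.resume();                   // make sure the engine is not paused

        // Queue plain text and block until it has been spoken.
        synth.speakPlainText("Press 1 for characteristics. "
                + "Press 2 for history.", null);
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

        synth.deallocate();               // release engine resources
    }
}
```

Note that speakPlainText queues text asynchronously, which is why the sketch waits for QUEUE_EMPTY before releasing the engine.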
G. Screen Reader Usage

1) Respondents of screen readers:

TABLE I
PRIMARY DESKTOP/LAPTOP SCREEN READER RESPONDENTS

Screen Reader                           # of Respondents    % of Respondents
JAWS                                    721                 50.0%
Window-Eyes                             97                  6.7%
VoiceOver                               149                 10.3%
NVDA                                    268                 18.6%
System Access or System Access To Go    111                 7.7%
ZoomText                                19                  1.3%
ChromeVox                               6                   0.4%
Other                                   70                  4.9%

Over the years JAWS has remained the most popular screen reader, though it has seen a significant decline in primary use; its usage has levelled off since 2012 at around half of respondents. Window-Eyes saw its usage roughly halved over recent months, from 12.3% in May 2012 to only 6.7% in January 2014. It will be interesting to see how Window-Eyes usage changes now that it is freely available to Microsoft Office users. NVDA saw a notable increase and VoiceOver a minor one, while System Access and ZoomText declined in use as primary screen readers. Though not listed here for simplicity's sake, the survey report shows that Orca, SuperNova, and Speakup (among others) appeared as identifiable "Other" screen readers.

Regional contrasts are of note: JAWS was substantially more popular in Asia (65% of respondents) and North America (52%) than in Europe/UK (44%), while NVDA was about three times more popular in Europe/UK than in North America.

2) Screen readers commonly used:

Fig. 1 Screen readers commonly used

JAWS usage on desktops and laptops continues to decrease while both NVDA and VoiceOver increase; Window-Eyes and System Access usage is largely unchanged. About 58% of respondents use more than one desktop/laptop screen reader; 26% use three or more, and 9% use four or more different screen readers. The percentage of respondents using multiple screen readers has increased over time.

III. METHOD

A. Non-Speech Based Interface
DTMF-based interactive systems are widely used for many applications, either web-based or telephone-based. The main modalities used in such systems are audio, typing, and point-and-click [3].

Fig. 2 Non-speech based interface

Speech is for the most part pre-recorded, since all states of the dialog are predetermined. The flow is state-based, typically a tree structure, with the available choices presented to the user at each step. Fig. 2 shows the typical layout of a non-speech interface, in which a menu-driven interactive system takes DTMF input selected from a set of available choices and responds with predetermined recorded audio. Such systems may of course also present the data on a screen, thereby using the visual modality as well, if the application and configuration permit [2].
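As a concrete illustration of this state-based, menu-driven flow, the sketch below models a DTMF-style tree in Java: each node carries a prompt and digit-labelled choices, and pressing a digit moves the dialog to the corresponding child. The class name MenuNode and the speak() stub are illustrative assumptions of ours; a deployed system would play pre-recorded audio or hand the prompt to a TTS engine instead of printing it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

// Sketch of a DTMF-style menu tree: each state has a prompt and
// digit-labelled transitions, as in the non-speech interface above.
public class MenuNode {
    final String prompt;
    final Map<Character, MenuNode> choices = new LinkedHashMap<>();

    MenuNode(String prompt) { this.prompt = prompt; }

    MenuNode addChoice(char digit, MenuNode target) {
        choices.put(digit, target);
        return this;
    }

    // Stand-in for pre-recorded audio playback or TTS output.
    static void speak(String text) {
        System.out.println("[AUDIO] " + text);
    }

    // Walk the tree: announce the prompt, read one digit, follow the branch.
    static void run(MenuNode current) {
        Scanner in = new Scanner(System.in);
        while (current != null) {
            speak(current.prompt);
            if (current.choices.isEmpty()) return;   // leaf node: flow ends
            char key = in.next().charAt(0);
            MenuNode next = current.choices.get(key);
            if (next == null) {
                speak("Sorry, that key is not available here.");
            } else {
                current = next;                      // state transition
            }
        }
    }
}
```

Because all states are predetermined, every prompt can be matched with a recorded audio file in advance, exactly as described above.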
B. The Multimodal Based Interactive Framework
The multimodal based interactive framework was introduced to help structure our work into subgroups. The components used in this multimodal framework are input, analyser, dialog handler, data resources, voice generation and output. The diagram also helps us recognize areas currently outside the scope of existing groups. Voice interaction is a conversational mode between human and computer [6].

Fig. 3 The multimodal based interactive framework

1) Input and Output: The input is provided using the keyboard, and the output is audio produced with the help of text-to-speech.

2) Analyser: It processes the input using specialized modules for each type of input. In effect, the input is analysed and its semantic and pragmatic meaning is channelled to the system manager.

3) Generation: It produces the appropriate output for the system response, translating the internal framework representation into a usable response for the user. It is an enormous advantage for visually impaired individuals if a web page can be rendered as audio. The generator decides how the data is best delivered, through the most suitable output mode or a blend of output modes.

4) Interaction handler: It is the most complex part, containing several modules that handle the interaction state, the framework data, the data resources, the validation and confirmation of input and response data, process management, the plan of action, the user experience, the application functions, the environment variables, and many more.

5) Data resources: The data pools, databases, web services and any external information needed or requested by the system in order to fulfil the information requests and data flow.

C. Proposed Architecture
Speech-based interaction depends on the user's efficiency. There are strong interdependencies between the components; that is, the components must cooperate in order for the Web to be accessible. When accessibility features are effectively implemented in one component, the other components are more likely to implement them. The interaction between user and system is itself called a dialogue, and spoken dialogue interfaces handle human-machine dialogue using natural language as the main input and output [1]. The effectiveness of this strategy, however, depends heavily on the quality of the partitioning that has been carried out: if the site cannot be analysed correctly, the resulting dialog may not be very satisfactory for the user. The goal of this project is to make the Web more accessible by providing to users with visual impairments some of the features naturally available to sighted users [5].

Fig. 4 Proposed architecture

1) Input: The input is provided using the keyboard, and the output is audio produced with the help of TTS.

2) Dialogue handler: It is the core of the dialogue system. It handles a unique and complete conversation with the user, evaluating the input and creating the output. To do so, it activates and coordinates a series of processes that evaluate the user prompt. The dialogue handler (DH) identifies the communicative act, interprets and disambiguates the NLU output, and creates a specific dialogue strategy in order to respond [10]. It maintains the state of the dialogue (or belief state), formulates a dialogue plan and employs the necessary dialogue actions in order to fulfil the plan. The DH is also connected to all external resources, the back-end database and world knowledge.

3) Output: Usually text-to-speech synthesis is combined with a natural language generator (NLG). The NLG renders the output of the dialogue handler, turning communicative acts into proper text form, while TTS converts that text into audio. For the sake of user satisfaction, many applications use pre-recorded audio cues instead of synthetic speech; in that case, the dialogue handler forms the output by registering all text prompts and correlating them with pre-recorded audio files.
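The proposed architecture can be pictured as a small pipeline. The sketch below uses illustrative names of our own (InputSource, DialogueHandler, OutputSink; none are taken from the paper's implementation) to show keyboard input flowing into a dialogue handler that keeps the dialogue state and chooses the next prompt, with an output stage that would hand the text to TTS or to a matching pre-recorded audio file.

```java
import java.util.Scanner;

// Illustrative pipeline for the proposed architecture: keyboard input ->
// stateful dialogue handler -> audio output. All names are assumptions.
interface InputSource { String nextUserInput(); }
interface OutputSink  { void render(String response); }

class DialogueHandler {
    private String state = "main";   // current dialogue (belief) state

    // Evaluate a key press against the current state and produce the
    // next prompt -- a dialogue plan in miniature.
    String respond(String input) {
        if ("main".equals(state) && "4".equals(input)) {
            state = "trust";
            return "Trust and security. Press 1 for controlling changes, "
                 + "2 for trustworthiness, 3 for security.";
        }
        return "Press 1 for characteristics, 2 for history, "
             + "3 for implementations, 4 for trust and security.";
    }
}

public class SpeechPipeline {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        InputSource keys = in::nextLine;                          // keyboard
        OutputSink audio = t -> System.out.println("[TTS] " + t); // TTS stand-in
        DialogueHandler dh = new DialogueHandler();

        audio.render(dh.respond(""));                             // opening prompt
        while (in.hasNextLine()) {
            audio.render(dh.respond(keys.nextUserInput()));
        }
    }
}
```

Keeping the state inside the handler lets the output stage remain a thin rendering layer, so pre-recorded audio can be swapped in for TTS without touching the dialogue logic.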
IV. ANALYSIS

In this section a complete design of the workflow that enables users to perform streams of tasks is illustrated. In order to arrive at the structure of a well-planned speech interface framework, the envisioned framework has to meet and maintain some generally accepted rules of a usable interface. We propose interfaces that change and adapt according to users and their individual abilities. The major aspect of web browsing for the user is navigating a web page: when a page loads, the user is navigated according to both the system's instructions and the user's own approach to traversing the page. Table II summarizes the navigation features.

TABLE II
WEB NAVIGATION

Type of navigation    Features
Within a web page     Arrow keys; Tab key; keystrokes.
Between web pages     Executing links to navigate to other pages.
Inside forms          Sequential keystrokes; text filling with access to text boxes, buttons and combo boxes.

The following describes the methods involved in browsing through a page.

A. Navigating within a Web Page
Navigation proceeds through different forms such as voice search, arrow keys, tab navigation and keyboard number keys, with user and system interaction. Web developers must make sure a user can select their preferred method of navigation; clearly there will be times when the user must decide which navigation method is appropriate. Here, voice search means navigation by matching the user's spoken dialogue (SD); arrow keys help the user through simple key presses, with the system announcing where the user is currently located; and keyboard number navigation proceeds under instructions prompted by the system's voice response.

B. Navigating between Web Pages
This includes executing a link and navigating between pages with the help of links. It is important that the developer knows what kind of information the user prefers, which is established by making the user and system interact. For example, consider a web page with a list of links: when the user prefers to navigate to another page through a link, the system asks for certain keystrokes. First it reads out the links, and then it asks the user to execute the chosen link.

C. Navigating inside Forms
Forms are explored sequentially with the same kind of user-system interaction described above, but regular examination of the web application's workflow is necessary to make sure the user does not get confused by it. User input is required for filling form elements; the input is given either through spoken dialogue or through keystrokes. This includes providing access to buttons, check boxes and combo boxes.

D. Reading Page Content
The user browses a page mainly to find out what is inside it. For a blind user, the contents of the page are read out according to the user's needs; the user may decide to read on, or skip to the next item. As the page loads, the user is instructed on how to obtain audio output, and once invoked, the audio is played for the user to listen to.

E. Input from the User
Input is given via keystrokes and also by means of spoken dialogue; the user can switch between either type of input.

V. ILLUSTRATION

A. Workflow through a Web Page
The objective of this project has been to incorporate a speech-based interface so that blind users can read our web pages with ease by pressing just the number keys (0-9) on the keyboard as input. What follows is a simple demonstration of accessing a web page through number keys alone. Beforehand, the web page flow is determined with the help of the web page owner [3], so pressing a key starts the control flow.

Fig. 5 Control flow through a web page

Consider the main page of a site:

Fig. 6 Main page of a site

Step 1: The blind user is asked to enter a default key, for instance; now the control flow starts.
Step 2: The system says: contents... Press 1 for characteristics, press 2 for history, press 3 for implementations, press 4 for trust and security.
Step 3: The user selects trust and security by pressing 4. Since the user has interrupted, the control flow stops and jumps to the control specified by the user. If not, the system would continue the control flow by asking to press 5, 6, 7 and so on until the flow ends.
Step 4: The system says: press 1 for controlling changes, press 2 for trustworthiness, press 3 for security.
Step 5: The user selects 2, so the flow jumps to trustworthiness.
Step 6: The system says: press 1 to read the contents, press 2 to stop reading, press 3 to exit trustworthiness.
Step 7: This process is repeated until the user wants to exit.

The essential point is that the website's flow must be maintained by the site owner; only then will it be easy for blind users to navigate their sites.
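The walkthrough above maps directly onto the hypothetical MenuNode sketch from Section III. The fragment below (again illustrative, and assuming that earlier class) encodes the main page, the trust-and-security sub-menu of Step 4, and the trustworthiness leaf of Step 6; digits not mapped here simply fall through to the "not available" prompt.

```java
// Encoding the walkthrough with the hypothetical MenuNode sketch from
// Section III: main page -> trust and security -> trustworthiness.
public class DemoSite {
    public static void main(String[] args) {
        MenuNode trustworthiness = new MenuNode(
            "Press 1 to read the contents, 2 to stop reading, "
            + "3 to exit trustworthiness.");

        MenuNode trust = new MenuNode(
            "Press 1 for controlling changes, 2 for trustworthiness, "
            + "3 for security.");
        trust.addChoice('2', trustworthiness);   // Step 5 in the walkthrough

        MenuNode main = new MenuNode(
            "Contents. Press 1 for characteristics, 2 for history, "
            + "3 for implementations, 4 for trust and security.");
        main.addChoice('4', trust);              // Step 3 in the walkthrough

        MenuNode.run(main);                      // start the control flow
    }
}
```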
B. Functional Requirements
The system should use the number keys of the keyboard as input, and speech synthesis and pre-recorded sound as output. This technology will make it easier for visually impaired people to browse the web from the system keyboard using open standards for IVR-based browsing. An IVR-based browser of this kind would use pre-recorded material to present the contents of web pages, and users would interact with it by pressing a key (0 to 9) on the keyboard.

C. Non-Functional Requirements
Better component design should be pursued for greater performance, along with a flexible, service-based architecture that will be highly desirable in the future. The framework ought to be able to run over secure connections, or at least over a mix of secure and insecure connections with extra security measures provided by restricting access to the alternate speech-based framework. It should be extensible, i.e., capable of being extended with additional functionality over time. Accessibility, usability and navigability ought to be improved considerably. Access time and usage time for a given task ought to be reduced, and instances of confusion or uncertainty during navigation ought to be eliminated entirely.

VI. CONCLUSION

In this paper, the architecture of a speech interface for visually handicapped (blind) people has been described. A framework for a speech-integrated browser is thus designed for users who wish to access the Internet in a non-visual, or combined auditory and visual, way. This research looks at how communication between user and computer can be used effectively and attractively so that blind users can access information. This includes partially sighted users, people with dyslexia or learning difficulties, and users who are learning new languages. A broad range of users, not only those with disabilities, will have enhanced choice, convenience and control in accessing the information age.

REFERENCES
[1] Tasos Spiliotopoulos, "Integrating Usability Engineering for Designing the Web Experience: Methodologies and Principles".
[2] Ghazi I. Alkhatib, "Web Engineered Applications for Evolving Organizations: Emerging Knowledge".
[3] Prabhat Verma, Raghuraj Singh and Avinash Kumar Singh, "A framework to integrate speech based interface for blind web users on the websites of public interest", Human-centric Computing and Information Sciences, vol. 3:21, 2013.
[4] http://www.w3.org/Voice/
[5] I. Ramakrishnan, A. Stent and A. Yang, "Hearsay: enabling audio browsing of hypertext content", in Proceedings of the 13th International Conference on World Wide Web (2004), ACM Press, pp. 80-89.
[6] Di Guoqiang, Liu Yao Yao, Han Lingchao and Wu Jianping, "Design and Implementation of Voice Web Pages for Online Shopping Based on .NET and Streaming Media", International Conference on Management of e-Commerce and e-Government, pp. 226-229.
[7] Josiah Poon and Chris Nunn, "Browsing the Web from a Speech-Based Interface".
[8] http://www.oracle.com/technetwork/java/index-138300.html
[9] Nobutaka Hayashi and Kazumasa Takami, "An oral interaction control method with Web pages by VXML metadata definition", Graduate School of Engineering, Soka University.
[10] Dimitris Spiliotopoulos, Pepi Stavropoulou, and Georgios Kouroupetroglou, "Spoken Dialogue Interfaces: Integrating Usability", Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Panepistimiopolis, Ilisia, GR-15784, Athens, Greece.