International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 9 – March 2015
A Framework for Visually Impaired Web
Users with Keyboard as User Interface
Shanmugapriyan.K#1, S.K.V.Jayakumar*2, Venkatesh.A#3
#Department of Computer Science and Engineering, Pondicherry University,
Pondicherry-605014, India
Abstract— A speech interface integrated browser is a web browser that helps users by means of an interactive voice user interface, useful to those who have difficulty reading or seeing a web page. A speech
interface integrated browser presents information aurally, using
text-to-speech conversion to render textual information. It is
designed for visually impaired users who wish to access the Internet
in a non-visual way. This includes users who are visually impaired, people with reading problems, users with partial sight and people with other vision-related problems. A broad range of users, not
only those with disabilities, will have enhanced choice, convenience
and control in accessing the information age.
Keywords— TTS, speech interface, navigability.

I. INTRODUCTION
We propose a framework by which a dedicated speech based interface can be provided on an existing website of public interest by its owner, offering its important services to blind users. Thus, a blind user can carry out more important tasks independently on such a website without using any assistive tool like a screen reader. The aim is to design the technology according to users' needs and desires, rather than from the novelty and functionality of the technology as perceived by the developers. A secondary goal of this work is to create a high-level structure that enables a speech interface for end users and provides a voice response framework designed to address the needs of blind users. A better understanding of existing browsing techniques can inform the design of accessible sites and the development of new tools that make experienced users more effective, and can help overcome the initial learning curve.

VoiceXML is a standard for voice based communication and a language for creating voice-user interfaces. A DTMF keypad is used for key input, speech recognition for voice input, and pre-recorded audio and text-to-speech synthesis (TTS) for output. Text-to-speech synthesis is the component serving as the core part of the system, producing the audio output. Server side technologies such as Java, servlets, ASP and many others can be utilized for this approach.
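As a hypothetical sketch of such a server-side component, the servlet below returns a VoiceXML 2.0 menu whose choices are selected by DTMF digits. The document structure follows the VoiceXML 2.0 standard, but the class name and target URLs are placeholders of our own, not elements of the paper's implementation.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative servlet serving a DTMF-driven VoiceXML menu.
public class MenuServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("application/voicexml+xml");
        resp.getWriter().print(
            "<?xml version=\"1.0\"?>\n"
          + "<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n"
          + "  <menu>\n"
          + "    <prompt>Press 1 for characteristics, 2 for history.</prompt>\n"
          + "    <choice dtmf=\"1\" next=\"characteristics.vxml\"/>\n"
          + "    <choice dtmf=\"2\" next=\"history.vxml\"/>\n"
          + "  </menu>\n"
          + "</vxml>");
    }
}
```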
AT&T, IBM, Lucent, and Motorola formed the VoiceXML Forum in March 1999 in order to produce a standard markup language for defining voice dialogs. In March 2000 they published VoiceXML 1.0. The W3C produced VoiceXML 2.0, which reached the final recommendation stage in March 2004. VoiceXML allows web developers to design and implement IVR (interactive voice response) services. VoiceXML is used for building telecommunication services, so we simply inherit the facilities of VoiceXML, such as voice prompts, voice response and other services, to build our speech interface project. Here we use a speech interface to develop an application that recognizes keyboard input from 0 to 9, using which visually impaired users can browse the web. In this paper, some of these issues and difficulties have been taken up.

II. SURVEY
A. VoiceXML
A voice browser is a web browser that presents an interactive voice user interface to the user. Generally a visual web browser works with HTML pages, while a voice browser acts on pages that specify voice dialogs. Commonly these pages are written in VoiceXML, the W3C's standard voice dialog mark-up language, but other voice dialogue languages remain in use. A speech interface integrated browser presents information aurally, using text-to-speech conversion [9].
B. Speech Recognition Grammar Specification (SRGS)
The Voice Driven Interface essentially accepts spoken words as input. The input voice signals are then compared against a set of pre-defined commands. If there is an appropriate match, the corresponding command is output to the Command Processor [7].
SRGS is a document language that can be used by developers to specify the words and arrangements of words to be listened for by a speech recognizer or other grammar processor [4].
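A minimal sketch of the matching step described above, with invented command names and a console stand-in for the Command Processor, could look like this:

```java
import java.util.Map;
import java.util.Optional;

// Recognized words are compared against a pre-defined command set; a match
// is forwarded to a (hypothetical) command processor. Names are illustrative.
public class CommandMatcher {

    // Pre-defined spoken commands mapped to actions the processor understands.
    private static final Map<String, String> COMMANDS = Map.of(
            "read", "READ_PAGE",
            "stop", "STOP_READING",
            "next link", "FOCUS_NEXT_LINK",
            "exit", "EXIT_SECTION");

    // Returns the matching command, or empty if the utterance is unknown.
    public static Optional<String> match(String recognizedUtterance) {
        return Optional.ofNullable(
                COMMANDS.get(recognizedUtterance.trim().toLowerCase()));
    }

    public static void main(String[] args) {
        match("Read").ifPresentOrElse(
                cmd -> System.out.println("Dispatching: " + cmd),
                () -> System.out.println("No match, re-prompt the user."));
    }
}
```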
C. Semantic Interpretation for Speech Recognition (SISR)
SISR is a document language that can be used by developers to specify the words and patterns of words to be listened for by a speech recognizer or other grammar processor.
D. Speech Synthesis Markup Language (SSML)
It is a markup language for rendering a combination of pre-recorded speech, synthetic speech and music.
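For illustration, an SSML prompt combining a pre-recorded clip with synthetic speech might be assembled as below; the audio URL is a placeholder, and the surrounding Java class is simply a convenient wrapper so all examples stay in one language. The fallback text inside <audio> is spoken by TTS if the clip cannot be fetched.

```java
// Sketch of an SSML 1.0 prompt mixing pre-recorded and synthetic speech.
public class SsmlPromptBuilder {
    public static String buildMenuPrompt() {
        return """
            <?xml version="1.0"?>
            <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
                   xml:lang="en-US">
              <audio src="http://example.org/prompts/welcome.wav">
                Welcome to the speech enabled site.
              </audio>
              <break time="300ms"/>
              Press 1 for characteristics. Press 2 for history.
            </speak>
            """;
    }

    public static void main(String[] args) {
        System.out.println(buildMenuPrompt());
    }
}
```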
E. Java Speech API
It defines an easy-to-use, multi-platform software interface that allows developers to take advantage of speech technology for personal and enterprise computing. The Java Speech API provides features such as controlling actions by means of spoken commands, and includes robust multi-platform support along with the other features of the Java platform. It supports two core speech technologies: speech synthesis and speech recognition. With speech recognition, computers are able to pick up a user's spoken language and match the words that are spoken; in other words, it takes audio input containing speech and transforms it into text [8]. Speech synthesis performs the reverse process, producing synthetic speech from text generated by an application, a user or an applet. It is generally known as text-to-speech technology.
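As a minimal sketch of speech synthesis with the Java Speech API (JSAPI 1.0), the program below queues a text prompt for speaking. It assumes a JSAPI implementation such as FreeTTS is installed and registered on the system, which is an environment assumption the code itself cannot guarantee.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class SpeakPrompt {
    public static void main(String[] args) throws Exception {
        // Locate and create a synthesizer for US English (requires a
        // registered JSAPI implementation, e.g. FreeTTS).
        Synthesizer synth = Central.createSynthesizer(
                new SynthesizerModeDesc(Locale.US));
        synth.allocate();               // acquire engine resources
        synth.resume();                 // leave the paused state

        // Queue plain text and block until it has been spoken.
        synth.speakPlainText("Press 1 to read the contents.", null);
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

        synth.deallocate();             // release engine resources
    }
}
```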
F. Speech Synthesis
Speech synthesis is the artificial production of human
speech. A speech synthesizer is a computer system that is used
for converting normal text into speech. It serves as the main component producing output in the form needed by blind users; thus a text-to-speech (TTS) system converts normal text into speech [5].
G. Screen Reader Usage

1) Respondents of screen readers

TABLE I
PRIMARY DESKTOP/LAPTOP SCREEN READER RESPONDENTS

Screen Reader                           # of Respondents    % of Respondents
JAWS                                    721                 50.0%
Window-Eyes                             97                  6.7%
VoiceOver                               149                 10.3%
NVDA                                    268                 18.6%
System Access or System Access To Go    111                 7.7%
ZoomText                                19                  1.3%
ChromeVox                               6                   0.4%
Other                                   70                  4.9%

JAWS usage on desktop/laptop continues to decrease while both NVDA and VoiceOver increase. Window-Eyes and System Access usage is generally unchanged. About 58% of respondents use more than one desktop/laptop screen reader, 26% use three or more, and 9% use four or more different screen readers. The percentage of respondents using multiple screen readers has increased over time.

2) Screen Readers Commonly Used

Fig. 1 Screen Readers Commonly Used

Over the years JAWS has remained the most popular screen reader, but it has seen a significant decrease in primary use; its usage has levelled off since 2012 at around half. Window-Eyes saw its usage cut roughly in half over recent months, from 12.3% in May 2012 to only 6.7% in January 2014. It will be interesting to see how Window-Eyes usage changes now that it is available to Microsoft Office users. NVDA saw a notable increase and VoiceOver a minor one. System Access and ZoomText saw decreases in usage as a primary screen reader. While they were not listed here for simplicity's sake, the survey report shows that Orca, SuperNova, and Speakup (among others) were notable "Other" screen readers.

Regional differences are of note. JAWS was substantially more popular in Asia (65% of respondents) and North America (52%) than in Europe/UK (44%). NVDA was about three times more popular in Europe/UK than in North America.
III. METHOD
A. Non-Speech Based Interface
DTMF-based interactive systems are widely used for many applications, either web-based or telephone-based. The main modalities used in such systems are audio, typing, and point-and-click [3].
Fig. 2 Non-Speech Based Interface

Speech in such systems is mostly pre-recorded, since all states of the dialog are predetermined. The flow is state based, usually a tree structure, with the choices presented to the user at each step. Figure 2 shows the typical layout of a non-speech interface, where a menu-driven interactive system takes DTMF input from a series of available choices and responds using predetermined recorded audio. Such systems may of course also display the data directly on a screen, thereby using the visual modality as well, if the application and configuration permit [2].
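As a rough sketch, this tree-structured, state-based flow can be modelled as menu nodes keyed by DTMF digits. The class below is illustrative: audio playback is stubbed with console output, and the prompts are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Each node carries a pre-recorded (here: printed) prompt, and DTMF digits
// move the user to the next predetermined state.
public class MenuNode {
    final String prompt;                        // prompt for this state
    final Map<Character, MenuNode> children = new HashMap<>();

    MenuNode(String prompt) { this.prompt = prompt; }

    MenuNode addChoice(char digit, MenuNode child) {
        children.put(digit, child);
        return this;
    }

    // Follows the digit the user pressed; repeats the menu on invalid input.
    MenuNode select(char digit) {
        MenuNode next = children.get(digit);
        if (next == null) {
            System.out.println("Invalid choice, repeating menu.");
            return this;                        // stay in the same state
        }
        System.out.println("[audio] " + next.prompt);
        return next;
    }

    public static void main(String[] args) {
        MenuNode root = new MenuNode("Press 1 for characteristics, "
                + "press 4 for trust and security.");
        root.addChoice('1', new MenuNode("Characteristics section."))
            .addChoice('4', new MenuNode("Trust and security section."));
        System.out.println("[audio] " + root.prompt);
        root.select('4');                       // simulate the user pressing 4
    }
}
```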
B. The Multimodal Based Interactive Framework
The multimodal based interactive framework was introduced as an aid to how we structure our work into subgroups. Components used in this multimodal framework include input, analyser, dialog handler, data resources, voice generation and output. The diagram helps us identify areas currently outside the scope of the existing groups. Voice interaction is a conversation mode between human and computer, pursued by people [6].
Fig. 3 The Multimodal Based Interactive Framework

1) Input and Output
The input is provided using the keyboard, and the output is audio produced with the help of text-to-speech.

2) Analyser
It processes the input using specialized modules for each type of input. In effect, the input is analysed and its semantic and pragmatic meaning is channelled to the system manager.
3) Generation
It produces the proper output for the system response, translating the inner framework representation into a usable response for the user. It is a great benefit for visually impaired users if a browser can render web pages as audio. Generation chooses how the data would best be conveyed: through the most suitable output mode or a blend of the output modes.
4) Interaction handler
It is the most complex component, containing several modules that handle the interaction state, the framework data, the data resources, the validation and confirmation of input and response data, the process management, the plan of action, the user experience, the application functions, the environment variables, and many more.
5) Data resources
The data pools, databases, web services and any external information needed or requested by the system in order to fulfil the information requests and data flow.
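To make this division of labour concrete, the following sketch wires stub implementations of the components together for one dialogue turn. All interface and class names are our own shorthand for the framework's components, not an API defined by the paper.

```java
// input -> analyser -> interaction handler (+ data resources) -> generation
interface Analyser      { String analyse(String rawInput); }
interface DataResources { String lookup(String semanticAct); }
interface Generation    { void render(String responseText); }

class InteractionHandler {
    private final Analyser analyser;
    private final DataResources data;
    private final Generation out;

    InteractionHandler(Analyser a, DataResources d, Generation g) {
        this.analyser = a; this.data = d; this.out = g;
    }

    // One turn of the dialogue: input -> semantic act -> response -> output.
    void handleTurn(String rawInput) {
        String act = analyser.analyse(rawInput);
        String response = data.lookup(act);
        out.render(response);
    }
}

public class MultimodalDemo {
    public static void main(String[] args) {
        InteractionHandler handler = new InteractionHandler(
                input -> "REQUEST:" + input.trim(),           // trivial analyser
                act -> "You asked for " + act,                // canned resource
                text -> System.out.println("[tts] " + text)); // stubbed TTS
        handler.handleTurn("history");
    }
}
```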
C. PROPOSED ARCHITECTURE
Speech-based interaction is dependent on the user's efficiency. There are strong interdependencies between the components; that is, the components must cooperate in order for the Web to be accessible. When accessibility features are effectively implemented in one component, the other components are more likely to implement them. The interaction between user and system itself is called a dialogue. Spoken dialogue interfaces handle human-machine dialogue using natural language as the main input and output [1]. However, the effectiveness of this strategy heavily depends on the quality of the partitioning that has been carried out. If the site cannot be analysed correctly, the resulting dialog may not be very satisfactory for the user. The goal of this project is to make the Web more accessible by providing some of the features naturally available to sighted users to users with visual impairments [5].
Fig. 4 Proposed Architecture

1) Input
The input is provided using the keyboard, and the output is audio produced with the help of TTS.

2) Dialogue handler
It is the core of the dialogue system. It handles a unique and complete conversation with the user, evaluating the input and creating the output. In order to do that, it activates and coordinates a series of processes that evaluate the user prompt. The dialogue handler (DH) identifies the communicative act, interprets and disambiguates the NLU output and creates a specific dialogue strategy in order to respond [10]. It maintains the state of the dialogue (or belief state), formulates a dialogue plan and employs the necessary dialogue actions in order to fulfil the plan. The DH is also connected to all external resources, the back-end database and world knowledge.

3) Output
Usually text-to-speech synthesis is combined with a natural language generator (NLG). The NLG renders output from the dialogue handler, turning communicative acts into proper text form, while TTS converts that text into audio. For the sake of user satisfaction, many applications use pre-recorded audio cues instead of synthetic speech. In that case, the dialogue handler or manager forms the output by registering all text prompts and correlating them with pre-recorded audio files.

IV. ANALYSIS
In this section a complete design of the workflow that enables users to perform a stream of task flows is illustrated. In order to arrive at the structure of a well-designed speech interface framework, the envisioned framework needs to meet and maintain some generally accepted rules of a usable interface. We propose interfaces that change and adapt according to users and their individual abilities. The major aspect of web browsing for the user is navigating a web page. When a page loads, the user is navigated according to both system instructions and the user's own approach to traversing the page.

TABLE II
WEB NAVIGATION

Type of navigation    Features
Within a web page     Arrow keys. Tab key. Key strokes.
Between web pages     Execute links to navigate to other pages.
Inside forms          Sequential text filling with keystrokes. Access to
                      text boxes, buttons and combo boxes.
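The table's key bindings can be pictured as a small dispatcher. The sketch below is illustrative only; the key names and action names are our own, not taken from the paper: arrow and tab keys move within a page, a digit key executes one of the announced links, and the tab order is reused inside forms.

```java
public class NavigationDispatcher {

    enum Action { PREVIOUS_ELEMENT, NEXT_ELEMENT, EXECUTE_LINK, NEXT_FORM_FIELD }

    static Action dispatch(String key, boolean insideForm) {
        switch (key) {
            case "UP":   return Action.PREVIOUS_ELEMENT;   // within a page
            case "DOWN": return Action.NEXT_ELEMENT;       // within a page
            case "TAB":  return insideForm
                    ? Action.NEXT_FORM_FIELD               // inside forms
                    : Action.NEXT_ELEMENT;
            default:
                if (key.length() == 1 && Character.isDigit(key.charAt(0))) {
                    return Action.EXECUTE_LINK;            // between pages
                }
                throw new IllegalArgumentException("Unmapped key: " + key);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch("DOWN", false));  // NEXT_ELEMENT
        System.out.println(dispatch("4", false));     // EXECUTE_LINK
        System.out.println(dispatch("TAB", true));    // NEXT_FORM_FIELD
    }
}
```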
The following describes the methods involved in browsing through a page.
A. Navigating within a web page
Navigation starts through different forms, such as voice search, arrow keys, tab navigation and keyboard number keys, with user and system interaction. Web developers must make sure a user can select their desired method of navigation; clearly there will be times when the user must decide which navigation method is appropriate. Here, voice search means navigation by matching the spoken dialogue (SD) of the user. Arrow keys help the user through simple key presses, with the system announcing where the user is currently located, whereas keyboard number navigation proceeds with the help of instructions given by the system via voice response.
B. Navigating between web pages
It includes executing a link and navigating to another page with the help of links. It is important that the developer knows what kind of information the user prefers; this is done by making the user and the system interact. For example, consider a web page with a list of links. When the user prefers to navigate to another page through links, the system asks the user to choose certain key strokes: first it reads out the links, and then it asks the user to execute the desired link.
C. Navigating inside forms
Forms are explored sequentially with the help of the same user and system interaction described above. Regular examination of the work flow of the web application is necessary to make sure the user does not get confused by it. Here user input is required for filling form elements; the input is given either through spoken dialogues or through keystrokes. This includes providing access to buttons, check boxes and combo boxes.
D. Reading a page content
The user browses a page mainly to know what is inside that page. For a blind user, the contents of the page are read out as per the user's need; the user may decide to read or to skip to the next item. As the page loads, the user is instructed on how to get audio output, and when executed, the audio is played for the user to listen.
E. Input from user
Inputs are given via keystrokes and also by means of spoken dialogues; the user may switch between either type of input.
V. ILLUSTRATION

A. Work flow through a web page
The objective of this project has been to incorporate a speech based interface for the blind to read our web pages with ease, by pressing just the number keys (0-9) on the keyboard as input. A simple demonstration of accessing a web page through just the number keys follows. Beforehand, the web page flow is determined with the help of the web page owner [3], so by pressing a key the control flow starts.

Fig. 5 Control flow through a web page

Consider the main page of a site:

Fig. 6 Main page of a site

Step 1: The blind user is asked to enter a default key; now the control flow starts.

Step 2: The system says: contents...
Press 1 for characteristics
Press 2 for history
Press 3 for implementations
Press 4 for trust and security

Step 3: The user selects trust and security by pressing 4. Since the user has interrupted the control flow, it stops and jumps to the control specified by the user. If not, the system would continue with the control flow by asking to press 5, 6, 7 and so on till the flow ends.

Step 4: The system says: press 1 for controlling changes
Press 2 for trustworthiness
Press 3 for security

Step 5: The user selects 2, so now the flow jumps to trustworthiness.

Step 6: The system says: press 1 to read the contents
Press 2 to stop reading
Press 3 to exit trustworthiness

Step 7: This process is repeated until the user wants to exit. The main thing is that the web site flow must be maintained by the site owner; only then will it be easier for blind users to navigate through their sites.
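A minimal runnable sketch of this control flow, assuming console input in place of real key capture and printed prompts in place of recorded audio, might look as follows; the menu wording follows Steps 1-7 above, and only the branches from the example are wired.

```java
import java.util.Scanner;

// Replays the walkthrough: the system "speaks" a menu, the user answers
// with a digit, and the flow jumps to the chosen branch.
public class SiteWalkthrough {

    static void say(String prompt) { System.out.println("[audio] " + prompt); }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        say("Contents. Press 1 for characteristics, 2 for history, "
                + "3 for implementations, 4 for trust and security.");
        if (in.nextLine().trim().equals("4")) {              // Step 3
            say("Press 1 for controlling changes, 2 for trustworthiness, "
                    + "3 for security.");
            if (in.nextLine().trim().equals("2")) {          // Step 5
                say("Trustworthiness. Press 1 to read the contents, "
                        + "2 to stop reading, 3 to exit trustworthiness.");
                say("Repeat until the user chooses to exit (Step 7).");
            }
        }
    }
}
```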
B. FUNCTIONAL REQUIREMENT
The system should exploit the number keys on the keyboard as input, and use speech synthesis and pre-recorded sound as output. This technology will make it easier for visually impaired people to browse the web from the system keyboard using open standards for IVR based browsing. These IVR based browsers would use pre-recorded material to present the contents of web pages. Users interact with these browsers by pressing a key (0 to 9) on the keyboard.

C. NON-FUNCTIONAL REQUIREMENT
Better component design should be used to obtain more and better performance, together with a flexible architecture based on services that will be highly desirable in future. The framework should be able to run over secure connections, or possibly a mix of secure and insecure connections, with additional security measures provided by giving restricted access to the alternate speech-based framework. It should be scalable, i.e. it should be extendable with additional functionality over time. Accessibility, usability and navigability should be improved considerably. Access time/usage time for a given task should be reduced. Instances of confusion or ambiguity during navigation should be eliminated entirely.
VI. CONCLUSION
In this paper, the architecture of a speech interface for visually handicapped (blind) people has been described. Thus a framework for a speech integrated browser is designed for users who wish to access the Internet in a non-visual or combined auditory and visual way. This research looks at how communication between user and computer can be used effectively and attractively in order for information to be accessed by the blind user. This includes partially sighted users, people with dyslexia or learning difficulties, and users who are learning new languages. A broad range of users, not only those with disabilities, will have enhanced choice, convenience and control in accessing the information age.
REFERENCES
[1] Tasos Spiliotopoulos, "Integrating Usability Engineering for Designing the Web Experience: Methodologies and Principles".
[2] Ghazi I. Alkhatib, "Web Engineered Applications for Evolving Organizations: Emerging Knowledge".
[3] Prabhat Verma, Raghuraj Singh and Avinash Kumar Singh, "A framework to integrate speech based interface for blind web users on the websites of public interest", Human-centric Computing and Information Sciences, 2013, vol. 3:21.
[4] http://www.w3.org/Voice/
[5] Ramakrishnan, I., Stent, A. and Yang, A., "Hearsay: enabling audio browsing of hypertext content", in Proceedings of the 13th International Conference on World Wide Web (2004), ACM Press, pp. 80-89.
[6] Di Guoqiang, Liu Yaoyao, Han Lingchao and Wu Jianping, "Design and Implementation of Voice Web Pages for Online Shopping Based on .NET and Streaming Media", International Conference on Management of e-Commerce and e-Government, pp. 226-229.
[7] Josiah Poon and Chris Nunn, "Browsing the Web from a Speech-Based Interface".
[8] http://www.oracle.com/technetwork/java/index-138300.html
[9] Nobutaka Hayashi and Kazumasa Takami, "An oral interaction control method with Web pages by VXML metadata definition", Graduate School of Engineering, Soka University.
[10] Dimitris Spiliotopoulos, Pepi Stavropoulou and Georgios Kouroupetroglou, "Spoken Dialogue Interfaces: Integrating Usability", Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Panepistimiopolis, Ilisia, GR-15784, Athens, Greece.