Portable Camera-Based Assistive Bar-Code
Reader for Visually Challenged
Shaherunnisa¹, Diwakar R. Marur²
¹,² Department of Electronics and Communication Engineering, SRM University, India
Abstract - Vision loss affects almost every activity of daily living. We present a camera-based assistive barcode reading framework to help blind persons read product labels and product packaging on hand-held objects in their daily lives. The hand-held object is segregated from the background by shaking it, which determines the region of interest (ROI); the camera then captures an image of the ROI. The Scale Invariant Feature Transform (SIFT) algorithm is used to compare the barcode of the test product with the barcodes present in a database, and the matched image data is obtained. This information is transferred to a microcontroller, where an audio converter turns it into speech and delivers it to the user; the recognized text codes are thus output to blind users as speech. We explore user interface issues and assess the robustness of the algorithm in extracting and reading product information by identifying barcodes on different objects with complex backgrounds.
Keywords - SIFT, PCA-SIFT, GSIFT, CSIFT,
ASIFT, Orientation, DoG.
I. INTRODUCTION
Visually impaired people are often the target group of various investigations, including basic research, applied research, and research and development studies. According to a survey by the World Health Organization (WHO), over 285 million people in the world are visually impaired, of whom 39 million are blind and 246 million have moderate to severe visual impairment [1]. It is predicted that, without additional interventions, these numbers will rise to 75 million blind and 200 million visually impaired by the year 2020.
Even in a developed country like the United States (US), the 2008 National Health Interview Survey (NHIS) reported that an estimated 25.2 million adult Americans (over 8%) are visually impaired or blind [2]. This number is increasing rapidly as the baby boomer generation ages. Fig. 1 shows the number of visually challenged people per million population in different countries.
Shaherunnisa, Electronics and Communication Engineering, SRM University, Vijayawada, India (email: shaherunnisa1992@gmail.com, phone: 9600017362). Diwakar R. Marur, Electronics and Communication Engineering, SRM University, Chennai, India (email: diwakar.r@ktr.srmuniv.ac.in, phone: 9444878525).
Recent developments in computer vision, digital cameras, and portable computers make it feasible to assist these individuals. Portable barcode readers designed to help blind people identify different products against an extensive product database can enable blind users to access information about these products [3] through speech and Braille. A major limitation, however, is that it is very hard for blind users to find the location of the barcode.
Fig 1. Number of people who are blind, have low vision, and are visually impaired per million population in different countries.
Image local feature description algorithms, such as Gradient Location and Orientation Histogram (GLOH) [4] and the Scale Invariant Feature Transform (SIFT) [5], are developed to determine the product and to locate the barcode on the product by describing the local image. For any object in an image, interesting points on the object can be extracted to provide a 'feature description' of the object. This description, extracted from a training image, can then be used to identify the object when attempting to locate it in a test image containing many other objects. To perform reliable recognition, it is important that the features extracted from the training image be detectable even under changes in image scale, noise, and illumination. Such points usually lie on high-contrast regions of the image, such as object edges.
As shown in Fig. 2, such barcode information can appear in multiple orientations. To assist blind persons in reading barcodes from these kinds of hand-held objects, we have conceived a camera-based assistive barcode reading framework that tracks the object of interest within the camera view and extracts barcode information from the object [6]. The SIFT algorithm [7] can effectively handle complex backgrounds and multiple orientations, and extract text information from hand-held objects.
Fig 2. Examples of barcodes on hand-held objects in multiple orientations.
A barcode is an optical, machine-readable representation of data relating to the object to which it is attached. Originally, barcodes systematically represented data by varying the widths and spacings of parallel lines; these may be referred to as linear or one-dimensional (1D) barcodes. Later they evolved into rectangles, dots, hexagons, and other geometric patterns in two dimensions (2D). Although 2D systems use a variety of symbols, they are generally referred to as barcodes as well. The captured two-dimensional signals are sampled and quantized to yield digital images.
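As a concrete example of 1D symbology structure, the sketch below computes the check digit of an EAN-13 barcode, the format decoded in [8]. This helper is our own illustration and is not taken from the paper.

```python
def ean13_check_digit(digits12):
    """Compute the EAN-13 check digit for the first 12 digits.

    From the left, digits in odd positions are weighted 1 and digits in
    even positions are weighted 3; the check digit brings the weighted
    sum up to a multiple of 10. Illustrative helper, not from the paper."""
    assert len(digits12) == 12
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits12))
    return (10 - total % 10) % 10

# Example: the 12 digits 590123412345 yield check digit 7,
# giving the full barcode 5901234123457.
assert ean13_check_digit("590123412345") == 7
```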
In assistive reading systems for blind persons, it is very challenging for users to position the object of interest within the centre of the camera's view. As of now, there are no generally accepted solutions. We approach the problem in stages [8]. To make sure the hand-held object appears in the camera view, we use a camera with a sufficiently wide angle to accommodate users with only approximate aim. This may often result in other text objects appearing in the camera's view (for example, while shopping at a supermarket). To extract the hand-held object from the camera image, we develop a motion-based method to obtain a region of interest (ROI) of the object, as sketched below. We then perform barcode recognition only within this ROI.
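The paper does not give the motion-based ROI method in code. The following minimal Python/OpenCV sketch illustrates one plausible realization using frame differencing, where the shaken object produces the dominant motion; the function name motion_roi and the threshold diff_thresh are our own illustrative choices, not the authors' exact method.

```python
import cv2
import numpy as np

def motion_roi(frames, diff_thresh=25):
    """Estimate a region of interest from inter-frame motion.

    frames: list of grayscale frames (uint8) captured while the user
    shakes the hand-held object. Returns (x, y, w, h) or None.
    Illustrative sketch only; not the authors' exact method."""
    if len(frames) < 2:
        return None
    # Accumulate absolute differences between consecutive frames;
    # the shaken object dominates this motion map.
    acc = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        acc += cv2.absdiff(curr, prev).astype(np.float32)

    # Threshold the averaged motion map to a binary mask.
    acc /= len(frames) - 1
    mask = (acc > diff_thresh).astype(np.uint8) * 255

    # Clean up speckle and take the bounding box of the largest blob.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(largest)
```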
The scene capture component collects scenes containing objects of interest in the form of images or video; it corresponds to a camera attached to a pair of sunglasses. The live video is captured using a webcam, which can be done using MATLAB libraries. The image format from the webcam is RGB24; the frames are separated from the video stream and passed to preprocessing.

The data processing component is used for object-of-interest detection and barcode localization: algorithms are developed to determine the product and to locate the barcode on the product by describing the local image. The audio output component informs the blind user of the recognized text codes in the form of speech.
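The prototype captures video through MATLAB's acquisition libraries; as a language-neutral illustration (an assumption, not the authors' MATLAB code), the sketch below shows the equivalent capture-and-convert step in Python with OpenCV: reading colour frames from a webcam and converting them to grayscale for preprocessing.

```python
import cv2

def capture_gray_frames(n_frames=30, device=0):
    """Grab n_frames from the default webcam and return them as
    grayscale images, mirroring the RGB24 -> gray preprocessing step.
    Illustrative Python/OpenCV stand-in for the paper's MATLAB capture."""
    cap = cv2.VideoCapture(device)
    frames = []
    try:
        for _ in range(n_frames):
            ok, frame_bgr = cap.read()  # OpenCV delivers 8-bit BGR frames
            if not ok:
                break
            frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY))
    finally:
        cap.release()
    return frames
```

Frames collected this way can be passed directly to the motion_roi sketch above to isolate the shaken object.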
II. RELATED WORK

Blind and visually impaired people are at a great disadvantage. Various technologies have been developed to help them, as shown in [9-11]. Algorithms have previously been developed to extract barcodes from scene images and to match barcodes [8-10]. A survey of computer-vision-based assistive technologies for people with visual impairments can be found in [9]. Since the SIFT algorithm was formally proposed, researchers have never stopped improving it. Among the many variants that have been developed, several are cited particularly often; these algorithms are selected and investigated here.
In the descriptor-establishment phase, SIFT uses a 128-dimensional vector to describe each keypoint. This high dimensionality makes the step that follows (image feature matching) slow. To reduce the dimensionality of the keypoint description, Y. Ke [12] used the Principal Component Analysis (PCA) method to replace the histogram method used in SIFT. This improved version is called PCA-SIFT.
In the same phase, SIFT describes only local information and makes no use of global information. E. N. Mortensen [13] introduced a SIFT descriptor with global context (called GSIFT), which adds a global texture vector to the basic SIFT descriptor.
In the keypoint-detection phase, SIFT uses only the grayscale information of an image, so much of the colour information in colour images is discarded. A. A. Farag proposed CSIFT, which adds colour invariance to the basis of SIFT and intends to overcome SIFT's shortcomings on colour images. H. Bay proposed SURF, which is very similar to SIFT but adopts different processing methods at every step; Bay claimed that SURF is an enhanced version of SIFT.
J. M. Morel proposed Affine-SIFT (ASIFT) [14], which uses affine transformation parameters to correct images and is intended to resist strong affine distortion. The performance of these algorithms differs across situations: scale change, rotation change, blur change, illumination change, and affine change. Each algorithm has its own advantages. SIFT and CSIFT perform best under scale and rotation change; CSIFT improves on SIFT under blur and affine change, but not under illumination change.

PCA-SIFT does better across situations such as scale and rotation, blur, and illumination change. GSIFT is best under blur and illumination change, but not under rotation, scale, and affine change. SURF's performance is average in all situations, but it runs the fastest.

ASIFT performs best under affine change and well under scale and rotation change, but performs worst under blur and illumination change. Since rotation change and scale change are the parameters we consider, the SIFT algorithm is used.
III. ALGORITHM OVERVIEW
SIFT is an image local feature description algorithm based on scale-space. Due to its strong matching ability, SIFT has many applications in different fields, such as image retrieval, image stitching, and machine vision. The SIFT procedure mainly comprises three steps: keypoint detection, descriptor establishment, and image feature matching. These involve scale-space extrema detection using a Difference of Gaussian (DoG) function, keypoint localization, orientation assignment, and descriptor construction.
The greatest characteristic of the SIFT algorithm is scale invariance. To achieve scale invariance, SIFT uses a DoG (Difference of Gaussian) function, defined in equation (2) in terms of the Gaussian function in equation (1), to perform convolution on an image.
$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}} \qquad (1)$$

$$D(x,y,\sigma) = \big(G(x,y,k\sigma) - G(x,y,\sigma)\big) * I(x,y) = L(x,y,k\sigma) - L(x,y,\sigma) \qquad (2)$$
where x, y are the pixel coordinates and σ is the standard deviation, a parameter of the Gaussian function. G(x, y, σ) denotes the Gaussian function and D(x, y, σ) the difference of Gaussian function. Images at different scales are obtained by varying σ.
Then, adjacent images at the same resolution are subtracted to build a DoG pyramid. The DoG function is an improvement on the Gauss-Laplace algorithm [15], shown in equation (2), where I(x, y) denotes the input image and k denotes the scale coefficient between adjacent scale-space factors.
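As a concrete illustration of equations (1) and (2), the following Python/NumPy sketch (our own, using OpenCV's GaussianBlur to compute L(x, y, σ)) builds one octave of a DoG pyramid; the parameters sigma0 and num_scales are illustrative choices.

```python
import cv2
import numpy as np

def dog_octave(gray, sigma0=1.6, num_scales=5):
    """Build one octave of a Difference-of-Gaussian pyramid.

    Implements D(x,y,sigma) = L(x,y,k*sigma) - L(x,y,sigma) from
    equation (2), with k = 2^(1/(num_scales-1)). Illustrative sketch."""
    k = 2.0 ** (1.0 / (num_scales - 1))
    gray = gray.astype(np.float32)
    # L(x,y,sigma): the image convolved with Gaussians of growing sigma.
    blurred = [cv2.GaussianBlur(gray, (0, 0), sigma0 * k**i)
               for i in range(num_scales)]
    # Subtract adjacent blurred images to get the DoG layers.
    return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]
```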
SIFT compares each point with its 26 neighbouring pixels: the eight adjacent pixels in the same layer plus the nine pixels in each of the upper and lower adjacent layers.
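This 26-neighbour test can be written compactly. The sketch below (our own illustration, operating on the dog_octave output above) checks whether the centre pixel of a 3x3x3 DoG neighbourhood is an extremum.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """Return True if dog[s][y, x] is a maximum or minimum among its
    26 neighbours in the DoG stack (8 in-layer + 9 above + 9 below).
    Non-strict test (ties count); assumes s, y, x are interior indices.
    Illustrative sketch only."""
    cube = np.stack([dog[s - 1][y - 1:y + 2, x - 1:x + 2],
                     dog[s][y - 1:y + 2, x - 1:x + 2],
                     dog[s + 1][y - 1:y + 2, x - 1:x + 2]])
    centre = dog[s][y, x]
    return centre == cube.max() or centre == cube.min()
```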
Fig 3. Visual representation of the keypoint descriptor [13].
If a point is a minimum or maximum, its location and scale are recorded. In this way SIFT obtains all extreme points of the DoG scale-space and localizes them precisely. After that, it removes low-contrast points and unstable edge points, further rejecting interference points using a 2×2 Hessian matrix obtained from adjacent difference images.
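The edge-point rejection above can be sketched using Lowe's trace/determinant criterion on the 2×2 Hessian; the ratio r = 10 is an assumed conventional value, not specified by the paper.

```python
import numpy as np

def is_edge_point(dog_layer, y, x, r=10.0):
    """Reject keypoints lying on edges using the 2x2 Hessian of the DoG
    layer, following Lowe's trace/determinant test with assumed r = 10.
    Illustrative sketch only."""
    d = dog_layer
    # Second derivatives by finite differences.
    dxx = d[y, x + 1] + d[y, x - 1] - 2.0 * d[y, x]
    dyy = d[y + 1, x] + d[y - 1, x] - 2.0 * d[y, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1]
           - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    # Negative determinant: principal curvatures differ in sign -> discard.
    return det <= 0 or (tr * tr) / det >= (r + 1) ** 2 / r
```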
Next, at the scale of each keypoint, SIFT computes the gradient strength and direction of every pixel in its neighbourhood. According to the gradient directions, SIFT accumulates histogram votes over the neighbourhood and uses the summed votes as the gradient strengths of the keypoint; the main direction of the keypoint is defined as the direction whose gradient strength is maximal. SIFT then takes the keypoint as the centre of an adjacent 16×16 region, divides this region into 4×4 sub-regions, and sums the gradient strength in each sub-region. Using eight directions in each sub-region to generate an eight-dimensional vector, SIFT obtains a 128-dimensional feature description from the 16 sub-regions, arranged in a fixed order as shown in Fig 3.
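These 128-dimensional descriptors are what the framework matches against the barcode database. The paper does not list code for this step; the minimal OpenCV sketch below (our own, using Lowe's ratio test with an assumed 0.75 threshold) shows detection and matching between a query image and one database image.

```python
import cv2

def sift_match(query_gray, db_gray, ratio=0.75):
    """Detect SIFT keypoints in two grayscale images and return the
    good matches surviving Lowe's ratio test. Illustrative sketch of
    the descriptor-matching step; the 0.75 threshold is an assumption."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(query_gray, None)  # 128-dim descriptors
    kp2, des2 = sift.detectAndCompute(db_gray, None)
    if des1 is None or des2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    # For each query descriptor take the two nearest database descriptors
    # and keep the match only if the best is clearly better than the second.
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good
```

In a database lookup, the stored image with the most surviving matches would be taken as the recognized product.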
IV. SYSTEM DESIGN
This paper presents a prototype system for assistive text reading. As illustrated, the system framework consists of three functional components: scene capture, data processing, and audio output. The scene capture component collects scenes containing objects of interest in the form of images or video; in our prototype, it corresponds to a camera attached to a pair of sunglasses. The data processing component is used to deploy our proposed algorithms, including 1) object-of-interest detection, to selectively extract the image of the object held by the blind user from the cluttered background or other neutral objects in the camera view; and 2) barcode localization, to obtain image regions containing the barcode.

We use a laptop as the processing device in our current prototype system. The audio output component informs the blind user of the recognized text codes. A Bluetooth earpiece with mini-microphone is employed for speech output. This simple hardware configuration ensures the portability of the assistive text reading system.
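The paper does not name the speech engine used for the audio output component; as an illustrative stand-in, the sketch below uses the off-the-shelf pyttsx3 text-to-speech library (an assumption on our part) to announce recognized product text.

```python
import pyttsx3

def speak(text):
    """Announce recognized product text to the user via the earpiece.

    pyttsx3 is an assumed off-the-shelf TTS library; the paper does not
    specify which speech engine its prototype uses."""
    engine = pyttsx3.init()  # uses the platform's default TTS voice
    engine.say(text)
    engine.runAndWait()

speak("Product recognized: one litre apple juice")
```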
Fig 4 depicts a work flowchart of the prototype system: start; acquire an image using the camera; detect the object of interest; if no barcode is detected, continue acquiring; once a barcode is detected, extract the barcode region, recognize the object through the barcode, output the text to the blind user in speech, and stop.

Fig 4. Flowchart of the proposed framework to read text from hand-held objects for blind users.
V. EXPERIMENTAL RESULTS

The results obtained by implementing the flowchart are shown in Figs. 5-8. The test image whose barcode is to be read is captured and converted to a grayscale image; the image is then segmented and compared with the images in the database, and the extracted barcode region is obtained, as shown below.

Fig. 5 Captured image
Fig. 6 RGB-to-gray converted image
Fig. 7 Segmented image
Fig. 8 Extracted barcode

VI. CONCLUSION AND FUTURE WORK
In this paper we have discussed a prototype system to recognize products from hand-held objects in order to assist blind persons. To solve the common aiming problem for blind users, we have proposed a motion-based method that detects the object of interest while the blind user simply shakes the object for a couple of seconds. This method can effectively distinguish the object of interest from the background or other objects in the camera view. To extract barcode regions from complex backgrounds, we have proposed the SIFT algorithm for image matching and for obtaining the product details of the matched image. SIFT is adopted in this paper on the basis of its performance under scale and rotation change.

We will also extend our algorithm to handle non-horizontal barcodes; furthermore, we will address significant human interface issues associated with recognizing hand-held objects for visually challenged users.
REFERENCES
[1] World Health Organization. 10 facts about blindness and visual impairment. [Online]. Available: www.who.int/features/factfiles/blindness/blindness_facts/en/index.html
[2] Advance Data Reports from the National Health Interview Survey (2008). [Online]. Available: http://www.cdc.gov/nchs/nhis/nhis_ad.html
[3] ScanTalker, barcode scanning application to help blind identify over one million products. [Online]. Available: http://www.freedomscientific.com/fs_news/PressRoom/en/2006/ScanTalker2-Announcement_330-2006.asp
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[5] KReader Mobile User Guide, knfb Reading Technology Inc. (2008). [Online]. Available: http://www.knfbReading.com
[6] E. Ohbuchi, H. Hanaizumi, and L. A. Hock, "Barcode readers using the camera device in mobile phones," in Proceedings of the 2004 International Conference on Cyberworlds, Washington, DC, USA, 2004, pp. 260-265.
[7] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the 7th IEEE International Conference on Computer Vision, 20-27 September 1999, vol. 2, pp. 1150-1157.
[8] D. Chai and F. Hock, "Locating and decoding EAN-13 barcodes from images captured by digital cameras," in Information, Communications and Signal Processing, 2005 Fifth International Conference on, 2005, pp. 1595-1599.
[9] R. Manduchi and J. Coughlan, "(Computer) Vision without sight," Commun. ACM, vol. 55, no. 1, pp. 96-104, 2012.
[10] The Portset Reader, TVI Technologies for the Visually Impaired Inc., Hauppauge, NY, USA (2012). [Online]. Available: http://www.tviweb.com/products/porsetreader.html
[11] L. Ran, S. Helal, and S. Moore, "Drishti: An integrated indoor/outdoor blind navigation system and service," in Proc. 2nd IEEE Annu. Conf. Pervasive Comput. Commun., 2004, pp. 23-40.
[12] Y. Ke and R. Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proceedings of Computer Vision and Pattern Recognition (CVPR 2004), 27 June - 2 July 2004, vol. 2, pp. 506-513.
[13] E. N. Mortensen, H. Deng, and L. Shapiro, "A SIFT descriptor with global context," in Computer Vision and Pattern Recognition (CVPR 2005), 20-25 June 2005, IEEE, vol. 1, pp. 184-190.
[14] J. M. Morel and G. Yu, "ASIFT: A new framework for fully affine invariant image comparison," SIAM Journal on Imaging Sciences, vol. 2, pp. 438-469, 2009.
[15] H. Rabbani, "Statistical modeling of low SNR magnetic resonance images in wavelet domain using Laplacian prior and two-sided Rayleigh noise for visual quality improvement," Measurement Science Review, vol. 4, pp. 125-130, 2011.