Portable Camera-Based Assistive Bar-Code Reader for Visually Challenged

Shaherunnisa1, Diwakar R. Marur2
1,2 Department of Electronics and Communication Engineering, SRM University, India
(Shaherunnisa, Vijayawada, India, 9600017362, email: shaherunnisa1992@gmail.com; Diwakar R. Marur, Chennai, India, 9444878525, email: diwakar.r@ktr.srmuniv.ac.in)

Abstract - Vision loss affects almost every activity of daily living. This paper presents a camera-based assistive barcode reading framework that helps blind persons read product labels and product packaging on handheld objects in their daily lives. The handled object is segregated from the background by shaking it, which determines the region of interest (ROI); the camera then captures an image of the ROI. The Scale Invariant Feature Transform (SIFT) algorithm is used to compare the barcode of the test product with the barcodes present in a database, and the matched image data is obtained. This information is transferred to a microcontroller, converted to audio by an audio converter, and delivered to the user, so that the recognized text codes are output to blind users as speech. User interface issues are explored, and the robustness of the algorithm in extracting and reading product information is assessed by identifying barcodes on different objects with complex backgrounds.

Keywords - SIFT, PCA-SIFT, GSIFT, CSIFT, ASIFT, Orientation, DoG.

I. INTRODUCTION

Visually impaired people are often the target group of various investigations, including basic research, applied research, and research and development studies. According to a survey by the World Health Organization (WHO), over 285 million people in the world are visually impaired, of whom 39 million are blind and 246 million have moderate to severe visual impairment [1]. It is predicted that without additional interventions these numbers will rise to 75 million blind and 200 million visually impaired by the year 2020. Even in a developed country like the United States (US), the 2008 National Health Interview Survey (NHIS) reported that an estimated 25.2 million adult Americans (over 8%) are blind or visually impaired [2]. This number is increasing rapidly as the baby boomer generation ages. Fig. 1 shows the number of visually challenged people per million population in different countries.

Fig 1. Number of blind, low-vision, and visually impaired people per million population in different countries (bar chart: number of people per million population vs. countries).

Recent developments in computer vision, digital cameras, and portable computers make it feasible to assist these individuals. Portable barcode readers designed to help blind people identify different products in an extensive product database can enable blind users to access information about these products [3] through speech and Braille. A major limitation, however, is that it is very hard for blind users to find the location of the barcode.

Image local feature description algorithms, such as the Gradient Location and Orientation Histogram (GLOH) and the Scale Invariant Feature Transform (SIFT) [4], are used to identify the product and to locate the barcode on it. For any object in an image, interesting points on the object can be extracted to provide a 'feature description' of the object.
This description, extracted from a training image, can then be used to identify the object when attempting to locate it in a test image containing many other objects. For reliable recognition, it is important that the features extracted from the training image be detectable even under changes in image scale, noise, and illumination. Such points usually lie on high-contrast regions of the image, such as object edges.

As shown in Fig. 2, barcode information can appear in multiple orientations. To assist blind persons in reading barcodes from such hand-held objects, we have conceived a camera-based assistive barcode reading framework that tracks the object of interest within the camera view and extracts barcode information from it [6]. The SIFT algorithm [7] can effectively handle complex backgrounds and multiple orientations and extract text information from hand-held objects.

Fig 2. Examples of barcodes from hand-held objects and their multiple orientations.

A barcode is an optical machine-readable representation of data relating to the object to which it is attached. Originally, barcodes systematically represented data by varying the widths and spacing of parallel lines; these may be referred to as linear or one-dimensional (1D). Later they evolved into rectangles, dots, hexagons, and other geometric patterns in two dimensions (2D). Although 2D systems use a variety of symbols, they are generally referred to as barcodes as well. The captured two-dimensional signals are sampled and quantized to yield digital images.

In assistive reading systems for blind persons, it is very challenging for users to position the object of interest within the centre of the camera's view, and as of now there are still no acceptable solutions. We therefore approach the problem in stages [8]. To make sure the hand-held object appears in the camera view, we use a camera with a sufficiently wide angle to accommodate users with only approximate aim. This may often result in other text objects appearing in the camera's view (for example, while shopping at a supermarket). To extract the hand-held object from the camera image, we develop a motion-based method that obtains a region of interest (ROI) for the object; barcode recognition is then performed only within this ROI.

The scene capture component collects scenes containing objects of interest in the form of images or video; in our prototype it corresponds to a camera attached to a pair of sunglasses. The live video is captured using a webcam, which can be done using MATLAB libraries. The image format from the webcam is RGB24. Frames are extracted from the video and passed to preprocessing. The data processing component performs object-of-interest detection and barcode localization; algorithms are developed to determine the product and to locate the barcode on it. The audio output component informs the blind user of the recognized text codes in the form of speech or audio. A minimal sketch of the capture and motion-based ROI step is given below.
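The prototype itself uses MATLAB, as noted above; purely as an illustrative sketch of the motion-based ROI idea (not the authors' implementation), the following Python/OpenCV code accumulates frame differences while the user shakes the object and returns the bounding box of the dominant moving region. The camera index, frame count, and threshold values are assumed, not parameters from the paper.

```python
# Illustrative sketch (not the authors' MATLAB code): motion-based ROI
# detection by frame differencing while the user shakes the object.
import cv2
import numpy as np

def detect_shaken_roi(num_frames=60, diff_thresh=25, cam_index=0):
    """Accumulate frame differences over a short window and return the
    bounding box of the most strongly moving region plus its crop."""
    cap = cv2.VideoCapture(cam_index)          # webcam delivers RGB24 frames
    ok, prev = cap.read()
    if not ok:
        raise RuntimeError("camera not available")
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    motion = np.zeros_like(prev, dtype=np.float32)

    for _ in range(num_frames):                # roughly 2 s at 30 fps
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)         # changed pixels = motion
        motion += (diff > diff_thresh).astype(np.float32)
        prev = gray
    cap.release()

    # Keep pixels that moved in many frames, then take the largest blob.
    mask = (motion > num_frames * 0.2).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE,
                            np.ones((15, 15), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return (x, y, w, h), prev[y:y + h, x:x + w]   # ROI box, grayscale crop
```

The returned grayscale crop is what the subsequent barcode localization and matching stages operate on, so clutter outside the shaken object is discarded early.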
II. RELATED WORK

Blind and visually impaired people are at a great disadvantage. Various technologies have been developed to help them, as shown in [9]-[11], and algorithms have previously been developed to extract barcodes from scene images and to match barcodes [8]-[10]. A survey of computer-vision-based assistive technologies for people with visual impairments can be found in [9]. Since the SIFT algorithm was formally proposed, researchers have never stopped improving it. Among the many variants that have been developed, a few are cited especially often; these algorithms are selected and investigated here.

In the descriptor-establishment phase, SIFT uses a 128-dimensional vector to describe each keypoint. This high dimensionality makes the step that follows SIFT (image feature matching) slow. To reduce the dimensionality of each keypoint description, Y. Ke [12] used the Principal Component Analysis (PCA) method to replace the histogram method used in SIFT; this improved version is called PCA-SIFT. In the same phase, SIFT describes only local information and makes no use of global information, so E. N. Mortensen [13] introduced a SIFT descriptor with global context (GSIFT), which adds a global texture vector to the basis of SIFT. In the keypoint detection phase, SIFT uses only the grayscale information of an image, discarding a great deal of colour information for colour images; A. Abdel-Hakim and A. Farag therefore proposed CSIFT, which adds colour invariance to the basis of SIFT and intends to overcome the shortcomings of SIFT for colour images. H. Bay proposed SURF, which is very similar to SIFT but adopts different processing methods in every step; Bay claimed SURF is an enhanced version of SIFT. J. M. Morel proposed Affine-SIFT (ASIFT) [14], which uses affine transformation parameters to correct images and is intended to resist strong affine distortion.

Their performance differs across situations: scale change, rotation change, blur change, illumination change, and affine change, and each algorithm has its own advantages. SIFT and CSIFT perform best under scale and rotation change. CSIFT improves on SIFT under blur change and affine change but not under illumination change. PCA-SIFT is better across several situations, such as scale and rotation, blur, and illumination. GSIFT is best under blur and illumination change but not under rotation, scale, and affine change. SURF's performance is average in all situations, but it runs the fastest. ASIFT performs best under affine change and well under scale and rotation change, but worst under blur and illumination change. Since rotation change and scale change are the parameters we consider, the SIFT algorithm is used.

III. ALGORITHM OVERVIEW

SIFT is an image local feature description algorithm based on scale-space. Due to its strong matching ability, SIFT has many applications in different fields, such as image retrieval, image stitching, and machine vision. The procedure of SIFT mainly includes three steps: keypoint detection, descriptor establishment, and image feature matching; these in turn include scale-space extrema detection using the Difference of Gaussian function, keypoint localization and descriptor construction, and orientation assignment.

The greatest characteristic of the SIFT algorithm is scale invariance. To achieve scale invariance, SIFT convolves the image with the Gaussian function of equation (1) and builds the DoG (Difference of Gaussian) function of equation (2):

G(x, y, σ) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))    (1)

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)    (2)

where x, y are the image coordinates, σ is the standard deviation and scale parameter of the Gaussian function, G(x, y, σ) denotes the Gaussian function, L(x, y, σ) = G(x, y, σ) * I(x, y) denotes the scale-space image, and D(x, y, σ) is the difference of Gaussian function. Different scale images are obtained by changing σ; images that are adjacent within the same resolution are then subtracted to obtain a DoG pyramid. A minimal sketch of this construction is shown below.
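As an illustration of how equations (1) and (2) yield a DoG pyramid (a sketch, not the authors' implementation; the octave count, the number of scales per octave, and the base σ0 = 1.6 are assumed values), adjacent Gaussian-blurred copies of the image can be subtracted at each octave:

```python
# Illustrative sketch of the DoG pyramid of equations (1) and (2).
# Octave count, scales per octave, and base sigma are assumed values.
import cv2
import numpy as np

def dog_pyramid(image, num_octaves=4, scales_per_octave=3, sigma0=1.6):
    """Return a list of octaves; each octave is a list of DoG images
    D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    k = 2.0 ** (1.0 / scales_per_octave)   # scale factor between levels
    gray = image.astype(np.float32)
    pyramid = []
    for _ in range(num_octaves):
        sigma = sigma0
        gaussians = []                     # L(x, y, sigma) at growing sigma
        for _ in range(scales_per_octave + 3):
            gaussians.append(cv2.GaussianBlur(gray, (0, 0), sigma))
            sigma *= k
        # D = difference of adjacent Gaussian levels (equation (2))
        dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
        pyramid.append(dogs)
        # next octave works on a half-resolution copy of the image
        gray = cv2.resize(gray, (gray.shape[1] // 2, gray.shape[0] // 2))
    return pyramid
```

Because each octave halves the resolution, extrema found in the resulting D(x, y, σ) images are localized in both space and scale, which is the comparison described next.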
The DoG function is a kind of improvement of the Gauss-Laplace algorithm [15], shown in equation (2), where I(x, y) denotes the input image and k denotes the scale coefficient between adjacent scale-space factors. SIFT compares each point with its 26 adjacent pixels: the eight adjacent pixels in the same layer plus the nine pixels in each of the upper and lower adjacent layers. If the point is a minimum or maximum, its location and scale are recorded. SIFT thereby obtains all extreme points of the DoG scale-space and localizes them precisely. After that, it removes low-contrast points and unstable edge points; interference points are further removed using a 2×2 Hessian matrix obtained from adjacent difference images.

Next, at the scale of each keypoint, SIFT computes the gradient strength and direction of every neighbourhood pixel. According to the gradient directions, SIFT accumulates votes in a histogram over the neighbourhood and uses the summed magnitudes as the gradient strengths of the keypoint; the main direction of the keypoint is defined as the direction whose gradient strength is maximal. SIFT then takes the keypoint as the centre of an adjacent 16×16 region, divides this region into 4×4 sub-regions, and sums the gradient strengths within each sub-region. Using eight directions per sub-region, it generates an eight-dimensional vector for each, and thereby obtains a 128-dimensional feature description from the 16 sub-regions, arranged in a fixed order as shown in Fig. 3.

Fig 3. Visual descriptor representation of a keypoint [9].

IV. SYSTEM DESIGN

This paper presents a prototype system for assistive text reading. As illustrated, the system framework consists of three functional components: scene capture, data processing, and audio output. The scene capture component collects scenes containing objects of interest in the form of images or video; in our prototype, it corresponds to a camera attached to a pair of sunglasses. The data processing component deploys our proposed algorithms, including 1) object-of-interest detection, to selectively extract the image of the object held by the blind user from the cluttered background or other neutral objects in the camera view; and 2) barcode localization, to obtain the image regions containing the barcode. We use a laptop as the processing device in our current prototype system. The audio output component informs the blind user of the recognized text codes; a Bluetooth earpiece with a mini-microphone is employed for speech output. This simple hardware configuration ensures the portability of the assistive text reading system. Fig. 4 depicts the workflow of the prototype system.

Fig 4. Flowchart of the proposed framework to read text from hand-held objects for blind users (Start → Acquire image using camera → Detect object of interest → Barcode detected? If No, acquire again; if Yes → Extract barcode region → Object recognition through barcode → Output text to blind user in speech → Stop).

V. EXPERIMENTAL RESULTS

The results obtained by implementing the flowchart are shown in Figs. 5-8. The test image whose barcode is to be read is captured and converted to a grayscale image; it is then segmented and compared with the images in the database, and the extracted barcode region is obtained. A sketch of this matching step follows the figure captions below.

Fig. 5 Captured image. Fig. 6 RGB-to-gray converted image. Fig. 7 Segmented image. Fig. 8 Extracted barcode.
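To make the recognition step concrete, the sketch below (again an illustration rather than the authors' MATLAB prototype; the database file names, the 0.75 ratio-test threshold, and the use of the pyttsx3 text-to-speech library are assumptions) matches the captured grayscale image against stored product images using SIFT descriptors with Lowe's ratio test, then speaks the best match:

```python
# Illustrative sketch: match a captured image against a small database of
# product barcode images using SIFT + ratio test, then speak the result.
# Database layout and thresholds are assumptions, not the authors' code.
import cv2
import pyttsx3

def recognize_product(test_gray, database):
    """database: list of (product_name, grayscale_image) pairs.
    Returns the product whose SIFT features match the test image best."""
    sift = cv2.SIFT_create()
    bf = cv2.BFMatcher()
    _, des_t = sift.detectAndCompute(test_gray, None)
    if des_t is None:
        return None, 0

    best_name, best_score = None, 0
    for name, ref in database:
        _, des_r = sift.detectAndCompute(ref, None)
        if des_r is None:
            continue
        # Lowe's ratio test: keep matches clearly better than the runner-up.
        good = []
        for pair in bf.knnMatch(des_t, des_r, k=2):
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good.append(pair[0])
        if len(good) > best_score:
            best_name, best_score = name, len(good)
    return best_name, best_score

if __name__ == "__main__":
    # Hypothetical database images and captured ROI, for illustration only.
    db = [("rice pack", cv2.imread("db/rice.png", cv2.IMREAD_GRAYSCALE)),
          ("soap bar",  cv2.imread("db/soap.png", cv2.IMREAD_GRAYSCALE))]
    test = cv2.imread("captured_roi.png", cv2.IMREAD_GRAYSCALE)
    product, score = recognize_product(test, db)
    engine = pyttsx3.init()                  # text-to-speech output
    engine.say(product if product else "product not recognized")
    engine.runAndWait()
```

Counting ratio-test survivors is a simple matching score; because SIFT descriptors are scale- and rotation-invariant, the count stays high even when the barcode appears tilted or at a different distance, which is the behaviour observed in the experiments above.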
VI. CONCLUSION AND FUTURE WORK

In this paper we have described a prototype system for recognizing hand-held products to assist blind persons. To solve the common aiming problem for blind users, we have proposed a motion-based method that detects the object of interest while the blind user simply shakes the object for a couple of seconds. This method can effectively distinguish the object of interest from the background or other objects in the camera view. To extract barcode regions from complex backgrounds, we have proposed the SIFT algorithm for image matching and for obtaining the product details of the matched image; SIFT is chosen in this paper based on its performance under scale and rotation change. In future work we will extend our algorithm to handle non-horizontal barcodes, and we will address the significant human interface issues associated with recognizing hand-held objects for visually challenged users.

REFERENCES

[1] World Health Organization. 10 facts about blindness and visual impairment. [Online]. Available: www.who.int/features/factfiles/blindness/blindness_facts/en/index.html
[2] Advance Data Reports from the National Health Interview Survey (2008). [Online]. Available: http://www.cdc.gov/nchs/nhis/nhis_ad.html
[3] ScanTalker, Bar code scanning application to help blind identify over one million products. [Online]. Available: http://www.freedomscientific.com/fs_news/PressRoom/en/2006/ScanTalker2-Announcement_330-2006.asp
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[5] KReader Mobile User Guide, knfb Reading Technology Inc. (2008). [Online]. Available: http://www.knfbReading.com
[6] E. Ohbuchi, H. Hanaizumi, and L. A. Hock, "Barcode readers using the camera device in mobile phones," in Proc. 2004 International Conference on Cyberworlds, Washington, DC, USA, 2004, pp. 260-265.
[7] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE International Conference on Computer Vision, Sep. 1999, vol. 2, pp. 1150-1157.
[8] D. Chai and F. Hock, "Locating and decoding EAN-13 barcodes from images captured by digital cameras," in Proc. Fifth International Conference on Information, Communications and Signal Processing, 2005, pp. 1595-1599.
[9] R. Manduchi and J. Coughlan, "(Computer) vision without sight," Commun. ACM, vol. 55, no. 1, pp. 96-104, 2012.
[10] The Portset Reader, TVI Technologies for the Visually Impaired Inc., Hauppauge, NY, USA (2012). [Online]. Available: http://www.tviweb.com/products/porsetreader.html
[11] L. Ran, S. Helal, and S. Moore, "Drishti: An integrated indoor/outdoor blind navigation system and service," in Proc. 2nd IEEE Annu. Conf. Pervasive Comput. Commun., 2004, pp. 23-40.
[12] Y. Ke and R. Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proc. Computer Vision and Pattern Recognition (CVPR 2004), 27 June - 2 July 2004, vol. 2, pp. 506-513.
[13] E. N. Mortensen, H. Deng, and L. Shapiro, "A SIFT descriptor with global context," in Proc. Computer Vision and Pattern Recognition (CVPR 2005), 20-25 June 2005, vol. 1, pp. 184-190.
[14] J. M. Morel and G. Yu, "ASIFT: A new framework for fully affine invariant image comparison," SIAM Journal on Imaging Sciences, vol. 2, pp. 438-469, 2009.
[15] H. Rabbani, "Statistical modeling of low SNR magnetic resonance images in wavelet domain using Laplacian prior and two-sided Rayleigh noise for visual quality improvement," Measurement Science Review, vol. 4, pp. 125-130, 2011.