International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 3, Issue 1, January 2014
ISSN 2319 - 4847
A Review on Image Conversion using MapReduce in Cloud Computing Environment
Vinit B. Mohata1, Dhananjay M. Dakhane2, Ravindra L. Pardhi3
1,2,3 Department of Computer Science and Engineering, Sipna COET, Amravati, India
Abstract
Cloud computing is a colloquial expression for a variety of computing concepts that involve a large number of computers connected through a real-time communication network. Many service providers who release Social Network Services (SNS) [3] allow users to disseminate multimedia objects, and SNS and media content providers are constantly working toward providing multimedia-rich experiences to end users [4]. Developing an SNS based on large amounts of social media requires scalable mass storage for the social media data created daily by users. Although the ability to share multimedia objects makes the Internet more attractive to consumers, clients and the underlying networks are not always able to keep up with this growing demand. Here, we apply a cloud computing environment to a Hadoop-based multimedia transcoding system. Improvements in quality and speed are achieved by adopting the Hadoop Distributed File System (HDFS) [12] for storing the large amounts of image data created by numerous users, and MapReduce [10] for distributed and parallel processing of that image data.
Keywords— Cloud computing, cloud testing, cloud infrastructure, Image Conversion.
1. INTRODUCTION
Cloud computing [1] has attracted remarkable interest from researchers and the IT industry for providing a flexible, dynamic IT infrastructure, QoS-guaranteed computing environments, and configurable software services [2]. Owing to these advantages, many service providers who release Social Network Services (SNS) [3] allow users to disseminate multimedia objects. SNS and media content providers are constantly working toward providing multimedia-rich experiences to end users [4]. Developing an SNS based on large amounts of social media requires scalable mass storage for the social media data created daily by users. Although the ability to share multimedia objects makes the Internet more attractive to consumers, clients and the underlying networks are not always able to keep up with this growing demand.
Multimedia processing is characterized by large amounts of data that require substantial processing, storage, and communication resources, thereby imposing a considerable burden on the computing infrastructure. The traditional approach to transcoding multimedia data requires specific and expensive hardware because of the high-capacity and high-definition features of multimedia data. Therefore, general-purpose devices and methods are not cost effective and have limitations. Here, we apply a cloud computing environment to a Hadoop-based image conversion system. Improvements in quality and speed are achieved by adopting the Hadoop Distributed File System (HDFS) [12] for storing the large amounts of image data created by numerous users, and MapReduce [10] for distributed and parallel processing of that image data. This platform is composed of two parts: a social media data analysis platform for scalable large-data analysis, and a cloud-based distributed and parallel data processing platform for storing, distributing, and processing social media data.
2. LITERATURE REVIEW AND RELATED WORK
The amount of image data being uploaded to the Internet is rapidly increasing, with Facebook users uploading over 2.5 billion new photos every month [18]; however, applications that make use of this data are severely lacking. Current computer vision applications use a small number of input images because of the difficulty of acquiring computational resources and storage for large amounts of data [9]. As a result, development of vision applications that use a large set of images has been limited [9]. The Hadoop MapReduce platform provides a system for large-scale and computationally intensive distributed processing (Dean, 2004), though use of Hadoop's system is severely limited by the technical complexities of developing useful applications [29].
Kocakulak and Temizel [15] used Hadoop and MapReduce to perform ballistic image analysis, which requires that a large
database of images be compared with an unknown image. They used a correlation method to compare the library of
images with the unknown image. The process imposed a high computational demand, but its processing time was reduced
dramatically when 14 computational nodes were utilized. Li et al. [16] used a parallel clustering algorithm with Hadoop
and MapReduce. This approach was intended to reduce the time taken by clustering algorithms when applied to a large
number of satellite images. The process starts by clustering each pixel with its nearest cluster and then calculates all the
new cluster centres on the basis of every pixel in each cluster set. The problem with this approach is that the storage space required is of the same order as the image data itself, so a larger set of images requires correspondingly more space.
Golpayegani and Halem [17] used a Hadoop MapReduce framework to operate on a large number of Aqua satellite images collected by the AIRS instruments. A gridding of the satellite data is performed using this parallel approach. In all of these cases, as the volume of image data increased, the parallel algorithm implemented with MapReduce proved superior to a single-machine implementation, and its superiority was even more pronounced on higher-performance hardware. Even so, image processing on top of the Hadoop environment is a relatively new field, initially driven by work on satellite images, and the number of successful approaches to date has been small.
3. ANALYSIS OF PROBLEM
Current approaches to image processing handle a small number of images in a sequential manner. Such processing loads can usually fit on a single computer equipped with a relatively small memory, yet considerably more disk space is needed to store the large-scale image repositories that typically result from satellite-collected data.
Image processing today follows an ordinary sequential workflow: a program loads one image at a time, processes it, and writes the newly processed image back to a storage device. Typically, this is done with ordinary tools such as Photoshop, or with ordinary C and Java programs that can be downloaded from the Internet or easily developed to perform such tasks. Most of these tools run on a single computer with a Windows operating system. Although these single-processor programs may support batch processing, their limited capabilities cause problems as the workload grows.
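As an illustration only (not taken from this paper), the minimal Java sketch below shows the sequential, one-image-at-a-time workflow described above; the directory paths and the JPEG-to-PNG conversion are assumptions chosen for the example.

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

// Minimal sequential baseline: load each image, convert it, write it back out.
// Paths and the PNG target format are illustrative assumptions.
public class SequentialConverter {
    public static void main(String[] args) throws Exception {
        File inputDir = new File("input-images");   // hypothetical source directory
        File outputDir = new File("output-images"); // hypothetical target directory
        outputDir.mkdirs();

        for (File file : inputDir.listFiles()) {
            BufferedImage image = ImageIO.read(file);   // load one image at a time
            if (image == null) continue;                // skip files that are not images
            String name = file.getName().replaceAll("\\.[^.]+$", "") + ".png";
            ImageIO.write(image, "png", new File(outputDir, name)); // write the converted copy
        }
    }
}

Every image passes through a single CPU and a single disk, which is exactly the bottleneck that the Hadoop-based approach discussed below is meant to remove.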
With the proliferation of online photo storage and social media on websites such as Facebook, Flickr, and Picasa, the amount of image data available is larger than ever before and growing more rapidly every day [18]. This alone provides an incredible database of images that can scale up to billions of items, and powerful statistical and probabilistic models can be built from such a large sample source. The management of unstructured data is recognized as one of the major unsolved problems in the information technology (IT) industry, the main reason being that the tools and techniques that have proved so successful at transforming structured data into business intelligence and actionable information simply do not work for unstructured data. Unstructured data files often include text and multimedia content; examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain does not fit neatly into a database. Experts estimate that 80 to 90 percent of the data in any organization is unstructured, and the amount of unstructured data in enterprises is growing significantly, often many times faster than structured databases.
New approaches are therefore necessary: we need a parallel approach that works effectively on massed image data. In order to process a large number of images effectively, we use HDFS to store large amounts of remote sensing image data and MapReduce to process these data in parallel; the MapReduce programming model works directly with this distributed file system. These reasons motivate research into vision applications that take advantage of large sets of images. Hadoop can process stores of both unstructured and structured data that are extremely large, very complex, and changing rapidly.
HDFS is characterized as a highly fault-tolerant distributed file system that can store a large number of very large files on cluster nodes. MapReduce is built on top of HDFS but is independent of it. MapReduce provides an extremely powerful framework that works well for data-intensive applications in which the data processing model is similar or identical across the input. Image-based operations typically apply similar operations throughout an input set, which makes MapReduce well suited to image-based applications. However, many researchers find it impractical to collect a meaningful set of images relevant to their studies [9], and many do not have efficient ways to store and access such a set of images. As a result, little research has been performed on extremely large image sets. The goal of this image transcoding work is to create a tool that makes the development of large-scale image processing applications practical.
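To make the storage side concrete, the following minimal Java sketch, which is illustrative and not taken from this paper, uses the standard org.apache.hadoop.fs.FileSystem API to copy a local image collection into HDFS; the NameNode address, the local and HDFS paths, and the replication factor are all assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Copies a local directory of images into HDFS so that MapReduce jobs can process them.
// The NameNode address, paths, and replication factor are illustrative assumptions.
public class ImageUploader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address
        conf.set("dfs.replication", "3");                 // HDFS keeps multiple replicas of each block

        FileSystem fs = FileSystem.get(conf);
        Path local = new Path("input-images");            // hypothetical local source directory
        Path remote = new Path("/data/images");           // hypothetical HDFS target directory

        fs.mkdirs(remote);
        fs.copyFromLocalFile(local, remote);              // upload the whole directory to HDFS
        fs.close();
    }
}

Once the files are in HDFS, their blocks are replicated across the cluster nodes, which is what allows map tasks to read their input locally.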
4. PROPOSED WORK
In our proposed work, we are trying to build an application that combines Hadoop MapReduce and a cloud environment, which helps build an effective system for big data processing and storage.
Hadoop is a flexible infrastructure for large-scale computation and data processing on a network of commodity hardware. HDFS is the primary storage system used by Hadoop applications [5]. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable and extremely rapid computations. HDFS has a master-slave structure and uses the TCP/IP protocol to communicate between nodes. MapReduce is a
programming model for the parallel processing of distributed large-scale data [6]. MapReduce processes an entire large-scale data set by dividing it among multiple servers.
MapReduce frameworks provide a specific programming model and a run-time system for processing and creating large datasets, and this model is amenable to a wide variety of real-world tasks [8]. The MapReduce framework also handles automatic scheduling, communication, and synchronization when processing huge datasets, and it provides fault tolerance. The MapReduce programming model is executed in two main steps, called mapping and reducing, which are defined by mapper and reducer functions that process the data. Each phase takes a list of key-value pairs as input and produces a list of key-value pairs as output. In the mapping step, MapReduce splits the input dataset and feeds each data element to the mapper in the form of key-value pairs. In the reducing step, all the outputs from the mappers are processed, and the reducer produces the final result by merging them.
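A minimal sketch of how these two steps could look for image conversion is given below. It is illustrative only: it assumes a custom input format (not shown, and not part of stock Hadoop) that delivers each image file to the mapper as a filename key and a BytesWritable value holding the file's bytes, and the PNG target format is likewise an assumption.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapping step: decode each image, convert it to PNG, and emit (filename, converted bytes).
// Assumes an input format that supplies one whole image file per record.
public class ConvertMapper extends Mapper<Text, BytesWritable, Text, BytesWritable> {
    @Override
    protected void map(Text filename, BytesWritable data, Context context)
            throws IOException, InterruptedException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(data.copyBytes()));
        if (image == null) return;                        // skip records that are not decodable images
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);                 // illustrative target format
        context.write(filename, new BytesWritable(out.toByteArray()));
    }
}

// Reducing step: for a pure conversion job the reducer simply passes the converted
// images through; any merging or aggregation across images would go here.
class PassThroughReducer extends Reducer<Text, BytesWritable, Text, BytesWritable> {
    @Override
    protected void reduce(Text filename, Iterable<BytesWritable> images, Context context)
            throws IOException, InterruptedException {
        for (BytesWritable img : images) {
            context.write(filename, img);
        }
    }
}

Because no aggregation across images is needed, such a job could also run map-only by setting the number of reduce tasks to zero.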
To build an effective image conversion application, we have to combine Hadoop MapReduce and a cloud environment in a single system, which helps solve the problems of big data compression and limited storage. A Hadoop MapReduce job includes several stages, each with an important set of operations that help extract the answers you need from big image data. The process starts with a user request to run a MapReduce program and continues until the results are written back to HDFS. HDFS and MapReduce perform their work on nodes in a cluster hosted on racks of commodity servers. By using Hadoop, MapReduce, and the cloud together, we can address the problems of big data processing and storage.
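The driver below sketches that end-to-end flow under the same assumptions as the mapper and reducer shown earlier; the WholeImageInputFormat class is hypothetical and would have to be written, and the HDFS paths and job name are placeholders. The user submits the job, Hadoop schedules map and reduce tasks across the cluster, and the converted images are written back to HDFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Configures and submits the image conversion job, then waits for it to finish.
public class ImageConversionDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "image-conversion");
        job.setJarByClass(ImageConversionDriver.class);

        job.setInputFormatClass(WholeImageInputFormat.class); // hypothetical whole-file input format
        job.setMapperClass(ConvertMapper.class);
        job.setReducerClass(PassThroughReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class); // store converted bytes as a sequence file

        FileInputFormat.addInputPath(job, new Path("/data/images"));        // images uploaded earlier
        FileOutputFormat.setOutputPath(job, new Path("/data/images-png"));  // results written back to HDFS

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}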
5. APPLICATIONS
5.1. Social Media: Large volumes of data are generated by social media platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr. The amount of images being uploaded to the Internet is rapidly increasing, with Facebook users uploading over 2.5 billion new photos every month. Image conversion can improve application performance by greatly reducing the file size and the network bandwidth required to display images.
5.2. Business Applications: Online shopping applications, where every item is displayed with image data, as well as company records and scanned copies of various documents.
5.3. Satellite images: This includes weather data and the imagery that governments capture through satellite surveillance.
5.4. Photographs and video: This includes security, surveillance, and traffic video.
5.5. Website content: This comes from any site delivering unstructured content, such as Flickr or Instagram.
6. CONCLUSION
In this review paper, we examine a MapReduce-based image conversion module in a cloud computing environment that addresses the problem of computing infrastructure overhead. Such overhead increases the burden on the Internet infrastructure owing to the growing volume of large image data shared through the Internet. Image data processing, however, is time consuming and requires large computing resources. To solve this problem, we are working toward implementing an image conversion module that exploits the advantages of cloud computing.
REFERENCES
[1] M. Kim and H. Lee, “SMCC: Social media cloud computing model for developing SNS based on social media,”
Communications in Computer and Information Science, vol.206, pp.259-266, 2011.
[2] Z. Lei, “Media transcoding for pervasive computing,” in Proc. of 5th ACM Conf. on Multimedia, no.4, pp.459-460, Oct. 2001.
[3] Sun-Moo Kang, Bu-Ihl Kim, Hyun-Sok Lee, Young-so Cho, Jae-Sup Lee, Byeong-Nam Yoon, “A study on a public multimedia service provisioning architecture for enterprise networks,” in Proc. of IEEE Network Operations and Management Symposium (NOMS '98), 15-20 Feb. 1998, vol.1, pp.44-48, ISBN: 0-7803-4351-4.
[4] Hyeokju Lee, Myoungjin Kim, Joon Her, and Hanku Lee, “Implementation of MapReduce-based image conversion module in cloud computing environment,” in Proc. of International Conference on Information Networking (ICOIN 2012), 1-3 Feb. 2012.
[5] R. Buyya, C. Yeo, S. Venugopal, J. Broberg, “Cloud computing and emerging IT platforms:
Vision, hype, and reality for delivering computing as the 5th utility,” Future Generation Computer Systems, vol.25, no.6,
pp.599-616, Jun. 2009.
[6] G. Barlas, “Cluster-based optimized parallel video transcoding,” Parallel Computing, vol.38,no.4-5, pp.226-244, Apr.
2012.
[7] Myoungjin Kim, Seungho Han, Yun Cui, Hanku Lee, and Changsung Jeong, “A Hadoop-based multimedia transcoding system for processing social media in the PaaS platform of SMCCSE,” KSII Transactions on Internet and Information Systems, vol.6, no.11, Nov. 2012.
[8] I. Ahmad, X. Wei, Y. Sun and Y.-Q. Zhang, “Video transcoding: An overview of various techniques and research issues,” IEEE Transactions on Multimedia, vol.7, no.5, pp.793-804, Oct. 2005.
[9] S. Ghemawat, H. Gobioff and S.-T. Leung, “The Google file system,” Operating Systems Review (ACM), vol.37, no.5, pp.29-43, Oct. 2003.
[10] D. Seo, J. Kim and I. Jung, “Load distribution algorithm based on transcoding time estimation for distributed
transcoding servers,” in Proc. of 2010 Conf. on Information Science and Applications, article no.5480586, Apr.
2010.
[11] D.M. Boyd and N.B. Ellison, “Social network sites: Definition, history, and scholarship,” Journal of Computer-Mediated Communication, vol.13, no.1, pp.210-230, Oct. 2007.
[12] J. Shafer, S. Rixner and A.L. Cox, “The Hadoop distributed file system: Balancing portability and performance,” in
Proc. of IEEE International Symposium on Performance Analysis of Systems and Software, pp.122-133, Mar. 2010.
[13] Hadoop MapReduce project, http://hadoop.apache.org/mapreduce/
[14] Hari Kalva, Aleksandar Colic, Garcia, Borko Furht, “Parallel programming for multimedia applications,” Multimedia Tools and Applications, vol.51, no.2, pp.901-818, DOI:10.1007/s11042-010-0656-2.
[15] H. Kocakulak and T. T. Temizel, “A Hadoop solution for ballistic image analysis and recognition,” in 2011 Int.
Conf. High Performance Computing and Simulation (HPCS), Istanbul, pp. 836–842.
[16] B. Li, H. Zhao, Z. H. Lv, “Parallel ISODATA clustering of remote sensing images based on MapReduce,” in 2010
Int. Conf. Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Huangshan, pp. 380–383.
[17] N. Golpayegani and M. Halem, “Cloud computing for satellite data processing on high end compute clusters,” in
Proc. 2009 IEEE Int. Conf. Cloud Computing (CLOUD ’09), Bangalore, pp. 88–92.
[18] FACEBOOK, 2010. Facebook image storage. http://blog.facebook.com/blog.php?post=206178097130.
AUTHOR
Vinit B. Mohata received a Bachelor of Engineering in Information Technology from SGB Amravati University and is pursuing a Master of Engineering in Computer Engineering at Sipna College of Engineering and Technology, Amravati.