A Review on Cloud to Handle and Process Big Data

Nishu Arora1, Rajesh Kumar Bawa2
M.Tech Student1, Associate Professor2
Department of Computer Science, Punjabi University Patiala1, 2

Abstract: Cloud computing delivers services over the internet by utilizing the resources of a shared computing infrastructure. It allows consumers and businesses to use applications without installation and to access their personal files from any computer with internet access. Bigtable is a distributed storage system for managing structured data at Google. It is designed to reliably scale to petabytes of data and thousands of machines, and it has achieved several goals: wide applicability, scalability, high performance, and high availability. Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Personalized Search, Google Earth and many more. In this paper, a review is presented to analyze cloud performance for data stored at data centers.

Keywords: Cloud data, data center, map reduce, distributed file system.

INTRODUCTION

The idea behind the Cloud is that users can use its services anytime, anywhere over the Internet, directly through a browser. In cloud computing, data is stored in a virtual space and network services are accessed through the browser. Since networks are involved, the main concern is the security of the data.

a. Bigtable

Bigtable resembles a database in that it shares many implementation strategies with databases, but it provides a different interface than parallel databases and main-memory databases. Data is indexed using row and column names that can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk [1].
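To make this data model concrete, the following minimal Python sketch models a Bigtable-like sparse, sorted map from (row, column, timestamp) to an uninterpreted value. The class and method names are illustrative assumptions for this review, not Bigtable's actual API.

    # A minimal sketch of a Bigtable-like data model: a sparse, sorted map
    # from (row, column, timestamp) to an uninterpreted byte string.
    # All names here are illustrative, not Bigtable's actual API.
    import time
    from bisect import insort

    class ToyBigtable:
        def __init__(self):
            # (row, column) -> list of (-timestamp, value), newest first
            self._cells = {}

        def put(self, row, column, value, ts=None):
            # Row and column names are arbitrary strings; values are
            # uninterpreted byte strings that clients may serialize into.
            ts = time.time() if ts is None else ts
            insort(self._cells.setdefault((row, column), []), (-ts, value))

        def get(self, row, column):
            # Return the most recent version of the cell, if any.
            versions = self._cells.get((row, column))
            return versions[0][1] if versions else None

        def scan_rows(self, prefix):
            # Rows are kept in lexicographic order, so clients control
            # locality through careful row-key choices (e.g. reversed
            # URL hostnames keep pages of one site adjacent).
            for row, column in sorted(self._cells):
                if row.startswith(prefix):
                    yield row, column, self.get(row, column)

    t = ToyBigtable()
    t.put("com.cnn.www", "contents:", b"<html>...</html>")
    print(t.get("com.cnn.www", "contents:"))

In the real system this map is distributed and persistent, but the client-visible model is essentially this sorted, multi-dimensional map.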
b. Big data

Big data refers to large pools of data that can be captured, communicated, aggregated, stored, and analysed. Big Data is not a technology, but rather a phenomenon resulting from the vast amount of raw information generated across society and collected by commercial and government organisations. Big Data generally refers to datasets that are not susceptible to analysis by the relational database tools, statistical analysis tools and visualisation aids that have become familiar over the past twenty years since the start of the rapid increase in digitised sensor data. Instead, it requires 'massively parallel software running on tens, hundreds, or even thousands of servers in some (currently extreme) cases'.

Important characteristics of big data include:

Volume. It refers to the mass of data held by an enterprise.
Variety. It deals with the complexity of multiple data types, including structured and unstructured data.
Velocity. It is the speed at which data is disseminated and also the speed at which it changes or is variable.
Veracity. It deals with the level of reliability associated with certain types of data. The normalization effect is used to analyse data at these vastly different orders of magnitude.

Big data can be broken into two areas:
1. Big Data Transaction Processing (a.k.a. Big transactions)
2. Big Data Analytics

Big Data transaction processing deals with extreme volumes of transactions that may update data in relational DBMSs or file systems. Typically, relational DBMSs are used, as it is often the case that the so-called ACID properties are missing in many NoSQL DBMSs. This is only a problem if it is unacceptable to lose a transaction, e.g. a banking deposit.

Big Data Analytics is about segregation and segmentation of data. It is about advanced analytics on traditional structured and multi-structured data. It is a term associated with the new types of workloads and underlying technologies needed to solve business problems that we could not previously support due to technology limitations, prohibitive cost, or both.

c. Types of big data

The most popular new types of data that organisations want to analyse include:

Web data - e.g. web logs, e-commerce logs and social network interaction data.
Industry-specific big transaction data - e.g. Telco call data records (CDRs), geo-location data and retail transaction data.
Machine-generated/sensor data - to monitor everything from movement, temperature, light, vibration, location, airflow, liquid flow and pressure. RFIDs are another example.
Text - e.g. from archived documents, external content sources or customer interaction data (including emails for sentiment analysis).

LITERATURE SURVEY

F. Chang et al. [1] suggested that Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided an extensible, high-performance solution for all of these Google products. The authors describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and they describe the design and implementation of Bigtable.

A. Kumar et al. [2] highlighted that the cloud has been used as a metaphor for the internet and is one of the most active applications for the enterprise. It has been increasingly accepted by enterprises, which can take advantage of its low cost, fast deployment and elastic scaling. Due to the demand for large-volume data processing in enterprises, huge amounts of data are generated and dispersed on the internet, and there is no guarantee that data stored in the Cloud is securely protected. A method to build a trusted computing environment by providing a secure platform in a Cloud computing system is proposed. The proposed method can store data safely and efficiently in the Cloud. It solves many problems of handling big data and its security issues by using encryption and compression techniques while uploading data to Cloud storage.

X. Zhang et al. [3] discussed that, with the development of cloud computing and the mobile internet, issues related to big data have drawn the attention of both academia and industry. Based on an analysis of existing work, the paper surveys the research progress on using distributed file systems to meet the challenge of storing big data, covering four key techniques: storage of small files, load balancing, replica (copy) consistency, and deduplication. These techniques are analyzed, new ideas are offered, and key issues to be addressed in future work are indicated.
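As an illustration of one of these four techniques, the sketch below shows content-based deduplication in its simplest form: data is split into fixed-size chunks, and identical chunks are stored only once, keyed by a cryptographic hash. This is a generic, simplified model, not the scheme of any particular distributed file system surveyed in [3].

    # Illustrative sketch of content-based deduplication: fixed-size
    # chunks, stored once each, keyed by their SHA-256 digest.
    import hashlib

    CHUNK_SIZE = 4096
    chunk_store = {}   # digest -> chunk bytes (each unique chunk stored once)

    def dedup_write(data):
        # Store data as a list of chunk fingerprints; duplicate chunks
        # cost no additional space.
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in chunk_store:   # only new content is stored
                chunk_store[digest] = chunk
            recipe.append(digest)
        return recipe

    def dedup_read(recipe):
        # Reassemble the original data from its chunk fingerprints.
        return b"".join(chunk_store[d] for d in recipe)

    r1 = dedup_write(b"A" * 10000)
    r2 = dedup_write(b"A" * 10000)      # a second copy adds no new chunks
    assert dedup_read(r1) == b"A" * 10000
    print(len(chunk_store), "unique chunks stored")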
Z. Zheng et al. [4] discussed that, with the prevalence of service computing and cloud computing, more and more services are emerging on the Internet, generating huge volumes of data. The overwhelming service-generated data have become too large and complex to be effectively processed by traditional approaches, so how to store, manage, and create value from service-oriented big data has become an important research problem. On the other hand, with the increasingly large amount of data, a single infrastructure that provides common functionality for managing and analyzing different types of service-generated big data is urgently required. To address this challenge, they provide an overview of service-generated big data and Big Data-as-a-Service. First, three types of service-generated big data are exploited to enhance system performance. Then, Big Data-as-a-Service, including Big Data Infrastructure-as-a-Service, Big Data Platform-as-a-Service, and Big Data Analytics Software-as-a-Service, is employed to provide common big data-related services to users, enhancing efficiency and reducing cost. To provide common functionality for big data management and analysis, Big Data-as-a-Service is investigated to provide APIs through which users can access the service-generated big data and the big data analytics results.

L. Zhang et al. [5] illustrated that cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, especially for the processing of big data. In cloud computing, data should be moved efficiently from different locations around the world, and the de facto approach of hard-drive shipping is neither flexible nor secure. This work studies the timely, cost-minimizing upload of massive, dynamically generated, geo-dispersed data into the cloud for processing using a MapReduce-like framework, targeting a cloud encompassing disparate data centers. The authors model a cost-minimizing data migration problem and propose two online algorithms: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, which optimize at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. Careful comparisons among these online and offline algorithms in realistic settings are conducted through extensive experiments, which demonstrate the close-to-offline-optimum performance of the online algorithms.
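As a toy illustration of the decision that [5] optimizes, the sketch below picks, for a single time slot, the data center that minimizes total transfer-plus-processing cost for geo-dispersed data sources. The actual OLM and RFHC algorithms of [5] solve this online over many time slots and also account for migration costs; all names and cost figures below are hypothetical.

    # Toy single-slot version of the data-aggregation decision in [5]:
    # pick the data center minimizing transfer + processing cost.
    # All sources, data centers, and prices here are hypothetical.

    data_volume = {"eu-source": 40.0, "us-source": 25.0, "asia-source": 35.0}  # GB

    transfer_cost = {   # $/GB from each source to each candidate data center
        ("eu-source", "dc-eu"): 0.01, ("eu-source", "dc-us"): 0.05,
        ("us-source", "dc-eu"): 0.05, ("us-source", "dc-us"): 0.01,
        ("asia-source", "dc-eu"): 0.04, ("asia-source", "dc-us"): 0.03,
    }
    processing_cost = {"dc-eu": 0.02, "dc-us": 0.015}   # $/GB processed

    def slot_cost(dc):
        # Cost of shipping all sources' data to dc and processing it there.
        shipped = sum(v * transfer_cost[(src, dc)] for src, v in data_volume.items())
        processed = processing_cost[dc] * sum(data_volume.values())
        return shipped + processed

    best = min(processing_cost, key=slot_cost)
    print(best, round(slot_cost(best), 3))   # dc-us 4.8 for these toy numbers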
W. Dou et al. [6] pointed out that cloud computing promises a scalable infrastructure for processing big data applications such as medical data analysis. Cross-cloud service composition provides a concrete approach capable of supporting large-scale big data processing. However, the complexity of potential compositions of cloud services calls for new composition and aggregation methods, especially when some private clouds refuse to disclose all details of their service transaction records due to business privacy concerns in cross-cloud scenarios. If a cloud fails to deliver its services according to its "promised" quality, the credibility of cross-cloud and online service compositions becomes suspect. In view of these challenges, they proposed a privacy-aware cross-cloud service composition method named HireSome-II (History record-based Service optimization method), based on its previous basic version, HireSome-I. In this method, to enhance the credibility of a composition plan, the evaluation of a service is based on some of its QoS history records rather than on its advertised QoS values. Besides, the k-means algorithm is introduced into the proposed method as a data filtering tool to select representative history records. This composition evaluation approach achieves two advantages. First, it reduces the time complexity of developing a cross-cloud service composition plan, since only representative history records are recruited, which is highly desirable for big data applications. Second, it protects the privacy of a cloud, since a cloud is not required to unveil all of its transaction records. Simulation and analytical results demonstrate the validity of the method compared to a benchmark.

CONCLUSION

In this paper, we have discussed Bigtable, big data in the cloud, and the types of big data. The research findings of various authors have been studied and their results discussed. Big data is a useful set of technologies that makes such large-scale work easier to handle.

REFERENCES

[1] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," pp. 1-14, OSDI 2006.
[2] A. Kumar, H. Lee, and R. P. Singh, "Efficient and Secure Cloud Storage for Handling Big Data," pp. 1-5, 2012.
[3] X. Zhang and F. Xu, "Survey of Research on Big Data Storage," pp. 1-5, IEEE, 2013.
[4] Z. Zheng, J. Zhu, and M. R. Lyu, "Service-generated Big Data and Big Data-as-a-Service: An Overview," pp. 1-8, IEEE, 2013.
[5] L. Zhang, C. Wu, Z. Li, C. Guo, M. Chen, and F. C. M. Lau, "Moving Big Data to The Cloud: An Online Cost-Minimizing Approach," IJCA, Vol. 31, No. 12, Dec 2013.
[6] W. Dou, X. Zhang, J. Liu, and J. Chen, "HireSome-II: Towards Privacy-Aware Cross-Cloud Service Composition for Big Data Applications," pp. 1-11, IEEE TPDS-2013-08-0725.
[7] K. Kanagasabapathi and S. B. Akshaya, "Secure Sharing of Financial Records with Third Party Application Integration in Cloud Computing," pp. 1-3, IEEE, 2013.
[8] M. Ferguson, "Enterprise Information Protection - The Impact of Big Data," pp. 1-40, Intelligent Business Strategies, March 2013.
[9] N. Couch and B. Robins, "Big Data for Defence and Security," Occasional Paper, 2013.
[10] P. Amirian, A. Basiri, and A. Winstanley, "Efficient Online Sharing of Geospatial Big Data Using NoSQL XML Databases," p. 1, IEEE, 2013.