Name: Youssef Mohamed El-Sayed

The Definition of Big Data

What exactly is big data? To really understand big data, it’s helpful to have some historical background. Here is Gartner’s definition, circa 2001 (which is still the go-to definition): big data is data that contains greater variety, arriving in increasing volumes and with ever-higher velocity. These are known as the three Vs.

Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.

Characteristics

Volume: The quantity of generated and stored data. The size of the data determines its value and potential insight, and whether it can be considered big data at all.

Variety: The type and nature of the data. Knowing this helps the people who analyze the data make effective use of the resulting insight. Big data draws from text, images, audio and video, and it completes missing pieces through data fusion.

Velocity: The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real time. Compared to small data, big data is produced more continually. Two kinds of velocity related to big data are the frequency of generation and the frequency of handling, recording, and publishing.

Veracity: An extension of the original definition of big data that refers to data quality and data value. The quality of captured data can vary greatly, which affects the accuracy of any analysis.

Extensional: Whether new fields in each element of the data collected can be added or changed easily.

Scalability: Whether the size of the data can expand rapidly.

Value: The utility that can be extracted from the data.
Variability: Refers to data whose value or other characteristics shift in relation to the context in which they are generated.

Fine-grained and uniquely lexical: Respectively, the proportion of specific data of each element per element collected, and whether the element and its characteristics are properly indexed or identified.

Relational: Whether the data collected contains common fields that would enable a conjoining, or meta-analysis, of different data sets.

Why Is Big Data Important?

The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:

- Determining root causes of failures, issues and defects in near-real time.
- Generating coupons at the point of sale based on the customer’s buying habits.
- Recalculating entire risk portfolios in minutes.
- Detecting fraudulent behavior before it affects your organization.

How Big Data Works

Before businesses can put big data to work for them, they should consider how it flows among a multitude of locations, sources, systems, owners and users. There are five key steps to taking charge of this big “data fabric” that includes traditional, structured data along with unstructured and semistructured data:

1) Set a big data strategy.
2) Identify big data sources.
3) Access, manage and store the data.
4) Analyze the data.
5) Make data-driven decisions.
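
To make the later steps more concrete, here is a minimal Python sketch of storing, analyzing and deciding on a toy data set. It is only an illustration, not a real big data pipeline: the records, the field names (customer_id, amount, segment), the helper functions and the threshold of 100.0 are all hypothetical and chosen for this example. The two sources are joined on a common field, in the spirit of the "Relational" characteristic, and the decision step loosely mirrors the coupon example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records from two sources; customer_id is the common field.
purchases = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 500.0},
    {"customer_id": 2, "amount": 15.0},
]
profiles = [
    {"customer_id": 1, "segment": "regular"},
    {"customer_id": 2, "segment": "new"},
]


def store(records):
    """Access, manage and store: group purchase amounts by customer."""
    by_customer = defaultdict(list)
    for record in records:
        by_customer[record["customer_id"]].append(record["amount"])
    return by_customer


def analyze(by_customer, customer_profiles):
    """Analyze: join both sources on customer_id and compute average spend."""
    segments = {p["customer_id"]: p["segment"] for p in customer_profiles}
    return {
        cid: {
            "segment": segments.get(cid, "unknown"),
            "avg_spend": mean(amounts),
        }
        for cid, amounts in by_customer.items()
    }


def decide(summary, threshold=100.0):
    """Decide: pick customers whose average spend exceeds the (assumed) threshold."""
    return sorted(cid for cid, s in summary.items() if s["avg_spend"] > threshold)


summary = analyze(store(purchases), profiles)
coupon_customers = decide(summary)
print(coupon_customers)
```

At real big data scale the same three stages would typically run on distributed storage and processing systems rather than in-memory Python lists, but the flow from stored data to analysis to decision stays the same.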