Uploaded by lisiyang_0105

Big Data Book Summary

White Book of Big Data
From mid-2011, Big Data began to be used across industries to maximize business benefit
from a variety of data sources. Big Data is defined by the 3V model: volume, velocity and
variety. For semi-structured data, the idea of ‘Linked Data’ connects cross-referenced
information in a loose web, which can improve the quality and accuracy of any query. The
book also emphasizes a fourth V, called value. Compared to traditional business analytics
systems, a Big Data solution generates ‘real-time’ information, so an organization can
respond more quickly to market trends. The improved business insights help the
organization see patterns in its products or customers.
The key functions in the structure of a Big Data solution are data integration, data
storage, search and data visualization, which deliver useful insights and help the
company make better decisions.
The 3V model defines what Big Data is, especially velocity: fast data can give business
decision-makers and business partners product information in real time. In addition,
the fourth V is value, which helps the organization understand the business value.
Business insights keep track of products or customers over the long term by
using data analysis systems and concepts.
Compared to traditional data warehousing and BI systems, a Big Data solution
requires IT departments to provide platforms that can deliver quick answers to various
teams based on their issues and challenges.
Organizations need to be aware that current data warehousing and BI systems
operate only on structured data. However, there are three basic data types:
structured data, unstructured data and semi-structured data.
Big Data isn’t just search because search can’t handle velocity.
Based on the four-V definition of Big Data, the structure of a Big Data solution has
four key processes: data integration, platform infrastructure, data access interface
and visualization.
What is new in a Big Data solution is the data storage function, because the data can
be processed and analyzed in near real time.
Data visualization makes it easier for business people to understand the story behind
massive data.
On the other hand, data privacy becomes more important as access to data grows easier.
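The three basic data types named in this section can be illustrated with a minimal Python
sketch. The sample records, field names and text are invented for illustration and do not
come from the book; the point is only how the shape of each data type differs.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
structured_csv = "customer_id,age,city\n1,34,Boston\n2,51,Austin\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields can differ per record.
semi_structured = [
    json.loads('{"customer_id": 1, "tags": ["loyal"], "device": {"os": "iOS"}}'),
    json.loads('{"customer_id": 2, "referrer": "newsletter"}'),
]

# Unstructured: free text with no predefined fields (hypothetical example).
unstructured = "Customer 2 called to say the delivery arrived late."

# Compare the shapes: one schema, many schemas, no schema.
structured_fields = set(rows[0].keys())
semi_fields = [set(record.keys()) for record in semi_structured]
word_count = len(unstructured.split())
```

A traditional warehouse can load the first kind directly, while the other two usually need
extra parsing or text processing before they can be queried.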
The Eight-Fold Path of Data Science
The emerging community of data scientists summarizes the practical applications of data
science. Big Data technologies connect engineers and scientists so that they can find
meaningful insights about products or customers. The revolution focuses on connecting
smart devices and their data to improve actions and insights, in order to identify issues
and prevent damage. Annika Jimenez introduces an eight-fold path for data science projects,
consisting of four phases and four differentiating factors. The four phases ultimately
deliver a framework to apply a model and take action, and the four differentiating factors
are required in each phase to ensure that the application integrates the model with a
business need.
Data scientists are in demand in both education and industry. People have begun to
pay attention to data science.
The basic methodology of analytics is data mining, since it is the foundation for
tools that collect data and connect various data sources.
The first phase of the eight-fold path for a successful data science project is problem
formulation. The best problem formulation relates to the company's target and to solve the
The second phase of the eight-fold path is the data step, which builds a data process by
connecting different tables.
The third phase of the eight-fold path is the modeling step, which applies the right data
selection and data modeling to make predictions from the datasets.
The fourth phase of the eight-fold path is application, which is to create a framework to take
The first differentiating factor of the eight-fold path is technology selection. It means
selecting the right tools and the right platform to solve a question.
The second differentiating factor of the eight-fold path is creativity. It produces new
ideas that solve the issue faster and more efficiently.
The third differentiating factor of the eight-fold path is an iterative approach. Making
sure each phase iterates toward the goals brings great impact to the company.
The fourth differentiating factor of the eight-fold path is building a narrative. The
narrative clearly explains the impact of the platform, such as what was done and what changed.
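The data and modeling phases described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the `customers` and `orders` tables, the
`join_on` helper and the per-segment average used as a "model" are all hypothetical and are
not taken from the chapter.

```python
# Hypothetical tables: customers and their orders.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 2, "amount": 400.0},
]


def join_on(left, right, key):
    """Data step: inner-join two lists of dicts on a shared key."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for right_row in right:
        for left_row in index.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


def mean_amount_by_segment(rows):
    """Modeling step (stand-in): predict an order amount as the segment mean."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["segment"], (0.0, 0))
        totals[row["segment"]] = (total + row["amount"], count + 1)
    return {segment: total / count for segment, (total, count) in totals.items()}


joined = join_on(customers, orders, "customer_id")
model = mean_amount_by_segment(joined)
```

In a real project the modeling step would use a proper predictive technique, and the
iterative factor would mean revisiting the join and the model as results come in.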
Transforming Your Company into a Data Science-Driven Enterprise
Data science is the core of Big Data, and it introduces new methodologies for analyzing data.
Big Data is pushing C-level executives to use platforms to collect, store and analyze data.
An efficient enterprise depends on being data science-driven, for example through predictive
analytics, rather than merely data-driven. Initiating a data science-powered transformation
does not always lead to a good outcome, because it involves some degree of uncertainty and
vulnerability. If a data science-driven enterprise has a more thoughtful vision, its business
value can increase faster. From the line-of-business perspective, a company frequently
switches between a centralized and a decentralized model based on project prioritization.
Compared to Big Data, data science creates new ways to analyze data, such as predictive
modeling, machine learning, MapReduce and database algorithms. It is a more efficient
methodology for solving problems and predicting trends such as customer needs and
product sales.
An initiative to adopt data science in an enterprise involves multiple levels, such as
the C-level, which must consider the strategic shift and its risk.
A data science-driven transformation has two extremes: a good outcome and a bad
outcome.
If each level of the transformation catalyst follows the enterprise’s vision, the
organization creates more business value.
In the value chain, data scientists sit in the middle of the influence, and their function
is to build models to centralize or decentralize data.
Data availability is fundamental to a data science-driven enterprise. It makes it
easier to access and query data in order to deliver data visualization.
To improve effectiveness in the company, it is important to choose the right
platform for collecting and manipulating data.
For advanced analytics, there are two categories: “Big Data” toolkits and
“Small Data” toolkits. The tools include R, Stata, MATLAB, SAS and SPSS.
Building a data science team requires people who have a solid data science
background and the right analytical leadership.
The path to operationalization is key to delivering value to the business. The key
factors are production, storage, application and reporting.
A lack of defined process and program management can lead to unsuccessful
delivery. To deliver wins, instrumentation is key; for example, product
managers need to drive and deploy the project requirements.
Data Mining from A to Z
Predictive analytics and data mining are becoming more important for delivering insight and
helping business people make better decisions. Data mining finds patterns in sales, customer
and marketing data and uses them to predict future trends. The methodology can help a
company detect fraud, minimize risk and increase revenue. The SAS data mining process has
five steps. Data mining can be applied in different industries such as finance, marketing
and health insurance. The article introduces three SAS tools: SAS Enterprise Miner, SAS
Rapid Predictive Modeler and SAS Model Manager. They focus on different features and target
different clients.
Data mining can segment customer behavior by demographics or attitudes such as
age, education and gender, in order to target valuable customers and increase
The first step of the data mining process is to sample the data so that it represents the
target data sets. The samples include a training set and a validation set.
The second step is to explore the data, using techniques such as clustering, classification
and regression to search for relationships.
The third step is to modify the data to identify the significant variables during
the model selection process.
The fourth step is to model the data by using different analytical techniques based
on the issues.
The final step is to assess the data and models for usefulness and reliability.
SAS Enterprise Miner is a great tool for data mining and predictive analytics. The tool
follows the five steps of the data mining process.
SAS Rapid Predictive Modeler can quickly generate predictive models, and it has a
user-friendly interface for business analysts.
SAS Model Manager is another tool, used for model governance tasks such as promoting
champion models, validating models and monitoring model performance.
The first two SAS tools allow business analysts to develop baseline models automatically,
and they can also apply data mining and statistical methods to analyze the data.
The last SAS tool can deliver a performance-monitoring dashboard.
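The five steps of the data mining process listed above (sample, explore, modify, model,
assess) can be sketched in plain Python. This is not SAS code and not the article's own
implementation: the function names, the synthetic `visits`, `noise` and `spend` fields, and
the choice of a single-variable least-squares line are all assumptions made for illustration.

```python
import random


def sample(rows, train_fraction=0.7, seed=0):
    """Step 1: split rows into a training set and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def correlation(xs, ys):
    """Step 2 helper: Pearson correlation between two numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def select_variable(train, features, target):
    """Steps 2-3: explore correlations, then keep the strongest variable."""
    ys = [row[target] for row in train]
    scores = {f: abs(correlation([row[f] for row in train], ys)) for f in features}
    return max(scores, key=scores.get)


def fit_line(train, feature, target):
    """Step 4: fit target = a + b * feature by least squares."""
    xs = [row[feature] for row in train]
    ys = [row[target] for row in train]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def assess(validation, feature, target, a, b):
    """Step 5: mean squared error of the fitted line on the validation set."""
    errors = [(row[target] - (a + b * row[feature])) ** 2 for row in validation]
    return sum(errors) / len(errors)


# Synthetic data: spend depends only on visits; noise is unrelated to spend.
rng = random.Random(42)
data = [{"visits": v, "noise": rng.random(), "spend": 2.0 * v + 1.0} for v in range(20)]

train, validation = sample(data)
best = select_variable(train, ["visits", "noise"], "spend")
a, b = fit_line(train, best, "spend")
mse = assess(validation, best, "spend", a, b)
```

Keeping the validation set out of the fitting step is what makes the final assessment a
fair check of usefulness and reliability.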