IT Showcase On: Big Data Analytics and Platform Services

Quick Reference Guide
How Microsoft IT manages large amounts of data and produces analytic systems that maximize big data's business value
The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business
decisions within your own company or organization.
EXECUTIVE OVERVIEW
The overarching challenge Microsoft faces in developing enterprise solutions for business intelligence (BI) is the same challenge that faces our customers and peers: big data. Now that storage devices cost less than the value of the data they can hold, it is more cost-effective than ever to store as much information as can be collected. Data accumulates at an accelerating pace. One estimate places the Internet alone at about 75 million servers holding more than 500 million terabytes of data.
Data on its own is not useful. The focus of big data work at Microsoft is to gain insight from all of that collected data by producing predictive analytic systems that maximize the data's business value.
Why you should care:
• Big data can help unlock predictive trends and develop proactive guidance.
• Analytical insights from big data can improve the quality of service levels.
• The business collects a large amount of data as part of its daily operations. If properly understood, that data can provide important insights about customer needs, business efficiency, predictions for future opportunities, and much more.
• Correlating proprietary data collected by the business with publicly available information (houses sold by zip code, laws passed in Congress, economic forecasts) brings new insights and offers a strategic advantage for business planning.
• Understanding how current resources are used, from network bandwidth to the availability of natural resources, can provide important predictions about a business's ability to meet future demand or to develop new technology to stay competitive.
• Companies that aren't leveraging big data might be putting themselves at a considerable competitive disadvantage in the near future.
WHAT IS BIG DATA?
Many people think of big data as simply large datasets, but size is only part of it. Big data can answer new types of questions and create new opportunities. A multitude of data sources exist, including personal, organizational, public, and private.
Some examples of where big data is generated include:
• Enterprise resource planning, supply chain management, customer relationship management, and transactional web applications are classic examples of systems that process transactions. Highly structured data in these systems is typically stored in Microsoft® SQL Server® databases.
• Web 2.0 is about how people and things interact with each other or with your business. Web logs, user click streams, social interactions and feeds, and user-generated content are classic places to find interaction data.
• Big Data. The number of devices and technologies that generate ambient data has increased. Sensors for heat, motion, and pressure, along with radio-frequency identification (RFID) and global positioning system (GPS) chips in mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output ambient signals.
• The Open Data Initiative means that more and more governmental data is being made publicly available.
WHAT ARE THE ATTRIBUTES OF BIG DATA?

• Volume. What counts as a large volume of data is relative to the current point in time. Like computing power under Moore's law, which doubles every 18 months, data volume doubles at a similar rate.
© 2013 Microsoft Corporation. All rights reserved. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, SQL Server, Surface, Windows, Windows Azure, Windows Server, and
Xbox are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.





• Velocity. The rate at which data arrives at the enterprise and is processed and understood.
• Variety. All the sources of available data, in all forms, formats, and shapes. The terms structured and unstructured data are often used, but all data used in a big data context has some structure. Unstructured data actually refers to the subcomponents that lack structure, such as free-form text in a comments field or the image in an auto-dated picture. Big data can be any type of data, structured or unstructured, such as text, sensor data, audio, video, click streams, or log files.
• Complexity/variability. The variability of meaning, as distinguished from the variety of formats.
• Veracity. Trust in the data being consumed. How can data be acted upon if it can't be trusted? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
• Visibility. To make informed decisions, you need access to, and visibility into, all of the data required to make those decisions. Visibility is needed at the application layer to identify emerging trends within dynamic data streams, although the underlying infrastructure can also act as a set of sensors.
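The Volume bullet's doubling claim compounds quickly. The minimal sketch below projects a starting volume forward under a fixed doubling period. The 18-month period comes from the text. The 100 TB starting volume and the function name are hypothetical, and real growth rates vary by organization.

```python
def projected_volume(current_tb: float, months: float, doubling_months: float = 18.0) -> float:
    """Project data volume, assuming it doubles every `doubling_months` months."""
    return current_tb * 2 ** (months / doubling_months)

# Hypothetical starting point: 100 TB today.
for years in (1.5, 3, 6):
    print(f"after {years} years: {projected_volume(100, years * 12):,.0f} TB")
```

With these assumptions, 100 TB becomes 200 TB after 18 months, 400 TB after three years, and 1,600 TB after six years. The doubling period is a parameter, so you can test a slower or faster growth assumption.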
HOW DOES BIG DATA DIFFER FROM TRADITIONAL BI?
Traditional BI relies on limited data sets, cleansed data, and simple models, and it primarily supports causation (what happened, and why did it happen?). Big data analytics, by contrast, uses many diverse and uncorrelated data sets, thrives on raw data, and uses ultra-complex predictive models. It leans toward correlation: multiple, unrelated data sources turn up insights that cannot be entirely explained.
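Here is a minimal sketch of what leaning toward correlation can look like in practice. Two unrelated sources, hypothetical internal orders and hypothetical public housing sales, are joined on a shared key (zip code, echoing the public-data example in the overview). A Pearson correlation coefficient is then computed over the joined rows. All names and figures are illustrative assumptions, not Microsoft data.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)

# Hypothetical internal data: product orders per zip code.
orders_by_zip = {"98052": 120, "98004": 95, "98101": 60, "98115": 40}
# Hypothetical public data: houses sold per zip code (one zip has no internal match).
houses_sold_by_zip = {"98052": 300, "98004": 260, "98101": 150, "98115": 90, "98033": 200}

# Join the two sources on their shared key before correlating them.
common_zips = sorted(orders_by_zip.keys() & houses_sold_by_zip.keys())
r = pearson([orders_by_zip[z] for z in common_zips],
            [houses_sold_by_zip[z] for z in common_zips])
print(f"{len(common_zips)} zip codes joined, r = {r:.2f}")
```

A high r only says that the two series move together. It does not explain why, which is exactly the distinction the paragraph draws between correlation and causation.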
Although the technology provides a scalable framework
that allows us to structure and process large amounts of
data, the scientists and statisticians create value for the
enterprise by transforming data into analysis solutions for
real-time decision making and implementing these
solutions in a production environment for access by
business users.
Traditional BI looks at descriptive and diagnostic analytics focused on what has happened. It implies analytics and reporting on structured (SQL) data held in tables in relational databases. In many instances, big data processing includes a higher proportion of unstructured data in combination with structured data. The real difference between traditional BI analysis and big data analysis is in the results you are looking for. With traditional BI analysis, a clearly defined set of input data exists that has a (reasonably) well-defined set of information behind it. The analysis is retrospective and provides information about what has gone on in the past. With big data analysis, the data is less well defined in meaning but provides information about the future: not just a trend line drawn from past experience, but predictions based on complex data.
ANALYSIS FOCUSES ON VALUE DRIVERS
Big data analytics (also called big data advanced analytics) focuses on predictive and prescriptive analytics. It forecasts future trends and determines how to take advantage of them according to the customer's criteria. Examples include a desire to maximize profit while taking on more risk, to take less risk for a modest profit, or to take no risk for a minimal profit. Each is a valid path a customer may choose, depending on the particular business strategy.
Using big data analytics, we can help identify value driver opportunities within the business, particularly around revenue, cost, and risk.
ENABLING BIG DATA AT MICROSOFT IT
We think of big data as actually having two parts: the big
data technology itself, which allows us to hold and query
large volumes of data, and the people—scientists and
statisticians—who help us extract business value and drive
business insight from the data.
Extracting information out of various types of data takes different skill sets. Data science is a multidisciplinary field, so it is important to form a team with a variety of strong quantitative talents.
SOLUTION

Microsoft built and staffed Big Data Analytics &
Platform Services.

Microsoft IT invested in becoming an innovation
leader in Big Data–Big Math.

Microsoft IT now can offer services ranging from big
data architecture, design, and development to
operationalization of big data analytics capability
supported by data scientists, customized for the
business.
CHALLENGES

Federating external data sources. Already, most of
the data Microsoft uses comes from external
sources—partners, customers, vendors,
government, industry, social media—and the trend
is only growing. The challenge is to federate external
data with internal data so that it’s accessible and
usable whenever the business needs it. That requires
systems that validate external data rather than
trying to control it.

Producing predictive analytic systems. Traditional BI systems produced a rear-view mirror look at data; those systems can only analyze the results of what has already happened. Big data offers more than that. Big data analytic systems can predict what is going to happen. They can do this first to gain greater competitive advantage, and second to respond to the new customer relationships that will grow in the devices and services world, particularly the world of continuous online services.
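The Federating external data sources challenge calls for systems that validate external data rather than trying to control it. The minimal sketch below illustrates that idea. It assumes a hypothetical external feed of housing sales by zip code and a hypothetical expected schema. Each incoming record is checked, then either accepted for federation or set aside with a reason. The field names, checks, and function name are illustrative, not a description of Microsoft IT's systems.

```python
from dataclasses import dataclass, field

# Hypothetical schema for an external feed of housing sales: field name -> expected type.
EXPECTED_FIELDS = {"zip_code": str, "month": str, "houses_sold": int}

@dataclass
class ValidationResult:
    accepted: list = field(default_factory=list)
    # Rejected records are kept with a reason for follow-up instead of being silently dropped.
    rejected: list = field(default_factory=list)

def validate_external(records):
    """Check each external record against the expected schema before federating it."""
    result = ValidationResult()
    for record in records:
        missing = [name for name in EXPECTED_FIELDS if name not in record]
        if missing:
            result.rejected.append((record, f"missing fields: {missing}"))
            continue
        mistyped = [name for name, kind in EXPECTED_FIELDS.items()
                    if not isinstance(record[name], kind)]
        if mistyped:
            result.rejected.append((record, f"unexpected types: {mistyped}"))
            continue
        if record["houses_sold"] < 0:
            result.rejected.append((record, "houses_sold is negative"))
            continue
        result.accepted.append(record)
    return result

feed = [
    {"zip_code": "98052", "month": "2013-01", "houses_sold": 300},
    {"zip_code": "98004", "month": "2013-01", "houses_sold": "n/a"},
    {"zip_code": "98101", "month": "2013-01"},
]
result = validate_external(feed)
print(len(result.accepted), "accepted;", [reason for _, reason in result.rejected])
```

The design choice mirrors the text. The external source is not changed or controlled; its records are checked on arrival. Only records that pass are joined with internal data.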
BIG DATA ANALYTICS ARCHITECTURE
The Data Decision Sciences Group (DDSG) enables
customers to convert their raw data into credible,
consistent information by enriching data through enterprise
information management capabilities and advanced
analytics.
SQL Server provides strong data transformation capabilities
through SQL Server Integration Services, data cleansing
through SQL Server Data Quality Services, and data
governance through SQL Server Master Data Services.
Currently in the big data space, Apache Hadoop is commonly seen as the solution to deploy. Hadoop is an open source framework that supports data-intensive distributed applications. The Hadoop platform includes the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS), and a variety of other projects, such as Apache Hive and HBase. Together they give customers the ability to store and harness unstructured and complex data types on commodity hardware. Hortonworks, a Gold Partner, built a Hadoop distribution that runs on top of Windows Server® at Microsoft.
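MapReduce breaks a job into three steps. A map step emits key/value pairs, a shuffle groups the pairs by key, and a reduce step aggregates each group. The single-process Python sketch below illustrates that pattern with a word count over hypothetical log lines. It does not use Hadoop's own Java API. On a real cluster, the framework distributes each phase across nodes and reads input from HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group values by key; Hadoop performs this step between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate all values that share a key.
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical log lines standing in for a large distributed input.
logs = ["error disk full", "login ok", "error timeout", "login ok"]
print(word_count(logs))
```

Because each map call depends only on its own line and each reduce call depends only on its own key, both phases can run in parallel. That independence is what lets the framework scale them out across commodity hardware.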
For predictive analytics, DDSG offers data-mining tools in
Microsoft SQL Server Analysis Services. Through Microsoft’s
self-service tools as well as the data-mining add-ins, you
can access and mash up data from virtually any source,
including data from the Windows Azure™ Marketplace, and
continue to refine those data sets to create compelling
analytical applications.
Predictive analytics. Microsoft provides out-of-the-box data-mining algorithms with SQL Server Analysis Services that can help you:
• Forecast sales and inventory, and discover which items tend to be sold together.
• Identify the most profitable customers, and anticipate customer losses.
• Uncover unintuitive relationships in data.
• Look for themes and trends in unstructured text.
• Identify and handle anomalies during data transfer or data loading.
For advanced analytics, DDSG supports commonly used non-Microsoft tools and frameworks such as Apache Mahout and R. It also uses the marketplace to tap into the latest analytical techniques from the community of data scientists.
BIG DATA ANALYTICS ENGAGEMENTS AT MICROSOFT
The High-Performance Big Data Platform and Analytic Services support advanced analytic computing over large, complex, diverse data sets (often of varied data types). Microsoft IT data scientists partner with the business to deliver actionable business insights, using the business's own data to help guide decision making. Microsoft IT's Analytic Services offering supports advanced predictive modeling, text mining, experimental design and scenario testing, variation detection, statistical surveys, and system simulation and forecasting.
A business owner begins the engagement with the DDSG by identifying the business problem to address. At this stage, it is important to have some idea of what kind of data, and roughly how much of it, is relevant to the analysis. Next, the business owner works interactively with a project manager and a data scientist to capture the business and data requirements. This lets the DDSG help formulate an analysis that will provide the decision-making information the business needs. By its nature, this is an iterative process that requires interaction to ensure the analysis results in valuable decision-making information.
The DDSG runs the analysis and works with the business partner to refine it and get results. For some business problems, that is the end of the engagement. In other cases, the analysis needs to be operationalized to provide ongoing information as new data is collected.
The DDSG at Microsoft has a history of delivering results in big data. The following projects are examples in which the DDSG helped business owners make decisions based on data tempered with experience.
BIG DATA PROJECTS AT MICROSOFT
User segmentation
• Built utilization-based customer segmentation by analyzing the click stream from the Windows Telemetry panel
• Determined how customers use PCs
• Segmented customers based on usage patterns
• Segmented for product planning in FY13
• Applied advanced analysis to cookie data
• Performed analyses that SQL Server couldn't handle
End user profile (EUP)
• Improved and provided insight, established a process to identify potential licensing shortfalls for Microsoft products, and provided actionable BI to enable cross-company antipiracy work
• EUP project for software piracy detection
• Found potential piracy scenarios and built user profiles
• Built a predictive model to identify piracy
• Analyzed and modeled high-volume, high-velocity data from disparate sources
Unlicensed PCs
• Analyzed the behavioral trending of new Windows® 8 devices in the original equipment manufacturer channel and among downstream distributors and resellers who are not properly licensed, and the subsequent impact on return on investment
• Multi–billion-dollar business decisions
• Marketing spend effectiveness
• Segment analysis
• Partner behaviors, cycle times, and trends
• Software piracy
• Dependency on data integrity, quality, security, and governance
MS.com
• Targeted visitors who showed an interest in Surface™, Windows Phone, or Xbox® on the basis of their MS.com or Windows Store behavior
• Identified potential customers for Windows Phone 8, Surface, and Xbox based on browsing behavior, and created banner ads directed toward these likely customers
• Combined data from a variety of sources to target customers
• Used more than a terabyte of cookie data
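The User segmentation project grouped customers by usage patterns. The text does not say which algorithm was used. k-means clustering is one common choice for this kind of task, and the sketch below applies it to hypothetical per-user feature vectors (weekly hours in productivity apps and in games). The features, data, choice of k, and function names are illustrative assumptions, not details of the DDSG analysis.

```python
from math import dist

def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iterations=20):
    # Deterministic seeding (evenly spaced picks) keeps the sketch reproducible;
    # production tools usually use random or k-means++ seeding instead.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical users: (weekly productivity-app hours, weekly gaming hours).
users = [(30, 1), (28, 2), (32, 0), (2, 20), (1, 25), (3, 22), (15, 10), (14, 12), (16, 9)]
centroids, segments = kmeans(users, k=3)
for centroid, members in zip(centroids, segments):
    print(f"centroid={tuple(round(x, 1) for x in centroid)} users={len(members)}")
```

Each resulting centroid summarizes one usage pattern, such as mostly-productivity or mostly-gaming users. Product planners can read that kind of summary more easily than raw click streams.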
BENEFITS

Able to get insights out of big data at Microsoft

Enabled enterprise-wide decision making

Microsoft IT is becoming an innovation leader in Big
Data–Big Math
CONCLUSIONS

Data storage costs are down. A vast collection of
available data from a variety of sources can now be
federated and analyzed.

Microsoft products, tools, services, and technologies
work with non-Microsoft products to deliver big data
analytics.

Microsoft IT built a big data analytics platform and
analytic services for the enterprise.

Prescriptive and proactive analytics enable
enterprise-wide decision making and drive business
value.
RESOURCES

Microsoft Big Data
www.microsoft.com/bigdata

Microsoft BI Blog
http://blogs.msdn.com/b/microsoft_business_intelligence1

Windows Azure
www.windowsazure.com/enus/home/scenarios/big-data

SQL Server
http://www.microsoft.com/enus/sqlserver/solutions-technologies/businessintelligence/big-data.aspx

Preview of the Windows Azure HDInsight Service
http://technet.microsoft.com/enus/library/hh315814.aspx

Microsoft Big Data Solution Sheet
http://download.microsoft.com/download/1/8/B/18BE3550-D04C-4B3F-9310F8BC1B62D397/MicrosoftBigDataSolutionSheet.pdf