Big Data

The Problem and the Solution to 21st-Century Organizational Innovation.
Trever Pearson
PA 740
Professor Hyde
12.10.12
DEFINITION.

Big Data is a phenomenon defined by the rapid acceleration of the expanding volume of high-velocity, complex, and diverse types of data, which require advanced technologies and methods to enable their collection, storage, dissemination, management, and analysis.
TechAmerica Foundation (2012)
WHY IS IT PROBLEMATIC?
• The velocity of available data is increasing faster than most organizations can keep pace with.
• Data synthesis requires advanced technologies and appropriate staff and expertise on an ongoing basis.
• Implementation requires structural and organizational culture change.
• …And failure to respond will leave a lagging organization seriously behind.
Origins & Trends.
Data Storage.
[Figure: Global data storage in exabytes, 1986-2007]
Global data storage has increased from nearly 0 to over 300 exabytes between 1986 and 2007.
[Figure: Detail, % of exabytes stored as analog vs. digital, 1986-2007]
The type of global data stored has changed from 99% analog in 1986 to 96% digital in 2007.
(5 exabytes = 5 × 10^18 bytes, or 5 billion gigabytes: enough to contain every word ever spoken by all humans on Earth.)
MGI (2011)
Origins & Trends. (cont…)
Computational Capacity.
[Figure: Global computation capacity in million instructions per second (MIPS), 1986-2007]
• Computation capacity has grown from nearly 0 to over 300 exabytes of traffic from 1986 to 2007.
• Information-producing devices such as mobile phones, tablets, and sensors have doubled since 2000. Coupled with personal computing, traffic in these areas increased from under 40% to nearly 90% of all data created from 1986 to 2007.
[Figure: Detail, % of million instructions per second by device type (personal computers, video game consoles, mobile phones/PDAs, servers and mainframes, supercomputers, pocket calculators), 1986-2007]
MGI (2011); Economist (2012)
• The storage required for all of this data doubled between 1999 and 2002, a compound annual growth rate of about 25%.
• 1.8 zettabytes of data (the equivalent of 200 billion two-hour HD movies) were created globally in 2011, an amount projected to double every year.
• 800 exabytes were created in 2009, an amount projected to increase 44-fold by 2020.
It’s just like the universe: exponentially expanding.
MGI (2011)
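The growth figures above can be sanity-checked with a little arithmetic. This minimal Python sketch assumes decimal (SI) units, with 1 exabyte = 10^18 bytes and 1 zettabyte = 10^21 bytes; the helper `cagr` is a hypothetical name chosen for illustration, not something taken from MGI.

```python
# Back-of-the-envelope checks of the growth figures cited from MGI (2011).
# Assumes decimal (SI) units: 1 GB = 10**9, 1 EB = 10**18, 1 ZB = 10**21 bytes.
GB = 10**9
EB = 10**18
ZB = 10**21


def cagr(growth_factor: float, years: float) -> float:
    """Compound annual growth rate implied by growing `growth_factor`-fold over `years`."""
    return growth_factor ** (1 / years) - 1


# Storage doubled between 1999 and 2002 (3 years).
storage_cagr = cagr(2, 2002 - 1999)

# 1.8 ZB expressed as 200 billion two-hour HD movies.
bytes_per_movie = 1.8 * ZB / 200e9

# 800 EB in 2009, projected to grow 44-fold by 2020 (11 years).
volume_2020_zb = 800 * EB * 44 / ZB
projection_cagr = cagr(44, 2020 - 2009)

print(f"1999-2002 doubling: {storage_cagr:.1%} per year")
print(f"Implied size per HD movie: {bytes_per_movie / GB:.0f} GB")
print(f"Projected 2020 volume: {volume_2020_zb:.1f} ZB")
print(f"Implied 2009-2020 growth: {projection_cagr:.1%} per year")
```

Doubling over three years works out to roughly 26% a year, close to the 25% cited. A 44-fold increase from 800 exabytes implies about 35 zettabytes by 2020, or growth of roughly 41% a year.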
DATA TYPES.
• 15% Structured (database or spreadsheet data)
• 85% Unstructured (email, video, blogs, call center conversations, Facebook posts, Tweets, etc.)
Economist (2012)
DATA SOURCES.
• Customer transactions containing personal information and consumer behavior (e.g., Visa, Amazon)
• Multimedia content such as high-resolution health procedure videos, YouTube, etc.
• Social media such as Facebook and Twitter
• Sensors and devices used in industries such as retail, healthcare, and automotive
• An effective response to Big Data is crucial for organizations that want to lead and outperform their peers.
• Companies are projected to increase operating margins by more than 60% with an effective response to BIG DATA.
• Management decision-making will be built upon evidence and information.
• Data-driven decisions are just plain better decisions. “You can’t manage what you don’t measure.”
McAfee & Brynjolfsson (2012)
HOW WILL BIG DATA HELP?
By…
• Replacing human decision-making with automated formulas where appropriate
• Reducing inefficiencies
• Creating transparency
• Discovering variability
• Reducing security threats and crime
• Increasing the ability to predict mission outcomes
• Reducing or eliminating waste
• …just being innovative.
MGI (2011)
WHO WILL BIG DATA HELP?
The five sectors to gain the most from the use of Big Data:
• Health Care
• Public Sector Administration
• Manufacturing
• Retail
• Businesses/organizations using personal location data

WHO IS AFFECTED? HOW ARE THEY AFFECTED?
• The Public: government transparency, bureaucratic efficiency… and privacy concerns
• Policy Makers: informed decision-making, evidence-based legislation
• Contractors: monitoring of contract deliverables, reporting
• Employees: facilitation of workplace tasks, enhanced communication, etc.; increased transparency over organizational activity
OPPORTUNITIES.
• Data-driven organizations perform better on measures of financial and operational results than those that are not
• Data facilitate efficient processes, saving time and money
• Data lead to innovation
• Data will ultimately lead to funding
McAfee & Brynjolfsson (2012)
CHALLENGES.
• Data-driven decision making and collection processes require organizational cultural change
• Strong leadership is necessary to set clear goals and to ask the right questions
• Skillful and talented Data/IT specialists must be on staff
• Lack of statistical and technical skills in the labor force
• Potential cost of implementation
Step 1. Source Data: Speed, Type and Amount.
What kind and how much data are we working with?
- Assessing how hard it is to access
- Determining how it needs to be transformed
- Identifying the technologies to facilitate the process
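Step 1 asks how much data there is, how fast it arrives, and what kinds it includes. As a rough illustration, the sketch below profiles a batch of records along those three dimensions; the `Record` fields and the sample batch are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One incoming data item (hypothetical shape, for illustration only)."""
    timestamp: float  # seconds since the start of collection
    kind: str         # e.g. "transaction", "tweet", "sensor"
    size_bytes: int


def profile(records: list[Record]) -> dict:
    """Summarize the amount (volume), speed (velocity) and type (variety) of a batch."""
    volume = sum(r.size_bytes for r in records)
    rate: Optional[float] = None
    if records:
        span = max(r.timestamp for r in records) - min(r.timestamp for r in records)
        # Velocity is undefined if every record shares one timestamp.
        if span > 0:
            rate = len(records) / span
    return {
        "volume_bytes": volume,
        "records_per_second": rate,
        "types": dict(Counter(r.kind for r in records)),
    }


batch = [
    Record(0.0, "transaction", 512),
    Record(0.5, "tweet", 280),
    Record(2.0, "sensor", 64),
    Record(2.0, "tweet", 300),
]
print(profile(batch))
```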

Step 2. Data Preparation: Cleansing and Verification.
What do the data need to meet operational requirements?
Define the methods required for data prep, such as:
- Standardization, verification, filtering, etc.
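The standardization, verification, and filtering named in Step 2 might look like this for simple client records. The field names (`client_id`, `age`, `phone`) and the validation rules are assumptions made for illustration, not requirements from TechAmerica.

```python
import re


def standardize(record: dict) -> dict:
    """Trim whitespace, lowercase keys, and reduce phone numbers to digits only."""
    clean = {
        key.strip().lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in record.items()
    }
    if isinstance(clean.get("phone"), str):
        clean["phone"] = re.sub(r"\D", "", clean["phone"])
    return clean


def verify(record: dict) -> bool:
    """Hypothetical rules: a client ID is required and age must be plausible."""
    age = record.get("age")
    return bool(record.get("client_id")) and isinstance(age, int) and 0 <= age <= 120


def prepare(raw: list[dict]) -> list[dict]:
    """Standardize every record, then filter out those that fail verification."""
    standardized = [standardize(r) for r in raw]
    return [r for r in standardized if verify(r)]


raw = [
    {" Client_ID ": "A17", "Age": 72, "Phone": "(415) 555-0100"},
    {"client_id": "", "age": 65},      # missing ID: filtered out
    {"client_id": "B02", "age": 430},  # implausible age: filtered out
]
print(prepare(raw))
```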
Step 3. Data Transformation.
What is required to leverage the data?
- Unstructured data may be broken down and presented in a structured format
- Data sources can be aggregated to reveal not-so-obvious relationships between data types

TechAmerica Foundation (2012)
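Step 3's two moves, breaking unstructured text into structured fields and aggregating sources, can be sketched with a regular expression and a join on a shared key. The note format, the `clients` table, and the function names are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical unstructured source: free-text call-center notes.
notes = [
    "2012-11-02 client A17 called about billing; resolved",
    "2012-11-03 client B02 complaint re: wait time",
    "2012-11-05 client A17 called about appointment; resolved",
]

# Hypothetical structured source: one row per client.
clients = {"A17": {"age": 72}, "B02": {"age": 65}}

NOTE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) client (?P<client_id>\w+) (?P<text>.*)"
)


def to_structured(note: str) -> Optional[dict]:
    """Break one free-text note into date, client_id, text and a resolved flag."""
    match = NOTE_PATTERN.match(note)
    if match is None:
        return None  # note does not follow the expected pattern
    row = match.groupdict()
    row["resolved"] = "resolved" in row["text"]
    return row


def aggregate(notes: list[str], clients: dict) -> dict:
    """Join parsed notes with client attributes: contacts and resolutions per client."""
    summary = defaultdict(lambda: {"contacts": 0, "resolved": 0})
    for row in filter(None, map(to_structured, notes)):
        counts = summary[row["client_id"]]
        counts["contacts"] += 1
        counts["resolved"] += int(row["resolved"])
    return {cid: {**clients.get(cid, {}), **counts} for cid, counts in summary.items()}


print(aggregate(notes, clients))
```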
Step 4. Business Intelligence/Decision Support.
Tools, methods, and techniques to leverage data:
- Data Mining
- Visualization/Simulations
- Keyword Searches & Syntax Analysis
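Keyword search, one of the Step 4 techniques, is commonly built on an inverted index that maps each term to the documents containing it. The minimal sketch below assumes plain-text documents and simple AND matching; the document IDs and texts are hypothetical, and production systems add stemming, ranking, and much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


docs = {
    "email-1": "Contract deliverables are due Friday.",
    "post-7": "Long wait time at the clinic today",
    "call-3": "Client asked about wait time and billing",
}
index = build_index(docs)
print(search(index, "wait time"))
```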
Step 5. Analysts/Visualization.
How should the data be used?
- Present data visually so it can be explored
- Use data as is to support/enhance/improve existing organizational processes
TechAmerica Foundation (2012)
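Presenting data visually (Step 5) usually means a charting tool or library; to stay dependency-free, this sketch renders counts as a plain-text bar chart. The service categories and counts are invented for illustration.

```python
def text_bar_chart(counts: dict[str, int], width: int = 40) -> str:
    """Render counts as horizontal bars, scaled so the largest bar is `width` characters."""
    if not counts:
        return ""
    label_width = max(len(label) for label in counts)
    peak = max(counts.values())
    lines = []
    # Largest values first, so the most important categories are read first.
    for label, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / peak) if peak > 0 else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical monthly service counts, for illustration only.
visits = {"Primary care": 120, "Case management": 45, "Nutrition": 80}
print(text_bar_chart(visits, width=30))
```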
• Staffing
  - Data Analysts/IT Specialists, etc.
• Infrastructure
  - Data storage, software, hardware, connectivity, etc.
• Funding
  - Technological investment
• Performance objectives related to desired mission outcomes
  - Standards/metrics to compare operational efficacy with mission outcomes
• A data-driven organizational culture
  - Data prioritization as the driving force of organizational direction, and the culture to support it
• Openness to organizational change
  - Data prioritization will require change!
1. Identify and define mission objectives that need Big Data solutions
2. Assess current organizational capability, data sources, and technical requirements
3. Identify success criteria, implementation timeline, potential subsequent phases, required staffing levels, and “entry point”:
   1. Streams as the entry point for high-velocity data needs
   2. Unbounded database/warehouse infrastructure for high-volume data needs
   3. “Hadoop”¹ or similar technologies for high-variety data needs
4. Execute the plan as required
5. Review on an ongoing basis
¹ The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
TechAmerica Foundation (2012), Apache Hadoop
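Hadoop's original processing model, MapReduce, splits work into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. The single-process Python sketch below illustrates that model with a word count over loosely structured text; it does not use Hadoop's actual (Java) API, the sample documents are hypothetical, and a real cluster runs these phases in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word.strip(".,;:!?\"'"), 1


def shuffle(pairs):
    """Group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that were pure punctuation
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


# Mixed, loosely structured inputs (hypothetical).
documents = [
    "Wait time was long.",
    "Billing question; wait time short",
    "Tweet: long wait!",
]
print(map_reduce(documents))
```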
• Assessment of mission outcome achievement, with improvement measures including increased savings, improved efficiency, etc.
• Identification of gaps in the links of the process chain, if any (see slide 13)
• Assessment of decisions being made (are the right data available to facilitate the process?)
• What are your Data/IT staff telling you?
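Assessing mission outcome achievement often comes down to comparing metrics before and after a change. This sketch, with hypothetical metric names and figures, reports the percent change for each metric and whether it moved in the desired direction.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percent change from baseline; positive means the metric went up."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100


def assess(baseline: dict, current: dict, lower_is_better: set) -> dict:
    """Report (percent change, improved?) for each metric in the baseline."""
    report = {}
    for metric, before in baseline.items():
        change = percent_change(before, current[metric])
        # For costs or wait times a decrease counts as improvement.
        improved = change < 0 if metric in lower_is_better else change > 0
        report[metric] = (round(change, 1), improved)
    return report


# Hypothetical before/after figures, for illustration only.
baseline = {"cost_per_client": 250.0, "clients_served": 400, "avg_wait_days": 10}
current = {"cost_per_client": 225.0, "clients_served": 460, "avg_wait_days": 12}
print(assess(baseline, current, lower_is_better={"cost_per_client", "avg_wait_days"}))
```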
• Expand and invest in the talent pool by creating a formal track for IT/Data managers with training and certification in BIG DATA analysis and technologies.
• Establish and broaden coalitions among industry, academia, and associations to develop professional standards and shared best practices for the field.
• Expand “college-to-government service” internship programs focused on technical aspects of BIG DATA.
• Strengthen and expand the Office of Science and Technology Policy to facilitate further research into new techniques and their applications to important problems across program and policy sectors.
• Align incentives to promote data sharing for the common good.
• Provide further guidance, in collaboration with industry and stakeholders, on privacy and data protection practices.
• Develop intellectual property policies to promote innovation.
• Support the necessary underlying IT/communications infrastructure.
MGI (2012), TechAmerica Foundation (2012)
Political resistance to BIG DATA may be minimal, given a history of such activity in:
• Government (Library of Congress, Bureau of Information Resource Management)
• Finance (banks, credit card companies)
• Internet search engines (Google)
…HOWEVER…
Bottom-up resistance is likely:
• The Public
  - Privacy concerns and the notion of “Big Brother”
• Employees
  - Data errors and the documentation of mistakes
• Contractors
  - Less room for error, increased competition and accountability
FAQs.
1. How do you know if you have a BIG DATA problem?
2. How do you obtain insight from your data?
3. Which technology is right for my organization?
4. How long should it take to implement?
5. What skills/expertise are required on staff?
6. What about privacy?
City A.M. (2012)
1. When available data is beyond your ability to manage, or when tapping into the insight it provides is problematic.
2. Start by placing mission objectives at the heart of every decision. While this might require change, even the more traditional change management practices may be of service. Let your Data staff tell you what they need.
3. It depends on your mission objectives and the type/amount/speed of data you need to inform your decisions. To start, build upon what you already have.
4. Start with small, manageable steps and allow for constant evaluation and revision. If the first phase takes longer than 6 months, you’re too slow.
5. Data Analysis and Communication, Technical skills, Database Management, and good ol’ fashioned Critical Thinking.
6. As with any data collection/sharing advancement, policies must be adjusted to address issues of privacy as they affect the organization within the context of the standards set in place (statutory or otherwise). Congress is working on it. As far as the public is concerned: Welcome to the 21st Century.
Identifying what big data means to you. (2012, February 24). City A.M. London.
McAfee, A., & Brynjolfsson, E. (2012). Big Data: The Management Revolution. Harvard Business Review, pp. 59-68.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition and Productivity. McKinsey Global Institute (MGI).
TechAmerica Foundation. (2012). Demystifying Big Data: A Practical Guide to Transforming the Business of Government. Washington, D.C.
Geography matters as much as ever despite digital revolution, says Patrick Lane. (2012). The Economist.
Trever Pearson is a third-year Master’s student in Public Administration at San Francisco State University. His studies emphasize Policy Analysis and Finance, and his interests lie mostly in evidence-based improvement in the policy arena in sectors such as health, education, finance, and income security.
Trever has a solid background in health care policy implementation and evaluation in the San Francisco public health network. He is currently working as a Data Analyst for Curry Senior Center, a community clinic serving the elderly in San Francisco’s Tenderloin neighborhood. His achievements there include the development of agency-wide data collection and reporting processes for service quality improvement and contract reporting.
With coursework in Urban Administration, Financial Management and Applied Statistics, Trever aspires to use BIG DATA and research solutions to improve state and federal policies and agency operations.