IBM - CBS

Big Data Driven:
Official Statistics
Amish Patel, Big Data Leader for Government, Europe
amishpat@uk.ibm.com
Information Management
© 2011 IBM Corporation
AGENDA
Drivers for leveraging Big Data
Implications of Big Data on Official Statistics
–Challenges & Opportunities
–Industrialisation and Collaborative model
–New products and indicators
DRIVERS FOR LEVERAGING BIG DATA
The Big Data Conundrum
• The economies of deletion have changed…
– Leading us into new opportunities and challenges
• The percentage of available data an enterprise can analyze is decreasing as the data available to that enterprise grows
– Quite simply, this means that as enterprises, we are getting “more naive” about our business over time
• Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
[Figure: data AVAILABLE to an organization versus data an organization can PROCESS]
Implications Of Big Data On Official Statistics
Challenges & Opportunity
1. Impact on Policy and Development issues
2. Methodological: bridging the gaps by combining multiple data sources
3. Technology (processing and storage)
4. Security/Privacy
5. Governance
6. Financial
1. Impact On Policy And Development Issues
Example: Leveraging Big Data for Currency of National Statistics
2. Methodological
Example: Bridging the gaps by combining multiple data sources
3. Technology – Processing and Storage
Example: Storage is key to your Infrastructure
Smarter Storage: Efficient by Design, Self-Optimizing, Cloud Agile
• Designed for data: deliver insights in seconds through systems built to process a variety of data at scale
• Cloud Agile: incorporates cloud technologies to improve service quality, speed of delivery and efficiency
• Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements
Data Footprint Reduction (reduction by data type)
Real-time Compression: active data 40-80% (best), backup 40-80%
Data Deduplication: active data 20-30%, backup 80-95% (best)
• Real-Time Compression is a method
of reducing storage needs by
changing the encoding scheme as
the data is being read and written
– Short patterns for frequent data
– Longer patterns for infrequent data.
– Can achieve 40 to 80 percent reduction
in storage capacity.
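The idea of short patterns for frequent data and longer patterns for infrequent data can be sketched with a textbook Huffman code. This is only an illustration of the principle; Huffman coding is swapped in here for clarity and is not presented as the encoding scheme any particular storage product uses. The function names and the sample bytes are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict[int, str]:
    """Return a prefix code: frequent byte values get shorter bit strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(data: bytes, code: dict[int, str]) -> int:
    """Total number of bits needed to store data with the given code."""
    return sum(len(code[b]) for b in data)


# Hypothetical sample: 'a' is frequent, 'd' is rare.
sample = b"aaaaaaaabbbbccd"
code = huffman_code(sample)
saving = 1 - encoded_bits(sample, code) / (8 * len(sample))
```

The saving on this toy input says nothing about real workloads; actual reductions depend on how repetitive the stored data is.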
Data Deduplication
• Data deduplication is a
method of reducing storage
needs by eliminating
duplicate copies of data.
– Store only one unique instance
of the data
– Redundant data replaced with
pointer
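A minimal sketch of deduplication as described above: each unique chunk is stored once, and every further copy is replaced with a pointer (here, the chunk's SHA-256 digest). The class name, the fixed chunk size and the sample data are assumptions made for illustration; production systems typically use more elaborate chunking.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one copy per unique chunk, pointers elsewhere."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}      # digest -> the single stored copy
        self.files: dict[str, list[str]] = {}   # file name -> list of pointers

    def put(self, name: str, data: bytes) -> None:
        pointers = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this content has not been seen before.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name: str) -> bytes:
        """Rebuild the file by following its pointers."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# Two hypothetical backups that share most of their content.
store = DedupStore(chunk_size=4)
store.put("monday", b"AAAABBBBCCCC")
store.put("tuesday", b"AAAABBBBDDDD")
```

Backups tend to repeat large parts of earlier backups, which is why a scheme like this pays off most on backup data.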
Storage Tiers – A trade-off between performance and
cost
Technologies allow us to place and move data to the appropriate storage tier to balance between performance and cost
[Figure: storage tiers ranging from faster performance to lower cost: server; cache, flash and solid-state drives; hard disk drives; tape; cloud]
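Tier placement can be sketched as a simple rule that maps how often data is accessed to a tier. The tier names follow the slide; the access-count thresholds and the function name are illustrative assumptions, not vendor guidance.

```python
# Tiers ordered from fastest and most expensive to slowest and cheapest.
# Each entry: (tier name, minimum accesses in the last 30 days to qualify).
# The threshold values are hypothetical.
TIERS = [
    ("flash/SSD", 100),
    ("hard disk", 1),
    ("tape", 0),
]


def choose_tier(accesses_last_30_days: int) -> str:
    """Pick the fastest tier whose access threshold the data meets."""
    for name, min_accesses in TIERS:
        if accesses_last_30_days >= min_accesses:
            return name
    # Fallback for unexpected (negative) counts: the cheapest tier.
    return TIERS[-1][0]
```

For example, `choose_tier(500)` selects the flash tier, while data that has not been read at all in the window falls through to tape.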
4. Security/Privacy
Need real-time data activity monitoring for security & compliance
• Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
• Database infrastructure scanning for missing patches, mis-configured privileges and other vulnerabilities
• Data protection compliance automation
[Figure: data repositories (databases, warehouses, file shares, Big Data); host-based probes (S-TAPs); collector appliance]
Key Characteristics
• Single Integrated Appliance
• Non-invasive/disruptive, cross-platform architecture
• Dynamically scalable
• SOD enforcement for DBA access
• Auto discover sensitive resources and data
• Detect or block unauthorized & suspicious activity
• Granular, real-time policies
– Who, what, when, how
• 100% visibility including local DBA access
• Minimal performance impact
• Does not rely on resident logs that can easily be erased by attackers, rogue insiders
• No environment changes
• Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
• Growing integration with broader security and compliance management vision
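A granular policy over who, what, when and how can be sketched as a rule evaluated against each data access event. This is a generic illustration and does not reproduce any product's policy language; the event fields, table names, client names and business hours are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class AccessEvent:
    user: str            # who
    table: str           # what
    timestamp: datetime  # when
    client: str          # how (the tool or application used)


# Hypothetical policy inputs.
SENSITIVE_TABLES = {"respondents", "tax_records"}
APPROVED_CLIENTS = {"stats_pipeline"}
BUSINESS_HOURS = (time(7, 0), time(19, 0))


def evaluate(event: AccessEvent) -> str:
    """Return 'allow', 'alert' or 'block' for a single access event."""
    if event.table not in SENSITIVE_TABLES:
        return "allow"
    if event.client not in APPROVED_CLIENTS:
        # Sensitive data reached through an unapproved tool, even by a DBA.
        return "block"
    start, end = BUSINESS_HOURS
    if not (start <= event.timestamp.time() <= end):
        # Approved tool but unusual time: let it through and raise an alert.
        return "alert"
    return "allow"
```

Because the rule is applied to each event as it arrives, the decision does not depend on database logs that a privileged user could erase afterwards.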
5. Governance
Vision for information integration & governance
Traditional Approach: structured, analytical, logical
– Systems of Record
– Data Warehouse
– Structured, repeatable, linear
– Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data
New Approach: creative, holistic thought, intuition
– Systems of Engagement
– Hadoop, Streams
– Unstructured, exploratory, iterative
– New sources: web logs, social data, text & images, sensor data, RFID
Information Integration, Governance & Context Accumulation: Systems of Record and Systems of Engagement
Governance concerns for big data customers
• How do I cleanse and validate the results of my big data analysis?
• How do I integrate and link my big data environment with my current one?
• How do I create a trusted view of my customers and products for big data?
• How do I protect data in a big data environment?
• Is a governed and auditable archive possible with big data?
Agile. Simple. Trusted Information.
Governance in an exploratory Big Data environment
1. Ensure trust & compliance
• Lineage of data as it enters and leaves the big data system
• Secure the big data systems from breaches
• Create masked dev and test analytics clusters
2. Accelerate time to value
• High performance data provisioning
• Integrated data integration and stream analytics platform
3. Lower total cost of ownership
• Simplified tooling to improve productivity of developers and testers
• Automated system security
• Complete visibility into the data movement and lifecycle
[Figure callouts:]
– Create privatized data in real time or on the cluster to ensure data protection
– High performance and high quality data loads
– Secured BigInsights to prevent any data breaches
– Low cost historical archive loaded to Hadoop for exploratory analytics
– Integration for improved segmentation of analytical data sources
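Creating masked data for dev and test clusters can be sketched as keyed, deterministic pseudonymisation: the same input always maps to the same token, so joins across datasets still work, but the original value is not exposed. The function names, field names and secret below are hypothetical, and HMAC-SHA256 is one possible choice rather than the method any specific tool applies.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes) -> str:
    """Replace a sensitive value with a keyed, deterministic token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; keep more characters if collisions matter


def mask_record(record: dict, sensitive_fields: set, secret: bytes) -> dict:
    """Return a copy of the record with the sensitive fields masked."""
    return {
        key: mask_value(str(val), secret) if key in sensitive_fields else val
        for key, val in record.items()
    }


# Hypothetical record and key; a real key would come from a secrets store.
secret = b"not-a-real-key"
row = {"citizen_id": "123456789", "region": "Utrecht", "income": 41000}
masked = mask_record(row, {"citizen_id"}, secret)
```

Only the identifying field changes; analytical fields such as region and income stay usable for testing.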
6. Financial
Business Model
• Citizens-Pay
– To private company for value-added services to citizens
• NS-Pay
– Pay to private company for inexpensive services
– Typically cloud-based
• Businesses-Pay
– Services free or discounted
– Funded by other parts of the business
– Can be nonprofit organisations
Engagement Model
• Invest and define
• Incubate and evaluate
• Motivate and educate
[Figure labels: Information (catalogue and datasets); Link Data; NS co-invests; Accelerate evolution of ecosystem; Services built & maintained by community on top of open-data]
Industrialisation and Collaborative Model
Leverage City Forward model for National Statistics
Impact on Everyday Life
• How safe is my neighborhood?
• Which career is right for me?
• What type of education do I need?
Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org
New Products and Indicators
Evolving beyond statistics to predictive analytics, sharing complementary datasets
with private sector and citizens
Examples:
Predictive models for healthcare cost reduction and outcome optimisation
Epidemic outbreak surveillance – hotspots, progression waves
Aligning public services (federal, regional and city level) to existing and predictive
demographic data
Example: Traffic Management for Sustainability and Efficiency
• Multimodal Data Streams
– GPS
– Cell-phones (location tracking)
– Public Transport (bus, docking)
– Pollution measurements
– Weather Conditions (including road conditions)
– Optical traffic flow detectors
– Travel time data based on plate recognition
– Induction loop detector data
– Accidents in network as they are being recorded
– Road closures (road work, etc)
– Still pictures from road cameras
• Real Time Traffic Monitoring & Information
• (Multimodal) Travel Planner
[Figure: stream-processing pipeline with the following components: GPS data streams; real-time transformation logic; real-time geo mapping; real-time speed & heading estimation; real-time aggregates & statistics; interactive visualization (web server, Google Earth); storage adapters; data warehouse; offline statistical analysis]
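The speed and heading estimation step can be sketched for two consecutive GPS fixes of one vehicle. This is a minimal sketch using the haversine distance and the initial great-circle bearing on a spherical Earth; the function name and the example coordinates are assumptions, and a streaming deployment would apply the same calculation per vehicle as fixes arrive.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def speed_and_heading(lat1, lon1, t1, lat2, lon2, t2):
    """Estimate speed (m/s) and heading (degrees clockwise from north)
    between two GPS fixes given as (latitude, longitude, time in seconds)."""
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("the second fix must be later than the first")

    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

    # Initial bearing from the first fix towards the second.
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    heading = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

    return distance / dt, heading


# Hypothetical fixes 10 seconds apart, moving due north.
speed, heading = speed_and_heading(52.000, 5.000, 0.0, 52.001, 5.000, 10.0)
```

Raw GPS fixes are noisy, so per-vehicle estimates like these are usually smoothed or aggregated per road segment before they feed traffic statistics.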
Thank You
www.sendsteps.com
Prepare to react; keep your phone ready!
Internet
1. Go to sendc.com
2. Log in with Session
3. Type WS2 <space> your answer
TXT
1. Text to +316 4250 0030
2. Type Session <space> WS2 <space> your answer
Posting messages is anonymous
No additional charge per message
What kind of use case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?
Internet: Go to sendc.com, log in with Session, and type WS2 <space> your answer
TXT: Send to 06 4250 0030 and type Session <space> WS2 <space> your answer