Big Data Driven: Official Statistics
Amish Patel, Big Data Leader for Government, Europe
amishpat@uk.ibm.com
Information Management
© 2011 IBM Corporation

AGENDA
• Drivers for leveraging Big Data
• Implications of Big Data on Official Statistics
  – Challenges & Opportunities
  – Industrialisation and Collaborative model
  – New products and indicators

DRIVERS FOR LEVERAGING BIG DATA

The Big Data Conundrum
• The economies of deletion have changed…
  – Leading us into new opportunities and challenges
• The percentage of available data an enterprise can analyze is decreasing proportionately to the data available to that enterprise
  – Quite simply, this means that, as enterprises, we are getting “more naive” about our business over time
• Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
[Chart: data AVAILABLE to an organization vs. data an organization can PROCESS]

Implications of Big Data on Official Statistics

Challenges & Opportunities
1. Impact on Policy and Development issues
2. Methodological: bridging the gaps by combining multiple data sources
3. Technology (processing and storage)
4. Security/Privacy
5. Governance
6. Financial

1. Impact on Policy and Development Issues
Example: Leveraging Big Data for Currency of National Statistics

2. Methodological
Example: Bridging the gaps by combining multiple data sources

3. Technology – Processing and Storage
Example: Storage is key to your Infrastructure
Smarter Storage: Efficient by Design, Self-Optimizing, Cloud Agile
• Deliver insights in seconds through data systems built to process a variety of data at scale
• Incorporates cloud technologies to improve service quality, speed of delivery and efficiency
• Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements

Data Footprint Reduction

                        Active Data      Backup Data
Real-time Compression   40-80% (Best)    40-80%
Data Deduplication      20-30%           80-95% (Best)

• Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
  – Short patterns for frequent data
  – Longer patterns for infrequent data
  – Can achieve 40 to 80 percent reduction in storage capacity
• Data Deduplication is a method of reducing storage needs by eliminating duplicate copies of data
  – Store only one unique instance of the data
  – Redundant data replaced with pointer

Storage Tiers – A trade-off between performance and cost
Technologies allow us to place and move data to the appropriate storage tier to balance performance and cost.
[Diagram: storage tiers ranging from faster performance to lower cost: Server; Cache, Flash and Solid-State Drives; Hard Disk Drives; Tape; Cloud]
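
The deduplication idea on the Data Footprint Reduction slide (store only one unique instance, replace redundant data with a pointer) can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular storage product works: the fixed 4-byte chunk size, the use of a SHA-256 fingerprint as the pointer, and the function names `deduplicate` and `restore` are all hypothetical choices made for the example.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, pointers): store maps a fingerprint to the chunk bytes,
    pointers lists the fingerprint of every chunk in original order.
    The chunk size and fingerprint function are assumptions for this sketch.
    """
    store = {}
    pointers = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # first occurrence is stored once
        pointers.append(fingerprint)          # every occurrence becomes a pointer
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original bytes by following the pointers in order."""
    return b"".join(store[p] for p in pointers)
```

For the input `b"ABCDABCDABCDWXYZ"` the sketch records four pointers but stores only two chunks, which is why highly repetitive data such as backups benefits most from this technique.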

4. Security/Privacy
Need real-time data activity monitoring for security & compliance
• Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
• Database infrastructure scanning for missing patches, misconfigured privileges and other vulnerabilities
• Data protection compliance automation
[Diagram: Data Repositories (databases, warehouses, file shares, Big Data); Host-based Probes (S-TAPs); Collector Appliance]

Key Characteristics
• Single integrated appliance
• Non-invasive, non-disruptive, cross-platform architecture
• Dynamically scalable
• SOD (separation of duties) enforcement for DBA access
• Auto-discover sensitive resources and data
• Detect or block unauthorized and suspicious activity
• Granular, real-time policies: who, what, when, how
• 100% visibility, including local DBA access
• Minimal performance impact
• Does not rely on resident logs that can easily be erased by attackers or rogue insiders
• No environment changes
• Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
• Growing integration with broader security and compliance management vision

5. Governance
Vision for information integration & governance

Traditional Approach: structured, analytical, logical
• Systems of Record
• Data Warehouse
• Structured, repeatable, linear
• Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data

New Approach: creative, holistic thought, intuition
• Systems of Engagement
• Hadoop, Streams
• Unstructured, exploratory, iterative
• New sources: web logs, social data, text & images, sensor data, RFID

Linking Systems of Record and Systems of Engagement: Information Integration, Governance & Context Accumulation

Governance concerns for big data customers
• How do I cleanse and validate the results of my big data analysis?
• How do I integrate and link my big data environment with my current one?
• How do I create a trusted view of my customers and products for big data?
• How do I protect data in a big data environment?
• Is a governed and auditable archive possible with big data?
Agile. Simple. Trusted Information.

Governance in an exploratory Big Data environment
1. Ensure trust & compliance
  • Lineage of data as it enters and leaves the big data system
  • Secure the big data systems from breaches
  • Create masked dev and test analytics clusters
2. Accelerate time to value
  • High performance data provisioning
  • Integrated data integration and stream analytics platform
3. Lower total cost of ownership
  • Simplified tooling to improve productivity of developers and testers
  • Automated system security
  • Complete visibility into the data movement and lifecycle
Callouts:
• Create privatized data in real time or on the cluster to ensure data protection
• High performance and high quality data loads
• Secured BigInsights to prevent any data breaches
• Low cost historical archive loaded to Hadoop for exploratory analytics
• Integration for improved segmentation of analytical data sources

6. Financial

Engagement Model (diagram labels)
• Information (catalogue and datasets)
• Invest and define
• NS
• Incubate and evaluate
• NS co-invests
• Accelerate evolution of ecosystem
• Link Data
• Motivate and educate
• Services built & maintained by community on top of open-data

Business Model
• Citizens-Pay
  – To private company for value-added services to citizens
• NS-Pay
  – Pay to private company for inexpensive services
  – Typically cloud-based
• Businesses-Pay
  – Services free or discounted
  – Funded by other parts of the business
  – Can be nonprofit organisations

Industrialisation and Collaborative Model
Leverage City Forward model for National Statistics

Impact on Everyday Life
• How safe is my neighborhood?
• Which career is right for me?
• What type of education do I need?
Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org

New Products and Indicators
Evolving beyond statistics to predictive analytics, sharing complementary datasets with private sector and citizens
Examples:
• Predictive models for healthcare cost reduction and outcome optimisation
• Epidemic outbreak surveillance: hotspots, progression waves
• Aligning public services (federal, regional and city level) to existing and predictive demographic data

Example: Traffic Management for Sustainability and Efficiency
Multimodal Data Streams
  – GPS
  – Cell-phones (location tracking)
  – Public Transport (bus, docking)
  – Pollution measurements
  – Weather Conditions (including road conditions)
  – Optical traffic flow detectors
  – Travel time data based on plate recognition
  – Induction loop detector data
  – Accidents in network as they are being recorded
  – Road closures (road work, etc.)
  – Still pictures from road cameras
Applications: Real Time Traffic Monitoring & Information; (Multimodal) Travel Planner
[Diagram: GPS Data Streams; Real Time Transformation Logic; Real Time Geo Mapping; Real Time Speed & Heading Estimation; Real Time Aggregates & Statistics; Interactive visualization (Web Server, Google Earth); Storage adapters; Data Warehouse; Offline statistical analysis]
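
The "Real Time Speed & Heading Estimation" stage named in the traffic example can be illustrated with a small sketch. It is not taken from the presented system: it assumes each GPS fix arrives as a (timestamp in seconds, latitude in degrees, longitude in degrees) tuple, treats the Earth as a sphere, and uses the standard haversine distance and initial-bearing formulas. The function name `speed_and_heading` and the tuple layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def speed_and_heading(fix_a, fix_b):
    """Estimate speed (km/h) and heading (degrees clockwise from north).

    Each fix is (timestamp_seconds, latitude_deg, longitude_deg); fix_b must
    be later than fix_a. The tuple layout is an assumption for this sketch.
    """
    t1, lat1, lon1 = fix_a
    t2, lat2, lon2 = fix_b
    if t2 <= t1:
        raise ValueError("fixes must be in increasing time order")

    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine great-circle distance between the two fixes, in metres
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    distance_m = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from the first fix towards the second
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    heading_deg = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

    speed_kmh = distance_m / (t2 - t1) * 3.6  # m/s to km/h
    return speed_kmh, heading_deg
```

Calling the function on each pair of consecutive fixes from one vehicle yields a stream of speed and heading estimates, which a later stage could average per road segment to produce the aggregates and statistics shown in the diagram.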

What kind of use case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?

To answer via Internet:
1. Go to sendc.com
2. Log in with Session
3. Type WS2 <space> your answer

To answer via TXT:
1. Text to +316 4250 0030
2. Type Session <space> WS2 <space> your answer

Posting messages is anonymous. No additional charge per message.