Information Management

Smart Data Analysis for IoT (Internet of Things) Applications
Kun-Lung Wu, Ph.D., Manager
Data-Intensive Systems & Analytics Group (IBM T. J. Watson Research Center)
InfoSphere Streams Language & Research (IBM SWG)
© 2014 IBM Corporation
As IoT applications become more pervasive,
there is a real-time big data explosion
Internet of Things / Everything
• Almost anything can be equipped and connected to the Internet
• They can generate, in real time, streams and streams of data
Real-Time Big Data Explosion
• Real-time data analysis is an integral part of many IoT applications
Examples of IoT Applications
• Smart cities
– Traffic control, emergency management, etc.
• Health care
– Aiding the elderly, ICU alert management, health monitoring via wearable devices, etc.
• Agriculture & food
– Precision farming, cold chain management, etc.
• Industrial applications
– Manufacturing process monitoring, engine monitoring, etc.
• Environmental monitoring
– Water, waste, air quality, etc.
• Retail applications
What is different in IoT data?
There are many extremes
• Volume: There are greater amounts of data
• Velocity: Process and act on data more quickly in real time
• Variety: Use more types of data
• Veracity: Use uncertain data
Traditional versus IoT Big Data
• Traditional Approach: Analyze small subsets of the available information
• IoT Big Data Approach: Analyze ALL available information
Leverage more of the data being captured
Traditional versus IoT Big Data
• Traditional Approach: Carefully cleanse information before any analysis
– Analyzed information: a small amount of carefully cleansed information
• IoT Big Data Approach: Analyze information as is, cleanse as needed
– Analyzed information: a very large amount of messy information
Reduce effort required to leverage data
Traditional versus IoT Big Data
• Traditional Approach: Analyze data AFTER it has been processed and landed in a Warehouse or Mart
• IoT Big Data Approach: Analyze data IN MOTION as it is generated, in real time
Leverage data as it is captured
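The contrast between analyzing landed data and analyzing data in motion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how InfoSphere Streams works internally: the names `RunningStats`, `analyze_at_rest`, and `analyze_in_motion` are hypothetical, and a running mean (Welford-style update) stands in for whatever analytic an application would actually run.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class RunningStats:
    """Hypothetical helper: running mean/variance updated one reading at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford-style incremental update; no past readings are stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count else 0.0


def analyze_at_rest(readings: Iterable[float]) -> float:
    """Store-and-process: land all the data first, then compute one answer."""
    stored = list(readings)
    return sum(stored) / len(stored)


def analyze_in_motion(readings: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Process in motion: emit an up-to-date (count, mean) after every reading."""
    stats = RunningStats()
    for x in readings:
        stats.update(x)
        yield stats.count, stats.mean
```

With the in-motion version, a result is available after every reading and memory use stays constant, whereas the at-rest version produces nothing until the whole input has been collected.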
Standard assumptions → Re-think for IoT data analysis
• Clean and correct data → Take advantage of and tolerate uncertainty
• Transactional guarantees → Good enough
• Normalized, structured data → Store data in elemental form
• Explicit relationships kept → Relationships found at query
• ACID properties → Relaxed constraints
• Centrally managed storage → Loosely distributed data
• Store-and-process → Process in motion
• Reliable hardware → Built with full expectation of failures
• Query, insert, delete with SQL → Query, operators, analytics at point of data
• Reference/context data on disk → Reference and context data in memory
From data at rest to data in motion
IBM InfoSphere Streams Delivers Real-Time Analytics For Big Data In Motion
• Volume: Terabytes per second, petabytes per day
• Variety: All kinds of data, all kinds of analytics
• Velocity: Insights in microseconds
• Real time delivery, powerful analytics, millions of events per second, microsecond latency
• Traditional / non-traditional data sources
• Example streaming data sources: Video, audio, networks, social media
• Example applications: ICU monitoring, algorithmic trading, cyber security, environment monitoring, government / law enforcement, telco churn prediction, smart grid
Big Data in Real Time with Stream Processing
Filter / Sample
Modify
Annotate
Analyze
Fuse
Classify
Score
Windowed Aggregates
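A few of the operator types listed above (filter, annotate, windowed aggregate) can be illustrated by chaining Python generators. This is a minimal sketch under assumptions, not SPL and not the InfoSphere Streams API: the record layout (`sensor`, `value`), the function names, and the thresholds are all hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]  # hypothetical record layout: {"sensor": str, "value": float}


def filter_valid(stream: Iterable[Record], low: float, high: float) -> Iterator[Record]:
    """Filter: drop readings outside a plausible range."""
    for rec in stream:
        if low <= rec["value"] <= high:
            yield rec


def annotate(stream: Iterable[Record], threshold: float) -> Iterator[Record]:
    """Annotate: add an 'alert' flag without modifying the incoming record."""
    for rec in stream:
        yield {**rec, "alert": rec["value"] > threshold}


def tumbling_aggregate(stream: Iterable[Record], size: int) -> Iterator[Record]:
    """Windowed aggregate: summarize every `size` consecutive records."""
    window: List[Record] = []
    for rec in stream:
        window.append(rec)
        if len(window) == size:
            yield {
                "mean": sum(r["value"] for r in window) / size,
                "alerts": sum(1 for r in window if r["alert"]),
            }
            window = []  # tumbling window: start fresh, no overlap


def run_pipeline(readings: Iterable[Record]) -> List[Record]:
    """Wire the operators together; each record flows through one at a time."""
    valid = filter_valid(readings, low=0.0, high=100.0)
    flagged = annotate(valid, threshold=25.0)
    return list(tumbling_aggregate(flagged, size=2))
```

Because each stage is a generator, a record passes through the whole chain as soon as it arrives, which mirrors the continuous pipeline style of stream processing rather than a batch pass over stored data.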
InfoSphere Streams: For superior real time analytic processing
• Streams Processing Language (SPL) built for streaming applications:
– Reusable operators
– Rapid application development
– Continuous “pipeline” processing
• Compile groups of operators into single processes:
– Efficient use of cores
– Distributed execution
– Very fast data exchange
– Can be automatic or tuned
– Scaled with push of a button
• Use the data that gives you a competitive advantage:
– Can handle virtually any data type
– Use data that is too expensive and time sensitive for traditional approaches
• Easy to extend:
– Built in adaptors
– Users add capability with familiar C++ and Java
• Easy to manage:
– Automatic placement
– Extend applications incrementally without downtime
– Multi-user / multiple applications
• Dynamic analysis:
– Programmatically change topology at runtime
– Create new subscriptions
– Create new port properties
• Flexible and high performance transport:
– Very low latency
– High data rates
What Are People Doing With Streams?
• Stock market
– Impact of weather on securities prices
– Analyze market data at ultra-low latencies
• Telephony
– CDR processing
– Social analysis
– Churn prediction
– Geomapping
• Transportation
– Intelligent traffic management
• Law Enforcement, Defense & Cyber-Security
– Real-time multimodal surveillance
– Situational awareness
– Cyber security detection
• Fraud prevention
– Detecting multi-party fraud
– Real-time fraud prevention
• Smart Grid & Energy
– Transactive control
– Phasor Monitoring Unit
• Health & Life Sciences
– Neonatal ICU monitoring
– Epidemic early warning system
– Remote healthcare monitoring
• Natural Systems
– Wildfire management
– Water management
• e-Science
– Space weather prediction
– Detection of transient events
– Synchrotron atomic research
• Other
– Manufacturing
– Text Analysis
– Who’s Talking to Whom?
– ERP for Commodities
– FPGA Acceleration
Asian telco reduces billing costs and improves customer satisfaction
Problem: Call volume increased to the point that batch processing in a warehouse no longer worked: 1) too expensive, 2) too slow, and 3) no capacity left for BI
Solution:
• Real-time mediation and analysis of 8B CDRs per day
• Data processing time reduced from 12 hrs to 1 sec
• Hardware cost reduced to 1/8th
Further enabled: Proactively addressing issues impacting customer satisfaction, real time offers based on usage
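To put the quoted figures in perspective, the short calculation below converts them into an average sustained rate and a latency ratio. It assumes that "8B" means 8 billion call detail records and that traffic is spread evenly over the day; real call volume is likely bursty, so peak rates would be higher. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(records_per_day: float) -> float:
    """Average records per second, assuming an even spread over 24 hours."""
    return records_per_day / SECONDS_PER_DAY


def latency_ratio(before_seconds: float, after_seconds: float) -> float:
    """How many times shorter the processing time became."""
    return before_seconds / after_seconds


# Assumption: "8B CDRs per day" is read as 8 billion records per day.
cdr_rate = average_rate_per_second(8_000_000_000)  # roughly 92,600 CDRs per second
speedup = latency_ratio(12 * 60 * 60, 1)           # 12 hours is 43,200 seconds
```

Under these assumptions the system handles on the order of 90,000 records per second on average, and the move from 12 hours to 1 second is a reduction in processing time by a factor of 43,200.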
Harnessing the Largest Predictive Focus Group in the World
Purpose
– Understand public sentiment towards an event: movie trailers
– Deeply understand the potential customer profile: gender, occupation, intent to watch
– Alter marketing launch plans based on insight
Background
– 1.1 billion tweets analyzed
– 5.7 million blogs/forum posts
– 3.5 million messages
– Also: Facebook, Google+, Tumblr, Flickr
University of Ontario Institute of Technology (UOIT) Detects Neonatal Patient Symptoms Sooner
• Performing real-time analytics using physiological data from neonatal babies
• Continuously correlates data from medical monitors to detect subtle changes and alert hospital staff sooner
• Early warning gives caregivers the ability to proactively deal with complications
“Helps detect life threatening conditions up to 24 hours sooner”
Challenges and opportunities
• Approach overload
– Is there a convergence of approaches?
– Is there a “write once, use any technology” approach across tool types?
• Skills to apply techniques
– Reduce the skill required?
– More people who can be data scientists, developers, and business/domain savvy?
• Uncertain data
– Confidence levels need to follow data and decisions
• New analytic algorithms
– Real time learning and adaptation?
– More automation
• Availability
– What does it mean for in-memory systems?
– How should disaster recovery work?
• Cloud
– Security of data
– Data movement
• Data governance, security, and privacy
• What new problems can we solve?
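One way to read "confidence levels need to follow data and decisions" is that every value carries its confidence with it, and anything derived from that value inherits a confidence too. The sketch below is only an illustration of that idea: the `Reading` and `Decision` types, the confidence-weighted average, and the conservative rule of taking the minimum input confidence are all assumptions, not something the talk prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reading:
    value: float
    confidence: float  # assumed to lie in [0, 1]


@dataclass(frozen=True)
class Decision:
    alert: bool
    confidence: float  # inherited from the data the decision was based on


def fuse(readings: List[Reading]) -> Reading:
    """Combine readings with a confidence-weighted average.

    Assumption: the fused confidence is the minimum of the inputs,
    a deliberately conservative choice made for this sketch.
    """
    total = sum(r.confidence for r in readings)
    if total == 0:
        raise ValueError("need at least one reading with non-zero confidence")
    value = sum(r.value * r.confidence for r in readings) / total
    return Reading(value, min(r.confidence for r in readings))


def decide(reading: Reading, threshold: float) -> Decision:
    """The decision keeps the confidence of the reading it was derived from."""
    return Decision(alert=reading.value > threshold, confidence=reading.confidence)
```

A downstream consumer can then choose how to act on a low-confidence alert, for example by asking for more data instead of acting immediately.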
To Learn More
Resources
– Streams: streamsDev
– IBM Big Data: ibm.com/bigdata
– IBMBigDataHub.com
– BigDataUniversity.com
– Books / analyst papers
Try Stream Processing
http://Ibm.co/streamsqs
2 download options!