Uploaded by Johnson Chan

IERG4230 BigData Analytics(1)

IERG4230 Introduction to IoT
Big Data Analytics for IoT
IERG4230: Big Data Analytics for IoT
Big Data

Lots of data is being collected and warehoused
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/credit card transactions
• Social networks
Big Data
Big Data



“Big Data” is data whose scale, diversity, and complexity
require new architecture, techniques, algorithms, and
analytics to manage it and extract value and hidden
knowledge from it…
Big data is high volume, high velocity, and/or high variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery and
process optimization.
The challenges include capture, curation, storage, search,
sharing, transfer, analysis, and visualization.
Big Data
Big Data: 3Vs
Big Data: 3Vs
Big Data: Volume

Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
[Figure: exponential increase in collected/generated data]
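As a quick check of the volume figures, the sketch below recomputes the overall growth factor and the average yearly growth it implies. It assumes growth compounds evenly over the 11 years from 2009 to 2020; the function name `growth_summary` is illustrative and not from the slides.

```python
from typing import Tuple


def growth_summary(start_zb: float, end_zb: float, years: int) -> Tuple[float, float]:
    """Return (overall growth factor, average yearly growth factor)."""
    factor = end_zb / start_zb
    # Assumption: the same multiplicative growth applies every year.
    yearly = factor ** (1.0 / years)
    return factor, yearly


if __name__ == "__main__":
    factor, yearly = growth_summary(0.8, 35.0, 2020 - 2009)
    print(f"{factor:.2f}x overall, about {100 * (yearly - 1):.0f}% per year")
```

The ratio 35 / 0.8 is 43.75, which the slide rounds to 44x; under the even-compounding assumption this corresponds to roughly 41% growth per year.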
Big Data: Volume
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end of 2011
• 76 million smart meters in 2009… 200M by 2014
Big Data: Variety







• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc.)
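The "scan the data once" constraint on streaming data can be illustrated with a one-pass statistic. The sketch below uses Welford's online algorithm to keep a running mean and variance without storing the stream. It is a minimal illustration rather than something taken from the slides, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def running_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Single pass over the stream: returns (count, mean, population variance)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # A generator can only be consumed once, like a real data stream.
    readings = (r for r in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(running_mean_variance(readings))
```

Each reading is seen exactly once and only three numbers are kept in memory, which is the property a streaming setting needs.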
Big Data: Types of Data





• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
Big Data: Types of Data
• Structured data
– Typically stored in databases or spreadsheets and managed in accordance with a standardised storage format and ontology, e.g. names, place names
– e.g. SATAC applications, load, enrolments, FLO usage data
• Unstructured data
– Text, audio, imagery, video
– e.g. student email, chat rooms, questionnaire responses, lecture videos (audio & video)
• Different data types lend themselves to different analytical techniques. Unstructured data often requires pre-processing before structured data analysis can be applied.
• Unstructured data analysis
– Text: document clustering, topic detection, entity extraction (people, places, locations, dates, times, etc.), sentiment analysis (+, -)
– Audio: speaker identification, language identification, speech to text, keyword spotting
– Video: face recognition, object recognition, target tracking
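To make the pre-processing point concrete, the sketch below turns free text into term counts (a bag of words), one simple way to give unstructured text a structured form that later analysis can use. The tokenisation rule and the function name are assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split it into word tokens, and count each term."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


def term_table(documents: List[str]) -> List[Counter]:
    """One row of term counts per document."""
    return [bag_of_words(doc) for doc in documents]


if __name__ == "__main__":
    responses = [
        "The lecture videos were great",
        "Great labs, but the videos were long",
    ]
    for row in term_table(responses):
        print(dict(row))
```

Once each response is a row of counts, techniques listed above such as document clustering or topic detection can operate on the table.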
Big Data: Data Types
Big Data: Velocity
• Data is generated fast and needs to be processed fast
• Online data analytics
• Late decisions → missed opportunities
• Examples
– E-promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
– Healthcare monitoring: sensors monitor your activities and body → any abnormal measurements require immediate reaction
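The healthcare example can be sketched as a check applied to each reading as it arrives, rather than in a later batch job. The thresholds, the (timestamp, value) record shape, and the function name below are hypothetical placeholders, not clinical values or anything specified in the slides.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, measured value)


def abnormal_readings(
    readings: Iterable[Reading],
    low: float = 40.0,    # hypothetical lower bound
    high: float = 120.0,  # hypothetical upper bound
) -> Iterator[Reading]:
    """Yield each out-of-range reading as soon as it is seen."""
    for timestamp, value in readings:
        if value < low or value > high:
            yield timestamp, value


if __name__ == "__main__":
    heart_rate = [(0.0, 72.0), (1.0, 130.0), (2.0, 38.0), (3.0, 80.0)]
    for timestamp, value in abnormal_readings(heart_rate):
        print(f"t={timestamp}: abnormal value {value}")
```

Because the function is a generator, an alert can be raised the moment a reading falls outside the range, without waiting for the rest of the stream.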
Big Data: Velocity
Big Data: Source of Data
• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)


Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
Big Data: Data Generation
• The model of generating/consuming data has changed
– Old model: a few companies generate data; all others consume data
– New model: all of us generate data, and all of us consume data
Big Data: Sources
Big Data: 4Vs?
Big Data: More Vs?
Big Data: Drivers
-
Optimizations and predictive analytics
Complex statistical analysis
All types of data, and many sources
Very large datasets
More of a real-time
-
Ad-hoc querying and reporting
Data mining techniques
Structured data, typical sources
Small to mid-size datasets
Harnessing Big Data



OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
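The difference between the first two processing styles can be sketched with Python's built-in `sqlite3` module: an OLTP-style operation is a short transaction that touches one row, while an OLAP-style query aggregates over many rows. The table layout and function names are assumptions for illustration, and SQLite stands in for a real DBMS or data warehouse.

```python
import sqlite3
from typing import Dict


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical purchases table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
    )
    return conn


def record_purchase(conn: sqlite3.Connection, store: str, amount: float) -> None:
    """OLTP style: one small write, committed as its own transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO purchases (store, amount) VALUES (?, ?)", (store, amount)
        )


def revenue_by_store(conn: sqlite3.Connection) -> Dict[str, float]:
    """OLAP style: a read-only aggregate that scans the whole table."""
    rows = conn.execute(
        "SELECT store, SUM(amount) FROM purchases GROUP BY store ORDER BY store"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = open_store()
    record_purchase(conn, "A", 10.0)
    record_purchase(conn, "B", 5.0)
    record_purchase(conn, "A", 2.5)
    print(revenue_by_store(conn))
```

RTAP is not shown here; in the slide's terms it would apply analytical queries of this kind continuously to data as it arrives.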
Challenges in Handling Big Data

• The bottleneck is in technology
– New architecture, algorithms, techniques are needed
• Also in technical skills
– Experts in using the new technology and dealing with big data
Big Data: Use Cases
Big Data: Market
Big Data Technology
Big Data: Enabling Technology
Cloud Computing

• IT resources provided as a service
– Compute, storage, databases, queues
• Clouds leverage economies of scale of commodity hardware
– Cheap storage, high-bandwidth networks & multicore processors
– Geographically distributed data centers
• “Out-sourced” resource management, reduced time to deployment
• Scaling: on-demand provisioning, co-locate data and compute
• Reliability: massive, redundant, shared resources
• Sustainability: hardware not owned
IoT and Cloud
[Figure: IoT and cloud architecture. Labels: public cloud (SaaS, PaaS, IaaS), cloud management server, network control system, home cloud, mobile cloud. Domains: public cloud domain, network domain, local cloud domain, object domain. Objects: indoor objects (NFC/Bluetooth/ZigBee/WiFi), outdoor objects (wireless). Functions: location management, service exposure, billing, identity management, service support functions; local resource management, public cloud interaction; resource exposure, resource request; public resource management, QoS management, service invocation, admission control.]
Big Data: Computation Architecture
Big Data: Distributed Algorithms on Hadoop
Big Data – Storage Architecture
Big Data – Special-Purpose Database
Big Data – Platform Stack Examples
Big Data Components
Value of Big Data Analytics



• Big data is more real-time in nature than traditional data warehouse (DW) applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
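A shared-nothing design can be sketched in a few lines: records are partitioned by key so that each node owns its own slice, every node aggregates its slice without touching anyone else's data, and the partial results are merged at the end. The version below simulates the nodes as plain function calls in one process. The function names and the use of CRC32 for partitioning are assumptions for illustration, not how any particular product works.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value), e.g. (device id, reading count)


def partition(records: Iterable[Record], num_nodes: int) -> List[List[Record]]:
    """Route each record to exactly one node based on a stable hash of its key."""
    shards: List[List[Record]] = [[] for _ in range(num_nodes)]
    for key, value in records:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        shards[node].append((key, value))
    return shards


def local_aggregate(shard: List[Record]) -> Counter:
    """Work done on one node using only the data that node owns."""
    totals: Counter = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


def merge(partials: Iterable[Counter]) -> Dict[str, int]:
    """Combine per-node results; keys never overlap because of the partitioning."""
    combined: Counter = Counter()
    for partial in partials:
        combined.update(partial)
    return dict(combined)


if __name__ == "__main__":
    records = [("meter-1", 3), ("meter-2", 5), ("meter-1", 4), ("meter-3", 1)]
    shards = partition(records, num_nodes=3)
    print(merge(local_aggregate(shard) for shard in shards))
```

Scaling out then means raising `num_nodes`: the per-node work shrinks while the merge step stays small.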
Big Data: Analytics

• Aggregation and Statistics
– Data warehouse and OLAP
• Indexing, Searching, and Querying
– Keyword-based search
– Pattern matching (XML/RDF)
• Knowledge discovery
– Data Mining
– Statistical Modeling
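Keyword-based search is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal in-memory version with AND semantics for multi-word queries; the document ids and function names are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every lower-cased term to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of documents that contain every term in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


if __name__ == "__main__":
    docs = {
        "d1": "Smart meter log data",
        "d2": "Tweet data every day",
        "d3": "GPS log",
    }
    index = build_index(docs)
    print(search(index, "log data"))
```

The index is built once, and each query then touches only the posting sets of its own terms instead of rescanning every document.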
Big Data: Analytics
• Learning analytics draws upon techniques from a number of established fields:
– Statistics
– Artificial Intelligence
– Machine Learning
– Data mining
– Social Network Analysis
– Text Mining and Web Analytics
– Operational Research
– Information Visualization
• Application domains such as business intelligence, national security intelligence and learning analytics all have an interest in analysing large volumes of data from disparate data sources and are providing the business cases for the rapid growth in ‘big data’ & data analytics.
• Learning analytics encompasses support to both the business and teaching functions of the learning institution.
Big Data: Analytic Tools









• Data mining
• Statistical analysis
• Predictive analysis
• Correlation
• Regression
• Forecasting
• Process Modeling
• Optimization
• Simulation
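Three of the tools listed above (correlation, regression, forecasting) fit in one small sketch: Pearson's correlation coefficient, an ordinary least-squares line, and a forecast read off that line. The function names and the sample data are illustrative assumptions; real analyses would normally use a statistics library.

```python
from math import sqrt
from typing import Sequence, Tuple


def _centered_sums(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (mean_x, mean_y, covariance sum, x variance sum, y variance sum)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Correlation: strength of the linear relationship, between -1 and 1."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Regression: least-squares slope and intercept of y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(slope: float, intercept: float, x: float) -> float:
    """Forecasting: extrapolate the fitted line to a new x."""
    return slope * x + intercept


if __name__ == "__main__":
    days = [1.0, 2.0, 3.0, 4.0]
    readings = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(days, readings)
    print(pearson_r(days, readings), slope, intercept, forecast(slope, intercept, 5.0))
```

Both functions assume the inputs have equal length and that neither series is constant; a constant series would cause a division by zero.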
Business Intelligence: BI
Big Data: Analytics
Big Data Analytics
Big Data Analytics
Big Data: Structural Data Analysis
• Descriptive statistics – sums, means, std devs, basic plotting (graphs, charts, histograms)
• Data visualisation – tools that enable the human to see meaningful patterns in data
• Machine learning – tools that enable computers to find patterns in data to perform either classification, clustering or prediction, e.g. decision trees, neural networks, support vector machines, linear regression, self-organising maps, k-means
• Predictive analytics – algorithmic approaches (generally machine learning) for predicting key target variables of interest
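Of the machine learning techniques named above, k-means is compact enough to sketch in full. The version below is plain Lloyd's algorithm with a deliberately naive initialisation (the first k points are taken as starting centroids) so that the result is deterministic. It is meant as an illustration of the clustering idea, not as a production implementation.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Return (centroids, cluster label for each point)."""
    # Assumption: naive initialisation with the first k points.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
    centroids, labels = k_means(data, k=2)
    print(centroids, labels)
```

No labels are supplied to the algorithm, which is what distinguishes clustering from the classification and prediction tasks in the same bullet.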
Big Data: Visualization
Structured Data
Unstructured Data
Big Data: Visualization
Combining Structured & Unstructured Data Sources
Dangers in Analytics






• Privacy
• Security
• Drawing decisions on incomplete data
• Drawing decisions on inaccurate data
• Using only data that supports our gut decisions
• Drawing the wrong conclusion from the data
– Stock prices example
Big Data, IoT, Analytics
IoT will enable Big Data
Big Data needs Analytics
Analytics will improve processes for more IoT devices