IERG4230 Introduction to IoT
Big Data Analytics for IoT
IERG4230: Big Data Analytics for IoT P.1

Big Data
• Lots of data is being collected and warehoused
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit card transactions
  – Social networks

Big Data
[Figure]

Big Data
• “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
• Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Big Data
[Figure]

Big Data: 3Vs
[Figure]

Big Data: Volume
• Data volume
  – 44x increase from 2009 to 2020
  – From 0.8 zettabytes to 35 ZB
• Data volume is increasing exponentially
[Figure: Exponential increase in collected/generated data]

Big Data: Volume
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014

Big Data: Variety
• Relational data (tables/transactions/legacy data)
• Text data (Web)
• Semi-structured data (XML)
• Graph data
  – Social network, Semantic Web (RDF), …
• Streaming data
  – You can only scan the data once
• A single application can be generating/collecting many types of data
• Big public data (online, weather, finance, etc.)

Big Data: Types of Data
• Relational data (tables/transactions/legacy data)
• Text data (Web)
• Semi-structured data (XML)
• Graph data
  – Social network, Semantic Web (RDF), …
• Streaming data
  – You can only scan the data once

Big Data: Types of Data
• Structured data
  – Typically stored in databases or spreadsheets, and required to be managed in accordance with a standardised storage format and ontology, e.g. names, place names
  – e.g. SATAC applications, load, enrolments, FLO usage data
• Unstructured data
  – Text, audio, imagery, video
  – e.g. student email, chat rooms, questionnaire responses, lecture videos (audio & video)
• Different data types lend themselves to different analytical techniques.
  Unstructured data often requires pre-processing before structured data analysis can be applied.
• Unstructured data analysis
  – Text: document clustering, topic detection, entity extraction (people, places, locations, dates, times, etc.), sentiment analysis (+, -)
  – Audio: speaker identification, language identification, speech to text, keyword spotting
  – Video analysis: face recognition, object recognition, target tracking

Big Data: Data Types
[Figure]

Big Data: Velocity
• Data is generated fast and needs to be processed fast
• Online data analytics
• Late decisions mean missed opportunities
• Examples
  – E-promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
  – Healthcare monitoring: sensors monitor your activities and body; any abnormal measurements require immediate reaction

Big Data: Velocity
[Figure]

Big Data: Source of Data
• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion

Big Data: Data Generation
• The model of generating/consuming data has changed
  – Old model: few companies generate data; all others consume data
  – New model: all of us generate data, and all of us consume data

Big Data: Sources
[Figure]

Big Data: 4Vs?
[Figure]

Big Data: More Vs?
[Figure]
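
The Velocity slides above note that streaming data can be scanned only once and that abnormal healthcare measurements need an immediate reaction. The following is a minimal sketch of that idea, not a method taken from the slides. It assumes Welford's one-pass mean/variance update and a simple standard-deviation threshold. The names `RunningStats` and `flag_abnormal`, the threshold and warm-up values, and the heart-rate readings are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass (Welford) mean and variance; each reading is seen exactly once."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until at least two readings exist.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def flag_abnormal(readings, threshold=3.0, warmup=5):
    """Yield (index, value) for readings more than `threshold` standard
    deviations away from the running mean of the readings accepted so far."""
    stats = RunningStats()
    for i, x in enumerate(readings):
        if stats.count >= warmup and stats.stddev > 0:
            if abs(x - stats.mean) > threshold * stats.stddev:
                yield i, x
                continue  # assumption: keep flagged values out of the baseline
        stats.update(x)


# Hypothetical heart-rate stream in beats per minute (made-up values).
heart_rate = [72, 74, 71, 73, 75, 72, 74, 140, 73, 72]
alerts = list(flag_abnormal(heart_rate))
print(alerts)
```

Because the statistics are updated incrementally, memory use stays constant however long the stream runs, which is the property a "scan once" workload needs. A real monitoring system would need a clinically justified rule rather than this fixed threshold.
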

Big Data: Drivers
• Optimizations and predictive analytics
• Complex statistical analysis
• All types of data, and many sources
• Very large datasets
• More of a real-time

• Ad-hoc querying and reporting
• Data mining techniques
• Structured data, typical sources
• Small to mid-size datasets

Harnessing Big Data
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)

Challenges in Handling Big Data
• The bottleneck is in technology
  – New architecture, algorithms, techniques are needed
• Also in technical skills
  – Experts in using the new technology and dealing with big data

Big Data: Use Cases
[Figure]

Big Data: Market
[Figure]

Big Data Technology
[Figure]

Big Data: Enabling Technology
[Figure]

Cloud Computing
• IT resources provided as a service
  – Compute, storage, databases, queues
• Clouds leverage economies of scale of commodity hardware
  – Cheap storage, high-bandwidth networks & multicore processors
  – Geographically distributed data centers
• “Out-sourced” resource management
• Reduced time to deployment
• Scaling: on-demand provisioning, co-locate data and compute
• Reliability: massive, redundant, shared resources
• Sustainability: hardware not owned

IoT and Cloud
[Figure. Recoverable labels: PaaS, IaaS, SaaS; public cloud; public cloud domain; cloud management server; network control system; home cloud; mobile cloud; network domain; local cloud domain; object domain; location management, service exposure, billing, identity management, service support functions; local resource management, public cloud interaction, resource exposure, resource request; public resource management, QoS management, service invocation, admission control; NFC/Bluetooth/ZigBee/WiFi; indoor objects; outdoor objects (wireless)]

Big Data: Computation Architecture
[Figure]

Big Data: Distributed Algorithms on Hadoop
[Figure]

Big Data – Storage Architecture
[Figure]

Big Data – Special-Purpose Database
[Figure]

Big Data – Platform Stack Examples
[Figure]

Big Data Components
[Figure]

Value of Big Data Analytics
• Big data is more real-time in nature than traditional DW applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps

Big Data: Analytics
• Aggregation and statistics
  – Data warehouse and OLAP
• Indexing, searching, and querying
  – Keyword-based search
  – Pattern matching (XML/RDF)
• Knowledge discovery
  – Data mining
  – Statistical modeling

Big Data: Analytics
• Learning analytics draws upon techniques from a number of established fields:
  – Statistics
  – Artificial Intelligence
  – Machine Learning
  – Data Mining
  – Social Network Analysis
  – Text Mining and Web Analytics
  – Operational Research
  – Information Visualization
• Application domains such as business intelligence, national security intelligence and learning analytics all have an interest in analysing large volumes of data from disparate data sources, and are providing the business cases for the rapid growth in ‘big data’ & data analytics.
• Learning analytics encompasses support to both the business and teaching functions of the learning institution.

Big Data: Analytic Tools
• Data mining
• Statistical analysis
• Predictive analysis
• Correlation
• Regression
• Forecasting
• Process modeling
• Optimization
• Simulation

Business Intelligence: BI
[Figure]

Big Data: Analytics
[Figure]

Big Data Analytics
[Figure]

Big Data: Structured Data Analysis
• Descriptive statistics: sums, means, std devs, basic plotting (graphs, charts, histograms)
• Data visualisation: tools that enable the human to see meaningful patterns in data
• Machine learning: tools that enable computers to find patterns in data to perform either classification, clustering or prediction, e.g. decision trees, neural networks, support vector machines, linear regression, self-organising maps, k-means
• Predictive analytics: algorithmic approaches (generally machine learning) for predicting key target variables of interest

Big Data: Visualization
[Figure. Panel titles: Structured Data; Unstructured Data]

Big Data: Visualization
[Figure: Combining Structured & Unstructured Data Sources]

Dangers in Analytics
• Privacy
• Security
• Drawing decisions on incomplete data
• Drawing decisions on inaccurate data
• Using only data that supports our gut decisions
• Drawing the wrong conclusion from the data
  – Stock prices example

Big Data, IoT, Analytics
• IoT will enable Big Data
• Big Data needs Analytics
• Analytics will improve processes for more IoT devices
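
To make the chain "IoT data, then analytics" concrete, the sketch below applies two of the structured-data techniques listed earlier, descriptive statistics and linear regression, to a handful of sensor readings. It is an illustration under stated assumptions, not material from the slides. The helper `fit_line`, the variable names, and the temperature values are hypothetical, and the fit is plain ordinary least squares written out by hand.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical hourly temperature readings from one sensor (made-up values).
hours = [0, 1, 2, 3, 4, 5]
temps_c = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]

# Descriptive statistics: a compact summary of the readings.
summary = {
    "mean": statistics.fmean(temps_c),
    "stdev": statistics.stdev(temps_c),
    "min": min(temps_c),
    "max": max(temps_c),
}

# Predictive step: extrapolate the fitted line one hour ahead.
slope, intercept = fit_line(hours, temps_c)
forecast_hour_6 = slope * 6 + intercept

print(summary)
print(f"slope={slope:.2f} C/hour, forecast at hour 6: {forecast_hour_6:.2f} C")
```

The forecast is only as trustworthy as the readings behind it. With incomplete or inaccurate sensor data, the same code would produce a confident-looking but misleading number, which is the risk the "Dangers in Analytics" slide warns about.
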