DBI303

    SELECT COUNT(*) FROM ParkingLot WHERE type = 'AUTO' AND color = 'RED'

Red cars in the last hour? Doesn't seem like a great solution…
This is the streaming data paradigm in a nutshell: ask questions about data in flight.

[Figure: $ value of analytics plotted against time of interest (years, months, days, hrs, min, sec, present). Labels: Historical Trend Analysis; Forecasting in Enterprises; Web Analytics (ad placement), Financial Services, Smart Grids, Monitoring (systems mgmt), Health Care, Manufacturing, etc.]

[Figure: end-to-end architecture with the stages Sources, Caching, Processing, Distribution, Visualization. Labels: Web servers; Devices, Sensors; Stock tickers & News feeds; Data Bus; Message Bus; Service Broker; Reference Data; Operational Analytics Cache; Microsoft StreamInsight; Relational Database; ETL; Intra-Day Cubes; Historic Cubes; Automated Decisions; Operational Dashboard (Ticking, Snapshot); Reporting Dashboard (Refreshed); Static Reports; Mining, Validation, "What-If" Scenarios; Refresh (Push); Re-compute (Pull).]

Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.

                | Database Applications            | Event-driven Applications
Query Paradigm  | Ad-hoc queries or requests       | Continuous standing queries
Latency         | Seconds, hours, days             | Milliseconds or less
Data Rate       | Hundreds of events/sec           | Tens of thousands of events/sec or more
Query Semantics | Declarative relational analytics | Declarative relational and temporal analytics

[Figure: a database application handles a request and returns a response; an event-driven application turns an event input stream into an output stream.]

StreamInsight Target Scenarios

[Figure: latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) plotted against aggregate data rate in events/sec (0, 10, 100, 1000, 10000, 100000, ~1 million). Labeled regions: Relational Database Applications; Operational Analytics Applications, e.g., Logistics; Data Warehousing Applications; Web Analytics Applications; Monitoring Applications; Manufacturing Applications; Financial Trading Applications.]

[Figure: StreamInsight application development and StreamInsight application at runtime. Event sources (devices, sensors; web servers; event stores & databases; stock ticker, news feeds) connect through input adapters to the StreamInsight engine, which runs standing queries (query logic); output adapters connect to event targets (pagers & monitoring devices; KPI dashboards, SharePoint UI; trading stations; event stores & databases).]

Industry trends
• Data acquisition costs are negligible
• Raw storage costs are small and continue to decrease
• Processing costs are non-negligible
• Data loading costs continue to be significant

CEP advantage
• Process data incrementally, i.e., while it is in flight
• Avoid loading while still doing the processing you want
• Seamless querying for monitoring, managing and mining

[Figure: cycle of Monitor KPIs; Record raw data (history); Manage business via KPI-triggered actions; Mine historical data; Devise new KPIs.]

Manufacturing:
• Sensor on plant floor
• React through device controllers
• Aggregated data
• 10,000 events/sec

Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec

Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec

Power Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec

Visual trend-line and KPI monitoring; batch & product management; automated anomaly detection; real-time customer segmentation; algorithmic trading; proactive condition-based maintenance.

[Figure: event processing engine. Labels: Asset Instrumentation for Data Acquisition, Subscriptions to Data Feeds; Data Stream; Event Processing Engine; Lookup; Asset Specs & Parameters; Stream Data Store & Archive.]
• Threshold queries
• Event correlation from multiple sources
• Pattern queries

[Figure: a market feed (MSFT, IBM, etc.) is pushed through input adapters into StreamInsight for grouping and aggregation, then on to output adapters.]
Aggregated result per ticker (figure arrows labeled Push and Pull):

Asset Class | Ticker | Exchange | SUM Volume | SUM Bid | SUM Ask
Stock       | MSFT   | NASDAQ   | 100        | 100     | 100
Stock       | IBM    | NASDAQ   | 200        | 200     | 200

Temporal LINQ Example: JOIN, PROJECT, FILTER

    from e1 in MyStream1
    join e2 in MyStream2 on e1.ID equals e2.ID   // Join
    where e1.f2 == "foo"                         // Filter
    select new { e1.f1, e2.f4 };                 // Project

LINQ Example: GROUP&APPLY, WINDOW

    from e3 in MyStream3
    group e3 by e3.i into SubStream                                  // Grouping
    from win in SubStream.HoppingWindow(FiveMinutes, ThreeSeconds)   // Window
    select new { i = SubStream.Key, a = win.Avg(e => e.f) };         // Project & Aggregate

[Figure: data sources (web servers, sensors, devices, feeds) feed StreamInsight instances at several tiers, from the sources through Aggregation & Correlation to Complex Analytics & Mining.]

Event processing engines are deployed at multiple places and on different scales:
• At the edge, close to the data source: StreamInsight CEP for lightweight processing and filtering
• In the mid-tier, consolidating related data sources: StreamInsight CEP for aggregation and correlation
• In the data center (historical archive, mining, large-scale correlation): StreamInsight CEP for complex analytics including historical data

Workload | Standard | Enterprise | Datacenter | Parallel Data Warehouse
Custom/Packaged OLTP Apps | 4 procs, 64GB RAM, Backup Compression | 8 procs, 2TB RAM, Adv. Security, Backup Compression | >8 procs, OS Max, Adv. Security, Backup Compression | N/A
Server Consolidation | 1 VM/license | 4 VMs/license, Resource Governor, App & Multi-Server Mgmt (up to 25 instances) | Unlimited Virtualization, Resource Governor, App & Multi-Server Mgmt (> 25 instances) | N/A
Data Warehousing | | Scale-Up DW, Data Compression; 10s of TBs, up to 30 TB with FastTrack | Scale-Up DW, Data Compression; 10s of TBs | Scale-Out DW; 10s - 100s of TBs
Business Intelligence | Dept/Team BI | Enterprise-Scale BI, Master Data Services, PowerPivot Mgmt | Enterprise-Scale BI, Master Data Services, PowerPivot Mgmt | Integrated with SSIS, SSAS and SSRS
Complex Event Processing (StreamInsight) | <5000 events/sec & > 5 s latency | <5000 events/sec & > 5 s latency | >5000 events/sec & < 5 s latency | Future coverage

Scenarios:
• Manufacturing: Alarming, Notifications
• Utilities: AMI/SmartGrid, Outage Management
• Oil & Gas: Well Monitoring, Operational Intelligence
• Financial Services: Risk Management, Market Monitoring
• Web Analytics: Behavioral Targeting, Load Monitoring
• Telco: CDR Aggregation, Real-Time Analysis

ISV: OSIsoft, Matrikon, ICONICS, Telvent, Lab49, IMGroup, MSFT AdCenter, XBox, DPE
SI: Logica, Hitachi Consulting, Lab49, IMGroup, MSFT AdCenter, XBox, DPE

StreamInsight: the CEP platform from Microsoft to build event-driven applications
• Development experience with .NET, C#, LINQ and Visual Studio 2008 and 2010
• Support for .NET sequences as sources and sinks; flexible adapter SDK to connect to other event sources and sinks
• Event-driven applications are fundamentally different from traditional database applications: queries are continuous, consume and produce streams, and compute results incrementally
• The CEP platform does the heavy lifting for you to deal with temporal characteristics of event stream data

[Figure: StreamInsight application development and application at runtime: event sources, input adapters, StreamInsight engine with standing queries, output adapters, event targets.]

Resources
• Product main page
• Blog
• Hitchhiker's guide (blog post with download location)
• MSDN documentation
• Samples

http://northamerica.msteched.com
www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn
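
To make the GROUP&APPLY, WINDOW query from the LINQ examples more concrete, here is a minimal plain-Python sketch of what a grouped hopping-window average computes. It is not StreamInsight code and does not use any StreamInsight API. The names `Event` and `hopping_window_avg` are hypothetical, and the sketch assumes point events with non-negative integer timestamps in seconds, half-open windows `[start, start + window_size)`, and window starts aligned to multiples of the hop size. It also processes a finished batch, whereas a CEP engine would compute the same results incrementally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    time: int     # timestamp in whole seconds (assumption: point events)
    key: str      # grouping key, the role of e3.i in the LINQ query
    value: float  # field to average, the role of e.f in the LINQ query


def hopping_window_avg(
    events: Iterable[Event], window_size: int, hop_size: int
) -> Dict[Tuple[str, int], float]:
    """Average `value` per (key, window_start) over hopping windows.

    Each window covers [start, start + window_size) and a new window
    starts every `hop_size` seconds, so one event can fall into several
    overlapping windows. Only non-empty windows appear in the result.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")

    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for ev in events:
        # Index of the last window that starts at or before the event.
        last = ev.time // hop_size
        # Index of the first window that still covers the event,
        # clamped so that no window starts before time 0.
        first = max(0, (ev.time - window_size) // hop_size + 1)
        for k in range(first, last + 1):
            buckets[(ev.key, k * hop_size)].append(ev.value)

    return {slot: sum(vals) / len(vals) for slot, vals in buckets.items()}


if __name__ == "__main__":
    sample = [Event(1, "a", 10.0), Event(2, "a", 20.0), Event(3, "b", 5.0)]
    for (key, start), avg in sorted(hopping_window_avg(sample, 4, 2).items()):
        print(f"key={key} window=[{start}, {start + 4}) avg={avg}")
```

Under these assumptions, the deck's `HoppingWindow(FiveMinutes, ThreeSeconds)` would correspond to `window_size=300` and `hop_size=3`: every three seconds a new five-minute window opens, so consecutive windows overlap heavily and each event contributes to up to 100 of them.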