• “The half-life of BI is typically shorter than the life of the project needed for its implementation.” --Industry whitepaper (see references)
• “Predicting is hard…
• …especially about the future” --Yogi Berra

• A recent Gartner Group Advanced Technology Research Note listed data mining at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next 3 to 5 years."

• Data Mining finds patterns in data
  – Using Machine Learning Algorithms
• Don’t worry: the hard yards are done
  – A lot at Microsoft Research
• Uses these patterns to make predictions

SSAS ≠ Cube
• Dimensional Modelling:
  – Build a Cube
  – Learn MDX
  – Construct Analyses
  – …of the PAST
• Data Mining:
  – Build Structure
  – Use Model
  – Make Predictions
  – …about the Future

• Cubes summarize facts:
  – For Example:
    • Sums of Sales in all regions for all months
    • Aggregated by Gender and Age
    • For each Product
    • …
  – Data mining finds patterns in data
  – Cubes abstract much of the interesting information
    • Facts that form the patterns are lost in the Cube’s summations

• Connect to Data Source
• Highlight Exceptions
• Forecasting
• Key Influencers

• Is it all just smoke and mirrors?
• “Excel data mining add-in was invented to make astrology look respectable!” – Donald Data, industry pundit

Jargon:
• ADO = ActiveX Data Objects
• ADO MD = ADO Multidimensional
• AMO = Analysis Management Objects
• DSO = Decision Support Objects
• XMLA = XML for Analysis

Books Online
• Contents or…
• Search For Data Mining Tutorials

• Business Intelligence Development Studio
• Demo: Key Influencers
  – Models and Model Viewers
    • Decision Tree
    • Cluster
    • Naïve Bayes
    • Neural Network

[Figure: Correlation Tree Node]

• Hybrid
• Linear regression & association & classification
• Algorithm highlights
  – Remove rare attributes (“Feature Selection”)
  – Group values into bins for performance
  – Correlate input attributes with outcomes
  – Find attribute separating outcomes with maximum information gain
  – Split tree and re-apply

• Algorithm options:
  – Non-scalable (all records)
  – Scalable (50,000 records + 50,000 more if needed)
    • 3 x faster than non-scalable
  – K-means (hard)
  – Expectation Maximization (soft) (default)
    • Form initial cluster
    • Assign probability of each attribute-value in each cluster
    • Iterate until the model converges on the likelihood of the data

• Simple, fast, surprisingly accurate
• “Naïve”: attributes assumed to be independent of each other
• Pervasive use throughout Data Mining

P(Result | Data) = P(Data | Result) * P(Result) / P(Data)

P(Girl | Trousers) = ?
P(Trousers | Girl) = 20/40
P(Girl) = 40/100
P(Trousers) = 80/100
P(Girl | Trousers) = P(Trousers | Girl) * P(Girl) / P(Trousers)
                   = (20/40)(40/100)/(80/100)
                   = 20/80
                   = 0.25

[Figure: neural network diagram. Input Neurons (Cars, Age), Hidden Neurons, and Output Neurons (Buy, No), connected by weights W]

• Multilayer Perceptron Network = Back-Propagated Delta Rule Network
• Assign weights: assess importance of input on output using training dataset
• Batch Learning – Start at outputs and propagate back through the network:
  – Evaluate weight accuracy: predicted value vs. holdout value
  – Adjust weights to improve prediction
    » Weight can be negative to show inhibiting influence
• Iterate using conjugate gradient algorithm to converge

• SSMS (aka SQL Mangler)
  – Analysis Services Database
    • Data Mining

• Business Intelligence Development Studio
• Lift Chart: Key Influencers
  – Decision Tree
  – Cluster
  – Naïve Bayes
  – Neural Network

Lift Chart Operation
• Random: 50% Population
• Ideal: 100% Targeted
• Data Mining: 85% Bike Buyers

• Demo: Targeted Mailing
  – Find prospective customers
  – Save results to database
  – Import in a new Data Source View
  – Process again with Data Mining!

• Fill By Example
• Goal Seek
• What If
• Highlight Exceptions
• Data Mining Tab:
  – Explore Data
  – Clean Data, etc.
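
Worked example: reading a lift chart. The Lift Chart Operation slide compares a random line, an ideal line, and the data-mining model. The sketch below shows one way such points could be computed. It assumes the slide's percentages mean "share of all buyers found after contacting half the population", which the slides do not state outright. The function names and the toy scores and labels are hypothetical and are not taken from the Targeted Mailing demo or from any SSAS API.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all buyers found when contacting the top `fraction`
    of customers, ranked by model score (highest first)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    total_buyers = sum(labels)
    if total_buyers == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = round(len(ranked) * fraction)
    found = sum(label for _, label in ranked[:cutoff])
    return found / total_buyers


def ideal_gain(labels, fraction):
    """Best possible share: every contacted customer is a buyer
    until the buyers run out."""
    total_buyers = sum(labels)
    if total_buyers == 0:
        return 0.0
    cutoff = round(len(labels) * fraction)
    return min(1.0, cutoff / total_buyers)


# Hypothetical toy data: 10 customers, 5 of them buyers (label 1).
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

fraction = 0.5
random_baseline = fraction  # random contact finds buyers in proportion
model = cumulative_gain(scores, labels, fraction)
ideal = ideal_gain(labels, fraction)
print(f"random={random_baseline:.0%} model={model:.0%} ideal={ideal:.0%}")
```

On this toy data the model line falls between the random and ideal lines, which is the shape the lift chart is meant to show. The closer the model line sits to the ideal line, the more useful the model is for targeting.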
• Off-the-shelf toolkit
• No Cube required
• No code required
• Good default parameters
• Easily explored models
  – Change parameters, filter input, compare lift
• Excel Add-In

• Data Mining Add-ins
  http://office.microsoft.com/en-us/excel-help/data-mining-add-insHA010342915.aspx#_Toc257717762
• Analysis Services - Data Mining Videos
  http://msdn.microsoft.com/en-us/library/dd776389(v=SQL.100).aspx
• SQL Server Data Mining Home
  http://www.sqlserverdatamining.com/ssdm/
• Microsoft Contoso BI Demo Dataset for Retail Industry
  http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=868662dc-187a-4a85-b611-b7df7dc909fc
• What Every IT Manager Should Know About Business Users’ Real Needs for BI
  http://docs.media.bitpipe.com/io_25x/io_25515/item_392177/Tableau_S_MktgLtr_BI_IT.pdf
• An Introduction to Data Mining: Discovering hidden value in your data warehouse
  http://www.thearling.com/text/dmwhite/dmwhite.htm

• Problems:
  – Data too old to be useful
  – Need for instantaneous feedback
• Solution:
  – StreamInsight

• Complex Event Processing
  – Processing and querying of event data streams
  – Data queried while “in flight”
  – May involve multiple concurrent event sources
  – Works with high data rates
  – Aims for near-zero latency

[Figure: CEP Target Scenarios. Time scale (Months, Days, Hours, Minutes, Seconds, 100 ms, < 1 ms) against Aggregate Data Rate (Events/sec, 0 to 100000 and higher), positioning Relational Database Applications, Operational Analytics Applications (e.g., Logistics), Data Warehousing Applications, Web Analytics Applications, Monitoring Applications, Manufacturing Applications, and Financial Trading Applications]

[Figure: StreamInsight Architecture. Data sources (operations, assets, feeds, sensors, devices) supply input data streams to the CEP engine, which emits output data streams and results; stages labelled Monitor & Record, Mine & Design, and Manage & Benefit, with an Operational Data Store & Archive]

• Algorithmic trading
• Smart order routing
• Real-time profit and loss
• Rapid analysis of transactional cost
• Fraud detection
• Risk management
• Often 100,000 events per second

• Automate
  – Page layout
  – Navigation
  – Presentation
  – Targeted advertising

• Real-time network monitoring
• Quality of service monitoring
• Location-based services
• Fraud detection
• Intrusion detection

• Battlefield control
• Monitoring of resource locations
• Intrusion detection
• Network traffic analysis
  – Emails
  – Network traffic
  – Watch lists
  – Financial movements

• Asset monitoring
• Aggregation of machine-based sensor data
• Generation of alerts in error conditions
• Identifying the “golden batch”

• Real-time monitoring
• Managing player interest
• Website traffic analysis
• Detecting and eliminating undesired behaviors
• Understanding behavioral patterns

• Patient management
• Outbreak management
• Trend detection
• Insurance risk analysis

• Vehicle management
• Supply chain forecasting and tracking
• Maritime logistics
• GPS tracking

• Monitoring
  – Consumption
  – Variations
• Detecting outages
• Smart grid management
• Aggregating data across the grid

• Gaming machine event analysis
• Card table analysis
  – Fraud detection
  – Profit and loss in real-time
• Targeted advertising
  – Player behavior
  – Loyalty system implementation
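
Worked example: querying events "in flight". The Complex Event Processing slide describes querying event data while it is still streaming, rather than after it has been stored. The sketch below illustrates that idea with a sliding time window in plain Python. It is not the StreamInsight API, and the function name, event layout (timestamp in seconds, value), and sample readings are all hypothetical.

```python
from collections import deque


def sliding_average(events, window_seconds):
    """Yield (timestamp, average) as each event arrives, averaging only
    the values whose timestamps fall within the last `window_seconds`.

    `events` is any iterable of (timestamp, value) pairs in time order,
    so it can be a live generator as well as a list.
    """
    window = deque()  # events currently inside the time window
    total = 0.0       # running sum of the values in the window
    for timestamp, value in events:
        window.append((timestamp, value))
        total += value
        # Evict events that have aged out of the window.
        while window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield timestamp, total / len(window)


# Hypothetical sensor readings: (seconds since start, reading).
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
for timestamp, average in sliding_average(readings, window_seconds=3):
    print(timestamp, average)
```

Each result is available as soon as its event arrives, and only the events inside the window are held in memory. Keeping the state that small is what lets this style of query aim for low latency at high event rates.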