Data Analysis and Visualization
Dr. Frank van Ham, IBM Netherlands
Target Conference 2014, Groningen, Nov 4th, 2014

[insert obligatory ‘Big Data’ slide here]

“Hyper-intelligent computer systems crunching mega giga tera exa lots of bytes of data”
“No, no, let’s not throw that away. I might need that in the future.”

Analytic algorithms to the rescue! Our world in 20 years?

Descriptive statistics don’t always tell us everything about data: $\mu = 7.5$, $\sigma^2 = 4.12$, correlation $= 0.81$ and regression $y = 3 + 0.5x$.
Interpreting statistics is not a simple task for automated systems. Analytic results should be used with care and supervision.

“A computer lets you make more mistakes faster than any invention in human history, with the possible exceptions of handguns and tequila.” (Mitch Ratcliffe)

Big Data and Big Data analytics schematized
[Diagram. Elements: real world, Big Data world, analytic systems (statistics / heuristics), simplified machine “model”, human “model”. Labeled relations: measure, compute, influence, verify / monitor.]

Humans are slow at computing statistics but fast at contextualizing (though not necessarily good at it).
+ Computers are bad at grasping context but very fast at computing statistics.
= Humans can lead computers in the right direction, with computers doing the “heavy lifting”.
“The lame leading the blind” (J. Turcan)

To work with our data reliably, we need to understand it. But unfortunately, our data is inside a computer system…
To understand our data, we need to see our data.
Visualization is not a cure-all magic technology that allows humans to instantly understand data…
Visualization is a medium to bridge the “last 50 cm” in data analysis.
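The figures on the descriptive-statistics slide appear to match Anscombe’s quartet (F. J. Anscombe, 1973): four small datasets with nearly identical summary statistics but very different shapes. The Python sketch below is an illustration added here, not code from the talk. The data values are Anscombe’s published ones; the names `QUARTET` and `summarize` are hypothetical. For each dataset it computes the mean and sample variance of y, the Pearson correlation, and the least-squares line y = a + bx.

```python
# Illustrative sketch (not from the talk): Anscombe's quartet.
# Four x/y datasets whose summary statistics are nearly identical,
# although their scatter plots look completely different.

X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values for datasets I-III

QUARTET = {  # hypothetical name for this example
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean and sample variance of y, Pearson r, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_y": mean_y,
        "var_y": syy / (n - 1),         # sample variance (n - 1 denominator)
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean={s['mean_y']:.2f} var={s['var_y']:.2f} "
              f"r={s['r']:.2f} y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

All four datasets print essentially the same summary line, consistent with the slide’s figures up to rounding. Only a scatter plot of each dataset reveals the difference: one roughly linear cloud, one smooth curve, one straight line with a single outlier, and one vertical stack with a single high-leverage point.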
Industry data tools trends: from reporting to user-driven analytics
[Diagram, three stages. Past: data warehouse, (daily) report, user. Current: data warehouse, real-time on-demand report, user. Future: user, visual interface, analytics, data warehouse; labels “drive analytics” and “algorithm results”.]

Industry data tools trends: from IT to the line-of-business user
21st-century Big Data BI will require tools that
• Deal better with our data
  – Connect to data transparently, in whatever form it comes
  – Can mash different data sources together intelligently
  – Automatically clean and model our data where appropriate
• Deal better with us
  – Are simple and flexible to query
  – Communicate with us in human-friendly ways
  – Are smart enough to apply best (business) practices
• Make analytics accessible to everyone
  – Act as analytics-based guides to our data
  – Allow non-expert users to work with analytic algorithms
  – Turn analytics into actionable insights

IBM Watson Analytics: IBM’s push into this area
Visit http://watsonanalytics.com and sign up for our free beta!

In summary
• To realize the possibilities of Big Data, we need both
  – scalable infrastructure, and
  – tools that allow us to make sense of all this data.
• Visualization and analytic algorithms are essential for data analysis.
  – One does the heavy lifting.
  – The other tells us where we’re going.
• Research/design problems to target, from a business perspective:
  – Data-generic data visualization tools
  – Simplifying statistics output
  – Different input modalities
  – Pluggable analytic algorithms

Please Note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality.
Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Thank you!
Questions? Remarks?
frankvanham@nl.ibm.com