Watson analytics secured content

Data Analysis and Visualization
Dr. Frank van Ham,
IBM Netherlands
Target Conference 2014, Groningen
Nov 4th, 2014
[insert obligatory ‘Big Data’ slide here]
“Hyper intelligent computer systems crunching
mega giga tera exa lots of bytes of data”
“No, no let’s not throw that away. I might
need that in the future”
Analytic algorithms to the rescue!
Our world in 20 years?
Descriptive statistics don’t always tell us everything about data
 = 7.5, 2 = 4.12, correlation = 0.81 and regression : y = 3 + 0.5x
Interpreting statistics is not a simple task for automated systems.
Analytic results should be used with care and supervision
“A computer lets you make more mistakes faster than any
invention in human history,
with the possible exception of handguns and Tequila.”
(Mitch Ratcliffe)
Big Data and Big Data analytics schematized
[Diagram. Labels: Real world; Measure; Big Data world (simplified machine “Model”); Compute; Analytic Systems (Statistics / Heuristics); Influence; Human “Model”; Verify / Monitor; Influence]
Humans are slow at computing statistics, but
fast at contextualizing (though not
necessarily good).
+
Computers are bad at grasping context, but
very fast at computing statistics.
=
Humans can lead computers in the right
direction, with computers doing the “heavy
lifting”.
“The lame leading the blind” – J. Turcan
To work with our data reliably, we need to understand it
But unfortunately our data is inside a computer system…
To understand our data, we need to see our data
Visualization is not a magic cure-all technology that allows
humans to instantly understand data…
Visualization is a medium to bridge the “last 50 cm” in data analysis.
Industry data tools trends: From Reporting to User-Driven analytics
[Diagram. Labels by stage:]
Past: Data warehouse; (Daily) Report; User
Current: Data warehouse; Real-time on demand Report; User; Analytics
Future: User; Data warehouse; Drive analytics; Visual Interface; Analytics; Algorithm results
Industry data tools trends: from IT to Line of Business user
21st century Big Data BI will require tools that
• Deal better with our data
– Connect to data transparently in whatever form
– Can mash different data sources together intelligently
– Automatically clean and model our data where appropriate
• Deal better with us
– Are simple and flexible to query
– Communicate with us in human friendly ways
– Are smart enough to use best (business) practices
• Make analytics accessible to everyone
– Act as analytics-based guides in our data
– Allow non-expert users to work with analytic algorithms
– Turn analytics into actionable insights
IBM Watson Analytics – IBM’s push into this area
Visit http://watsonanalytics.com and sign up for our free beta!
In summary
• To realize the possibilities of Big Data we need both
– Scalable infrastructure.
– Tools that allow us to make sense of all this data.
• Visualization and analytic algorithms are essential for data analysis.
– One does the heavy lifting
– One tells us where we’re going.
• Research/design problems to target, from a business perspective
– Data-generic data visualization tools
– Simplifying statistics output
– Different input modalities
– Pluggable analytic algorithms
Please Note
IBM’s statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product
direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise,
or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract. The development, release, and
timing of any future features or functionality described for our products remains at our sole
discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a
controlled environment. The actual throughput or performance that any user will experience
will vary depending upon many factors, including considerations such as the amount of
multiprogramming in the user’s job stream, the I/O configuration, the storage configuration,
and the workload processed. Therefore, no assurance can be given that an individual user
will achieve results similar to those stated here.
Thank you!
Questions? Remarks?
frankvanham@nl.ibm.com