Big Data: A Door Open for Financial Innovations?
Steve Wilcockson
Industry Manager – Financial Services
© 2014 The MathWorks, Inc.

Financial Services: Big Data, Machine Learning and Model Integration

Your Datasets: How Large?
“Garbage in, garbage out – data quality is key” – tier one investment bank
“We encounter more challenges with simulated data than with real data.” – wealth manager

Machine Learning: Data-Driven
Exploration
  Univariate: Pie chart, Histogram, etc.
  Multivariate: Feature selection and transformation
Clustering
  Partitive: K-means, Gaussian mixture model
  Hierarchical
  SOM
Modelling
  Classification, Regression
  Discriminant, Decision Tree, Neural Network, Support Vector Machine

Advantages & Pitfalls: Machine Learning

Investment Manager
“I use Bayesian estimation, Markov Chain Monte Carlo, dynamic Bayesian networks, Hidden Markov Modelling and various classification algorithms: SVMs [support vector machines] and decision trees”

Investment Banker
“I developed and traded my own intra-day, trend-following G10 FX strategies, which used a unique combination of traditional machine learning algorithms (Neural and Bayesian Networks) with a Genetic Algorithm optimization wrapper”

Portfolio Advisor
“I use a range of machine learning classification algorithms to aggregate useful index, stock and economic information from which I build my portfolio strategies.”

US Prop Trading Shop
“We are going to use machine learning tools to analyze predictability in publicly available daily stock returns.”

Hedge Fund Prop Trading Firm
“I risk-managed a guy who was terrible for over-fitting. His models were optimised to within an inch of his life and did not work out of sample. They were too oriented to the noise.”
“I would like to hear your experience on the use of state space models in stat arb. I do believe they offer a superior way to model the equilibrium dynamically, allowing it to evolve through time. The tricky part is how to deal with the risk of over-fitting.”

Systematic Fund Manager
“No matter what cool algorithms we threw at the testbench and then live, simple linear modelling worked surprisingly well; we could understand the model, apply judgment over risk factors and model parameters. Far more satisfying”

Fund Manager
“I started to use state space models to get a framework for testing parameter stability to avoid over-fitting.”

Flexible Research; Effective Implementation
Application / Data Servers
Production: Take Algorithms to Data
Research: Bring Data to Algorithms

Modelling and Model Implementation: Now
[Workflow diagram. Labels: Development and Model Testing; Historical Data; Modeling / Analysis; Model Testing; End of Day / Intraday; Research / Algorithms; Model Validation; Files; Model Development; Databases; Calibration; Back-Testing; Decision Engine; Client; Real-Time Feeds; Models; Web; Approved Data; Rules; Spreadsheets; Production; Production Data]

Modelling and Model Implementation: Emerging
[Workflow diagram. Labels: Development and Testing; Managed / Consistent Data; Modeling / Analysis; Model Testing; Research / Algorithms; Model Validation; Model Development; End of Day / Intraday; Calibration; Files; Back-Testing; Databases; Software Testing; Unit Test; Coverage; Real-Time; Derived; Client; User Contributed; Text / Social; Decision Engine; Models; Spreadsheets; Web; On-Demand; Reporting / Vis; Rules; Production]

Example 1: Map/Reduce in Research (Linear Regression / Machine Learning)

Example 2: Fraud Detection

Challenges & Opportunities: “Algorithms Everywhere”
Cultural Clash/Marriage of Multiple Teams and Infrastructures
  – Data
  – Quants
  – IT
  – “The Business”
Data-Driven Modelling and Equation-Based Modelling
  – Dark Art / Cool Science
  – Complexity / Simplification
How will the Education Community Respond?
  – Rise of Data Science
  – Multidisciplinary Collaboration
  – Project-based Learning

www.mathworks.co.uk/financeskills
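
Example 1 names map/reduce for linear regression without showing how the two fit together. The sketch below is one possible illustration in Python, not the presenter's implementation: each data chunk is mapped to a small tuple of sums, the tuples are added together in the reduce step, and the least-squares line is solved from the totals. The function names (`map_chunk`, `combine`, `fit_line`) and the toy data are hypothetical, and the single-feature case is assumed for brevity.

```python
from functools import reduce

def map_chunk(chunk):
    """Map step: summarise one chunk of (x, y) pairs as sufficient statistics."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: the statistics are plain sums, so chunks combine by addition."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(chunks):
    """Fit y = slope * x + intercept by ordinary least squares over all chunks."""
    zero = (0, 0.0, 0.0, 0.0, 0.0)
    n, sx, sy, sxx, sxy = reduce(combine, map(map_chunk, chunks), zero)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values to fit a line")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical toy data lying on y = 2x + 1, split into three chunks
# as if each chunk were held on a different server.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(chunks)
```

Only five numbers per chunk travel to the reduce step, which is why this pattern suits data too large to hold in one place. The raw-sums formula can lose precision on badly scaled data, so a production version would likely centre the data or accumulate in a more stable form.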