POLYNOMIAL TIME HEURISTIC OPTIMIZATION METHODS APPLIED TO PROBLEMS IN COMPUTATIONAL FINANCE

Ph.D. dissertation of Fogarasi Norbert, M.Sc.
Supervisor: Dr. Levendovszky János, D.Sc., Doctor of the Hungarian Academy of Sciences
Department of Telecommunications, Budapest University of Technology and Economics
Budapest, 20 May 2014

Outline of Presentation
• Introduction
• Motivation: computational finance and NP-hard problems
• My contributions
• Thesis Group I: Mean reverting portfolio selection
• Thesis Group II: Optimal scheduling on identical machines
• Summary of results and real-world applications

Computational Finance and NP-hard Problems
• Relatively new branch of computer science (Markowitz, 1950s: Modern Portfolio Theory; Nobel Prize in 1990)
• Numerical methods and algorithms with a strong focus on applicability (quantitative study of markets, arbitrage, option pricing, mortgage securitization)
• Recent focus: algorithmic trading, quantitative investing, high-frequency trading
• Since the 2008 financial crisis, the financial services industry has faced new challenges:
  • Regulatory pressure (timely reporting, transparency)
  • High-frequency trading (flash crashes)
  • Unprecedented attention on cost and efficiency
• Focus of interest: finding quick (polynomial-time) approximate solutions to difficult (exponential-time, NP-hard) problems, in order to pave the way towards a safer financial world

Computational Finance Open Issues
• Challenge: real-time portfolio identification (NP-hard problems which need fast suboptimal solutions!)
  My contribution: polynomial-time approximation using stochastic optimization
• Challenge: overnight Monte-Carlo risk calculation scheduling
  My contribution: polynomial-time heuristic scheduling algorithms

My Contribution (cont'd)
• Finding polynomial-time approximate solutions to NP-hard problems:
  • Mean reverting portfolio selection (Thesis Group I)
  • Task scheduling on identical machines (Thesis Group II)
• Show measurable improvement over existing approximate methods
• Prove practical applicability in real-world settings
• Very quick runtime characteristics for high-frequency trading, timely regulatory reporting and hardware cost savings
• 5 refereed journal publications, 1 conference presentation

1. Fogarasi, N., Levendovszky, J. (2012) A simplified approach to parameter estimation and selection of sparse, mean reverting portfolios. Periodica Polytechnica, 56/1, 21-28.
2. Fogarasi, N., Levendovszky, J. (2012) Improved parameter estimation and simple trading algorithm for sparse, mean-reverting portfolios. Annales Univ. Sci. Budapest., Sect. Comp., 37, 121-144.
3. Fogarasi, N., Tornai, K., Levendovszky, J. (2012) A novel Hopfield neural network approach for minimizing total weighted tardiness of jobs scheduled on identical machines. Acta Univ. Sapientiae, Informatica, 4/1, 48-66.
4. Tornai, K., Fogarasi, N., Levendovszky, J. (2013) Improvements to the Hopfield neural network solution to the total weighted tardiness scheduling problem. Periodica Polytechnica, 57/1, 1-8.
5. Fogarasi, N., Levendovszky, J. (2013) Sparse, mean reverting portfolio selection using simulated annealing. Algorithmic Finance, 2/3-4, 197-211.
6. Fogarasi, N., Levendovszky, J. (2012) Combinatorial methods for solving the generalized eigenvalue problem with cardinality constraint for mean reverting trading. 9th Joint Conf. on Math and Comp. Sci.,
February 2012, Siófok, Hungary.

Summary of Numerical Results on Real-World Problems

Field                  | Real-world problem                       | Avg. performance of traditional approaches | Avg. performance of the proposed new method | Improvement
Portfolio optimization | Convergence trading on US S&P 500 stocks | 11.6% (S&P 500 index return)               | 34%                                         | 22.4%
Schedule optimization  | Morgan Stanley overnight scheduling      | 24709 (LWPF performance)                   | 22257 (PSHNN performance)                   | 10%

Thesis Group I: Mean Reverting Portfolio Selection
• Modern Portfolio Theory (MPT): maximize expected return for a given amount of risk
• Profitability vs. predictability
• Mean-reverting portfolios have a large degree of predictability
• Therefore, we can develop profitable convergence trading strategies (~35% annual return on a portfolio selected from the S&P 500)

Intuitive Task Description
• Asset prices form a multi-dimensional time series (x1, x2, x3, ...)
• Find the optimal linear combination, under a cardinality constraint, exhibiting mean reversion
• Trade with the mean reverting portfolio: buy below the mean, sell above it for a profit
• My contribution: developing novel algorithms for identifying mean reverting portfolios with cardinality constraints, trading and performance analysis

Thesis Group I: Problem Description
• How to identify mean reverting portfolios based on multivariate historical time series?
• Constraint: sparse portfolio (limited transaction costs, easier to understand/interpret strategy)
• d'Aspremont, A. (2011) Identifying small mean-reverting portfolios. Quantitative Finance, 11:3, 351-364. (École Polytechnique, BNP Paribas London; PhD Stanford, postdoc Berkeley, Princeton)

Thesis Group I:
The Model
• s_i(t), i = 1,...,n : price of asset i at time instant t
• x_i, i = 1,...,n : quantity at hand of asset i; x = (x_1,...,x_n) is the portfolio vector
• Portfolio value: p(t) = \sum_{j=1}^{n} x_j s_j(t)
• Mean reversion: p(t) is an Ornstein-Uhlenbeck process
    dp(t) = \lambda (\mu - p(t)) \, dt + \sigma \, dW(t)
    p(t) = p(0) e^{-\lambda t} + \mu (1 - e^{-\lambda t}) + \sigma \int_0^t e^{-\lambda (t-s)} dW(s)
• Key parameter λ: a large λ means fast return to the mean and the smallest uncertainty in stationary behaviour
• CHALLENGE: x_opt := arg max_x \lambda(x)

The Discrete Model: VAR(1)
• First order vector autoregressive process: s_t = A s_{t-1} + w_t, where w_t ~ N(0, K) and G = E[s_t s_t^T]
• Then s_t^T x = s_{t-1}^T A x + w_t^T x, and
    \lambda(x) = \frac{E[x^T A^T s_{t-1} s_{t-1}^T A x]}{E[x^T s_t s_t^T x]} = \frac{x^T A^T G A x}{x^T G x}
• x_opt := arg max_x \lambda(x) = arg max_x \frac{x^T A^T G A x}{x^T G x}

Optimal Portfolio as a Generalized Eigenvalue Problem
• x_opt := arg max_x \frac{x^T A^T G A x}{x^T G x} under the constraints \|x\| = 1 and card(x) \le k (sparsity)
• Without the cardinality constraint: A^T G A x = \lambda G x, take the eigenvector of the maximum eigenvalue, i.e. the maximum root of det(A^T G A - \lambda G) = 0
• Problem: develop a fast solution to the generalized eigenvalue problem under the cardinality constraint. This is NP-hard; is there a polynomial-time approximation?

Thesis I.1: Estimation of Model Parameters
• Given n x T historical VAR(1) data s_t, we need to estimate A, K (covariance matrix of w_t) and G (covariance matrix of s_t) in s_t = A s_{t-1} + w_t
• A (and from it K) can be estimated by maximum likelihood / least squares:
    \hat{A} := arg min_A \sum_{t=1}^{T} \| s_t - A s_{t-1} \|_2^2
• G can be estimated by the sample covariance; classical research focuses on regularization techniques (Dempster 1972, Banerjee et al. 2008, d'Aspremont et al. 2008, Rothman et al. 2008):
    \hat{G} := \frac{1}{T-1} \sum_{t=1}^{T} (s_t - \bar{s})(s_t - \bar{s})^T

Thesis I.1: Estimation of Covariance
• My novel approach: use the sample covariance and an iterative recursive estimate in tandem to approximate G.
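The unconstrained generalized eigenvalue problem above can be sketched in a few lines. The following is an illustrative numpy/scipy version, not the dissertation's implementation; the function name is mine. `scipy.linalg.eigh(M, G)` solves the symmetric generalized problem M x = λ G x with G positive definite and returns eigenvalues in ascending order.

```python
import numpy as np
from scipy.linalg import eigh

def max_predictability_portfolio(A, G):
    """Unconstrained maximizer of x^T A^T G A x / x^T G x: the generalized
    eigenvector of A^T G A x = lambda G x for the largest eigenvalue."""
    M = A.T @ G @ A
    vals, vecs = eigh(M, G)        # generalized symmetric problem, ascending eigenvalues
    x = vecs[:, -1]                # eigenvector of the largest eigenvalue
    return x / np.linalg.norm(x), vals[-1]
```

Adding the constraint card(x) ≤ k is exactly what makes the problem NP-hard; the heuristics on the following slides attack that case.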
• From the definition of VAR(1), we have the Lyapunov relationship in the stationary case:
    G = A^T G A + K
• However, solving it directly may yield a non-positive definite estimate, so we introduce a numerical method (with a small step size α) that ensures positive definiteness:
    G(k+1) = G(k) - \alpha \left( G(k) - A^T G(k) A - K \right)
• Start with G(0) = sample covariance
• The norm of the difference between the two estimates, \| \hat{G}_1 - \hat{G}_2 \|, also gives a goodness-of-fit measure: it is close to 0 for generated VAR(1) data, and shows how well the VAR(1) assumption works for real data.

Thesis I.1: Numerical Results
[Figure: \| \hat{G}_1 - \hat{G}_2 \| vs. T for n=8 and σ = 0.1, 0.3, 0.5, generating 100 independent time series for each T and plotting the average norm of error]

Cardinality Reduction by Exhaustive Search
• x_opt := arg max_x \frac{x^T A^T G A x}{x^T G x}
• Asset selection by dimension reduction: move from the big N x N space (A, G) to a small K x K space (only a few assets), then run an eigenvalue solver in the small space; the solution automatically satisfies the cardinality constraint
• Visiting all asset selections fulfilling the cardinality constraint yields the sparse portfolio with the required cardinality
• Complexity: O(N! / (K!(N-K)!)) subproblems. Is there any better solution?

Polynomial Time Heuristic Approaches
• Greedy Method (d'Aspremont 2011): let I_k be the set of indices belonging to the k non-zero components of x. Start with
    i_1 = arg max_{i \in [1,n]} (A^T G A)_{ii} / G_{ii}
  On each iteration, consider adding each of the remaining n-k dimensions and choose the one that yields the largest maximal eigenvalue:
    i_{k+1} = arg max_{i \notin I_k} \max_{x \in J_i} \frac{x^T A^T G A x}{x^T G x}, \quad J_i = \{ x \in R^N : x_j = 0 \ \forall j \notin I_k \cup \{i\} \}
  This amounts to solving (n - k) generalized eigenvalue problems of size k + 1. Polynomial runtime: O(n^4).
• Truncation Method (Fogarasi et al. 2012): compute the unconstrained solution, then use the k heaviest dimensions to solve the constrained problem.
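The truncation method can be sketched directly: one big generalized eigenproblem, a selection of the k largest-magnitude coordinates, and one small generalized eigenproblem. This is an illustrative numpy/scipy sketch with a function name of my choosing, not the dissertation's code.

```python
import numpy as np
from scipy.linalg import eigh

def truncation_portfolio(A, G, k):
    """Truncation heuristic (sketch): solve the unconstrained generalized
    eigenproblem, keep the k heaviest coordinates of the dense optimum,
    then re-solve the small problem restricted to those coordinates."""
    M = A.T @ G @ A
    _, vecs = eigh(M, G)
    dense = vecs[:, -1]                        # unconstrained optimum
    idx = np.argsort(np.abs(dense))[-k:]       # k heaviest dimensions
    sub = np.ix_(idx, idx)
    vals_s, vecs_s = eigh(M[sub], G[sub])      # eigenproblem in the small space
    x = np.zeros(len(dense))
    x[idx] = vecs_s[:, -1]
    return x / np.linalg.norm(x), vals_s[-1]
```

Only two eigenvalue computations are needed, which is why the slides call it super fast; the price is that the heaviest dimensions of the dense solution need not be the optimal support.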
• A super fast heuristic: only 2 eigenvalue computations are needed.
[Figure: CPU runtime (sec) of Greedy vs. Truncation as a function of cardinality (10 to 100); Truncation stays near zero while Greedy grows to ~35 sec]

Thesis I.2: Novel Approach, Application of SA by Random Projection
• Restrict the portfolio vector x to have only integer values, of which only k are non-zero.
• Consider the energy function to be minimized:
    E(\mathbf{w}) = \frac{\mathbf{w}^T A^T G A \mathbf{w}}{\mathbf{w}^T G \mathbf{w}}
• At each step of the algorithm, we consider a neighboring state w' of the current state w_n and decide between moving and staying:
    P(\mathbf{w}_{n+1} = \mathbf{w}') = 1 \text{ if } E(\mathbf{w}') \le E(\mathbf{w}_n), \quad e^{-(E(\mathbf{w}') - E(\mathbf{w}_n))/T} \text{ if } E(\mathbf{w}') > E(\mathbf{w}_n)
• At each step, a random projection of the vector is performed onto an appropriate subspace.

Thesis I.2: Novel Approach, Application of SA by Random Projection (cont'd)
• The cardinality constraint can easily be built into the neighbor function
• The starting point can be selected as the Greedy solution
• A memory feature can be built in to ensure the solution is at least as good as the starting point
• Periodic reverts to the starting point improve performance
• The cooling schedule can be set to be fast enough for the specific application
• The procedure can be stopped at any point; an adaptive stopping condition has also been developed.

Thesis I.2: Numerical Results
• For n=10, k=5, Greedy and SA find the theoretical best in 70% of cases; in 11% of the remaining 30%, SA outperforms Greedy.
• For larger problem sizes, SA performs even better (e.g.
for n=20, k=10 it outperforms Greedy in 25% of the cases).
[Figure: mean reversion achieved by Exhaustive, Greedy, Simulated Annealing and Truncation vs. cardinality 1 to 10]

Thesis I.2: Runtime Analysis
• Truncation method: sub-second portfolio selection, can be used in real-time algorithmic trading
• Greedy: seconds to compute, can be used in intraday trading
• Simulated Annealing: minutes to compute, improves upon Greedy, can be used to fine-tune intraday trading
• Exhaustive: impractical for n>20, can be used for low-frequency trading
[Figure: CPU runtime (in seconds) of Exhaustive, Greedy, Sim Ann and Truncation versus total number of assets n, to compute a full set of sparse portfolios with cardinality ranging from 1 to n]

Thesis I.3: Portfolio Mean Estimation
• Given historical portfolio valuations p_t, and assuming they follow an O-U process, estimate μ. Classical methods in the literature:
• Sample mean estimate:
    \hat{\mu}_1 := \frac{1}{T} \sum_{t=1}^{T} p_t
• Least squares regression p_{t+1} = a p_t + b:
    \hat{\mu}_2 := \frac{b}{1 - a}
• Maximum likelihood estimator (numerically complex)
• I developed a novel mean estimation method based on "pattern matching" and decision theory.

Thesis I.3: Novel Portfolio Parameter Estimation Using Pattern Matching
• Starting from the definition of the Ornstein-Uhlenbeck process:
    dp(t) = \lambda (\mu - p(t)) \, dt + \sigma \, dW(t)
    p(t) = p(0) e^{-\lambda t} + \mu (1 - e^{-\lambda t}) + \sigma \int_0^t e^{-\lambda(t-s)} dW(s)
• Taking the expected value of the above:
    E[p(t)] = \mu + (p(0) - \mu) e^{-\lambda t}
• Use maximum likelihood estimation techniques to decide which pattern the observed samples match the most, and determine the long-term mean. This yields a closed-form estimate \hat{\mu}_3 as a ratio of double sums over the entries of U^{-1}, where U is the time correlation matrix of p_t (full formula in the dissertation).
• This estimate is more accurate than the sample mean and more resilient to small λ than linear regression.
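The two classical estimators above are easy to sketch and compare on simulated data. This is an illustrative Python version (function name mine); the regression estimate recovers μ from the fitted AR(1) line p_{t+1} = a p_t + b as b/(1-a).

```python
import numpy as np

def ou_mean_estimates(p):
    """Two classical long-term-mean estimates for a discretely sampled
    Ornstein-Uhlenbeck path p[0..T-1]:
      mu1: sample mean,
      mu2: from the regression p[t+1] ~ a*p[t] + b, mu2 = b / (1 - a)."""
    mu1 = p.mean()
    a, b = np.polyfit(p[:-1], p[1:], 1)   # least-squares slope and intercept
    mu2 = b / (1.0 - a)
    return mu1, mu2
```

On a well-behaved path (λ not too small, long horizon) both land close to the true μ; the slide's point is that for small λ the regression estimate degrades, which motivates the pattern-matching estimator μ̂₃.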
Thesis I.4: Simple Convergence Trading Model
• We decide whether μ(t) < μ by observing only p(t), using an approach based on decision theory
• We can use this simplified model to prove the economic viability of our algorithms and to compare them to each other.

Thesis I.4: Simple Convergence Trading Model (cont'd)
If the process p(t) is in its stationary state, then the samples p(t), t = 1,...,T are generated by a Gaussian distribution N(\mu, \sigma^2 / 2\lambda). As a result, for a given rate of acceptable error ε, we can select an α for which
    \frac{1}{\sqrt{2\pi \sigma^2 / 2\lambda}} \int_{\mu - \alpha}^{\mu + \alpha} e^{-(u - \mu)^2 / (\sigma^2 / \lambda)} \, du = 1 - \varepsilon.
As such, having observed the sample p(t), we accept the stationarity hypothesis, which holds with probability 1 - ε. The trading strategy can then be summarized as follows:

Observed sample        | Accepted hypothesis | Error probability                    | Action (Cash / Portfolio)
p(t) < μ - α           | μ(t) < μ            | Gaussian tail mass below μ - α       | Buy / Hold
μ - α ≤ p(t) ≤ μ + α   | μ(t) = μ            | ε                                    | No Action / Sell
p(t) > μ + α           | μ(t) > μ            | Gaussian tail mass above μ + α       | No Action / Sell

Thesis Group I: S&P 500 Test
• Consider the 500 stocks that make up the S&P 500 during 2009-2010 and select the k=4 stock portfolio that maximizes mean reversion.
• Repeat for 250 trading days (1 year)
• The S&P 500 went up by 11.6%; our method generates a 34% return
• Minimum, maximum, average and final portfolio values, starting from 100%.
[Figure: S&P 500 convergence trading results, G_min / G_max / G_avg / G_final between 75% and 155%, for L=3 Greedy, L=3 Sim Ann, L=4 Greedy and L=4 Sim Ann]

Thesis Group I:
Summary

Thesis     | Contribution                                                                                         | Published in
Thesis I.1 | New numerical method for estimating the covariance matrix of a VAR(1) process                       | Periodica Polytechnica 2012
Thesis I.2 | Adapted simulated annealing to the problem of maximizing mean reversion under a cardinality constraint | Algorithmic Finance 2013
Thesis I.3 | Novel mean estimation technique for O-U processes using pattern matching                            | Annales Univ. Sci. Budapest. 2012
Thesis I.4 | Simple trading strategy based on a decision theoretic formulation                                   | Joint Conf. on Math and Comp. Sci. 2012

Thesis Group II: Optimal Scheduling
• Complex portfolios are evaluated and risk managed using Monte-Carlo simulations at many financial institutions (e.g. Morgan Stanley)
• Future trajectories of market variables are simulated, portfolio value/risk is evaluated on each trajectory, then a weighted average is used
• Each night, a changed portfolio needs to be evaluated/risk managed with new market data and model parameters
• We need a quick way to schedule 10000s of jobs on 10000s of machines in a near optimal way
• Why? ~$10M/year spent on hardware; timely response to clients and regulators regarding portfolio values and VaR
• My novel method saved 53 minutes on top priority jobs running for 12 hours overnight, compared to the next best heuristic.

Thesis Group II: Problem Formulation
• Scheduling jobs on a finite number (V) of identical processors under constraints on the completion times
• Given n users/jobs of sizes x = (x_1, x_2, ..., x_n) \in N^n
• Cutoff times K = (K_1, K_2, ..., K_n) \in N^n
• Weights/priorities w = (w_1, w_2, ..., w_n) \in R^n
• Scheduling matrix C \in \{0,1\}^{n \times m}, where C_{i,j} = 1 if job i is processed at time step j
• Jobs can stop and restart on a different machine (preemption)
• For example, with V=2, n=3, x=(2,3,1), K=(3,3,3) (jobs as rows, time steps as columns):
    C = [ 1 0 1
          1 1 1
          0 1 0 ]

Thesis Group II: Problem Formulation (cont'd)
• Define the tardiness of job i as T_i = max(0, F_i - K_i),
• where F_i := arg max_j \{ C_{i,j} = 1 \} is the finishing time of job i as per C.
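The tardiness definition, and the total weighted tardiness Σ w_i T_i built from it, can be checked on the slides' own example. This is an illustrative Python sketch (function name mine):

```python
import numpy as np

def total_weighted_tardiness(C, K, w):
    """TWT of a 0/1 schedule matrix C (jobs as rows, time steps as columns):
    F_i = last time step in which job i runs (1-based), T_i = max(0, F_i - K_i)."""
    C = np.asarray(C)
    F = np.array([np.max(np.nonzero(row)[0]) + 1 for row in C])  # finishing times
    T = np.maximum(0, F - np.asarray(K))                          # tardiness per job
    return int(np.dot(w, T))

# Example from the slides: V=2 machines, n=3 jobs, x=(2,3,2), K=(3,3,3), w=(3,2,1).
C = [[1, 0, 1, 0],
     [1, 1, 1, 0],
     [0, 1, 0, 1]]
print(total_weighted_tardiness(C, [3, 3, 3], [3, 2, 1]))  # job 3 finishes at t=4, so TWT = 1
```

Here only the lowest-weight job misses its cutoff, by one step, which is why this C is the optimal TWT schedule for the example.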
• Minimizing the Total Weighted Tardiness (TWT) is stated as
    C_{opt} := arg min_C \sum_{i=1}^{N} w_i T_i
• under the following constraints:
    \sum_{j=1}^{L} C_{i,j} = x_i, \quad i = 1,...,N \quad (every job is fully processed)
    \sum_{i=1}^{N} C_{i,j} \le V, \quad j = 1,...,L \quad (at most V processors are busy at any time step)
• For example, with V=2, n=3, x=(2,3,2), K=(3,3,3), w=(3,2,1), not all jobs can complete before their cutoff times, but the optimal TWT solution is (jobs as rows, time steps as columns):
    C = [ 1 0 1 0
          1 1 1 0
          0 1 0 1 ]

Heuristic Approaches to TWT
• 1990 Du and Leung prove that TWT is NP-hard
• 1979 Dogramaci, Surkis: simple heuristic
• 1983 Rachamadugu: myopic heuristic, compared to Earliest Due Date (EDD) and Weighted Shortest Processing Time (WSPT)
• 1998 Azizoglu: branch and bound heuristic, too slow for more than 15 jobs
• 1994 Koulamas: KPM algorithm
• 2000 Armentano: tabu search
• 1995 Guinet: simulated annealing, lower bound
• 2002 Sen, 2008 Biskup: surveys of existing methods
• 2000: artificial neural network approaches to scheduling problems
• 2004 Maheswaran: Hopfield Neural Network approach to single-machine TWT on a specific 10-job problem

Thesis II.1: Novel Approach, TWT to QP
• The HNN is a recursive neural network which is good at solving quadratic optimization problems of the form
    f(\mathbf{y}) = \frac{1}{2} \mathbf{y}^T W \mathbf{y} + \mathbf{b}^T \mathbf{y}
• Our task is to transform the TWT problem into this quadratic form:
    \min_C \sum_{i=1}^{N} w_i \sum_{j=K_i+1}^{L} C_{i,j}
    \forall i: \sum_{j=1}^{L} C_{i,j} = x_i \;\rightarrow\; \min_C \sum_{i=1}^{N} \Big( \sum_{j=1}^{L} C_{i,j} - x_i \Big)^2
    \forall j:
\sum_{i=1}^{N} C_{i,j} \le V \;\rightarrow\; \min_C \sum_{j=1}^{L} \Big( \sum_{i=1}^{N} C_{i,j} - V \Big)^2

Thesis II.1: Novel Approach, TWT to QP (cont'd)
• Move the constraints into the objective function with heuristic constants α, β, γ:
    \min_C E(C) = \min_C \Big[ \alpha \sum_{j=1}^{N} w_j \sum_{l=K_j+1}^{L} C_{j,l} + \beta \sum_{i=1}^{N} \Big( \sum_{j=1}^{L} C_{i,j} - x_i \Big)^2 + \gamma \sum_{j=1}^{L} \Big( \sum_{i=1}^{N} C_{i,j} - V \Big)^2 \Big]
• Each term of the above sum can be converted to quadratic Lyapunov form separately, bringing the expression into the form
    f(\mathbf{y}) = \frac{1}{2} \mathbf{y}^T W \mathbf{y} + \mathbf{b}^T \mathbf{y}
  with W = W_A + W_B + W_C \in R^{NL \times NL} and b = b_A + b_B + b_C \in R^{NL \times 1}, where y is the scheduling matrix C flattened into a vector.

Thesis II.1: Novel Approach, TWT to QP (cont'd)
• The matrix conversions yield block-structured results: W_A is block diagonal with blocks D_j encoding the deadline penalty of job j (zeros in the first K_j diagonal positions, weight w_j in the remaining L - K_j), with b_A = 0; W_B = 2 (I_N \otimes 1_{L \times L}) with b_B = -2 (x \otimes 1_L) encodes the job-size constraint; W_C and b_C encode the machine-capacity constraint analogously across time steps.

Thesis II.2: Applying the HNN
• Hopfield (1982) proved that the recursion
    y_i(k+1) = \mathrm{sgn}\Big( \sum_{j=1}^{N} \hat{W}_{ij} y_j(k) - \hat{b}_i \Big), \quad i = k \bmod N,
  converges to a fixed point, and thereby minimizes the quadratic Lyapunov function
    L(\mathbf{y}) := -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \hat{W}_{ij} y_i y_j + \sum_{i=1}^{N} y_i \hat{b}_i = -\frac{1}{2} \mathbf{y}^T \hat{W} \mathbf{y} + \hat{\mathbf{b}}^T \mathbf{y}.
• I implemented this in MATLAB, including systematic selection of the heuristic constants α, β and γ. I also developed algorithms to validate and correct the resulting schedule matrix if needed.
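The cyclic asynchronous Hopfield recursion above can be sketched in Python (the dissertation's implementation was in MATLAB; function names here are mine). For symmetric W with zero diagonal, each state flip strictly decreases the Lyapunov function, so the recursion settles into a fixed point:

```python
import numpy as np

def hopfield_minimize(W, b, y0, max_sweeps=1000):
    """Cyclic asynchronous Hopfield recursion
        y_i <- sgn( sum_j W_ij y_j - b_i ),
    sweeping i = 1..N until a full sweep changes nothing. Each flip
    decreases L(y) = -1/2 y^T W y + b^T y when W is symmetric with
    zero diagonal, so the recursion reaches a local minimum of L."""
    y = np.array(y0, dtype=float)
    N = len(y)
    for _ in range(max_sweeps):
        changed = False
        for i in range(N):
            new = 1.0 if W[i] @ y - b[i] >= 0 else -1.0
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break                      # fixed point reached
    return y

def lyapunov(W, b, y):
    return -0.5 * y @ W @ y + b @ y
```

In the scheduling application, W and b are the TWT penalty matrices built on the previous slides, and the ±1 states encode the entries of the flattened schedule matrix.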
Thesis II.2: HNN Outperforms Other Simple Heuristics
For each problem size (number of jobs), 100 random problems were generated and the average TWT was computed and plotted.
[Figure: averaged TWT vs. number of jobs (5 to 100) for HNN, Random, EDD, LWPF, WSPT and LBS; HNN achieves the lowest TWT]

Thesis II.2: HNN Outperforms Other Simple Heuristics (cont'd)
The outperformance is consistent over a broad spectrum of problems against the simple heuristics in the literature (LWPF: Largest Weighted Process First, WSPT: Weighted Shortest Processing Time, EDD: Earliest Due Date).
[Figure: HNN performance relative to EDD, WSPT and LWPF (average %) vs. number of jobs 5 to 100]

Job size   | 5    | 10  | 15  | 20   | 30   | 40   | 50   | 75   | 100
% outperf. | 99.9 | 100 | 100 | 99.5 | 99.2 | 99.6 | 99.3 | 98.6 | 98.8

Thesis II.3: Further Improving the HNN
Smart HNN (SHNN)
• Use the result of Largest Weighted Process First (LWPF) as the starting point for the HNN, rather than random starting points
• Speeds up the HNN due to the single starting point, but still requires multiple iterations due to the setting of the heuristic constants.
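The LWPF starting point can be sketched as a greedy preemptive list scheduler. The exact LWPF priority rule is detailed in the dissertation; the version below is a plausible weight-based variant of my own, for illustration only:

```python
import numpy as np

def weighted_list_schedule(x, w, V):
    """Greedy preemptive list scheduler in the spirit of LWPF (sketch):
    at every time step, the V unfinished jobs with the largest weight
    each get a processor. Returns the 0/1 schedule matrix C."""
    n = len(x)
    remaining = list(x)
    L = sum(x)                         # horizon long enough to finish all jobs
    C = np.zeros((n, L), dtype=int)
    for t in range(L):
        active = sorted((i for i in range(n) if remaining[i] > 0),
                        key=lambda i: -w[i])[:V]
        if not active:
            break
        for i in active:
            C[i, t] = 1
            remaining[i] -= 1
    return C
```

A schedule produced this way (or a randomly perturbed copy of it, as in PSHNN) can then be flattened into a ±1 vector and used as the initial state of the HNN recursion.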
Perturbed Smart HNN (PSHNN)
• Consider random perturbations of the LWPF result as starting points for the HNN, in order to avoid getting stuck in local minima

Thesis II.3: Further Improving the HNN (cont'd)
Perturbed Largest Weighted Process First (PLWPF)
• Simple, but surprisingly well performing heuristic
• The idea is to avoid getting stuck in a local minimum by trying starting points near the LWPF solution
[Figure: Lyapunov function (TWT) over the state of the HNN (the C matrix), showing how the initial state of the recursion determines whether a local or the global minimum is reached]

Thesis II.3: Further Improving the HNN (cont'd)
• For small job sizes, we compare performance to the theoretical best: exhaustive search over 100 randomly generated problems per job size
• PSHNN consistently outperforms the other methods, but there is room for improvement
• For larger job sizes, PSHNN outperforms the other methods by an increasing margin as the job size grows

Thesis Group II: Practical Application
• Scheduling of Monte-Carlo simulation based risk calculations, run overnight at Morgan Stanley for trading and regulatory reporting
• 100 portfolios, 556 jobs, 792 seconds average job size
• 7% improvement over HNN, 10% over LWPF (the best method in the literature prior to my study)
• 53 minutes saved on the top 3 priority jobs compared to the next best heuristic

TWT by weight class:
Method | w=3  | w=4   | w=5  | w=6  | w=7  | w=8 | w=9   | w=10 | SUM   | Increment to PSHNN
PSHNN  | 4401 | 11116 | 4020 | 1620 | 1092 | 8   | 0     | 0    | 22257 | 0%
PLWPF  | 3513 | 9624  | 5130 | 1788 | 490  | 312 | 2304  | 190  | 23351 | 5%
HNN    | 4404 | 11040 | 4735 | 1824 | 1092 | 456 | 468   | 0    | 24019 | 7%
LWPF   | 4404 | 11140 | 5470 | 2472 | 1183 | 40  | 0     | 0    | 24709 | 10%
EDD    | 4401 | 9940  | 1770 | 636  | 1134 | 464 | 22752 | 1430 | 42527 | 48%

Thesis Group II:
Optimal Scheduling: Summary

Thesis      | Contribution                                                                                      | Published in
Thesis II.1 | I converted the TWT problem to quadratic form, including the constraints with heuristic constants | Acta Univ. Sapientiae 2012
Thesis II.2 | I applied the Hopfield Neural Network (HNN) and found approximate solutions in polynomial time; I showed that the HNN solution outperforms other simple heuristics on a large set of random problems | Acta Univ. Sapientiae 2012
Thesis II.3 | I improved the HNN by intelligent selection of the starting point and random perturbations        | Periodica Polytechnica 2013

Numerical Results on Real-World Problems

Field                  | Real-world problem                       | Avg. performance of traditional approaches | Avg. performance of the proposed new method | Improvement
Portfolio optimization | Convergence trading on US S&P 500 stocks | 11.6% (S&P 500 index return)               | 34%                                         | 22.4%
Schedule optimization  | Morgan Stanley overnight scheduling      | 24709 (LWPF performance)                   | 22257 (PSHNN performance)                   | 10%

Summary of my Contribution
• Managed to find a generic approach to approximating NP-hard problems in polynomial time using heuristic methods
• Proved the practical effectiveness and applicability on real-world problems, for 2 very difficult open problems
• This can speed up financial calculations and their scheduling
• Provides faster, more timely data to banks, clients and financial regulators; improves society as a whole

Thank You For Your Attention!

Questions and Answers
Q: Regarding the description of the HNN, the state transition rule is asynchronous, i.e. only one of the state variables (elements of vector y) is updated at a time. What was the reason for using only asynchronous updates instead of also testing synchronous ones, which would be more suitable for massively parallel implementations?
A:
• Synchronous updating implies updating all nodes at exactly the same time, which requires a "global clock tick" (unrealistic for biological/physical applications;
see R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996)
• On CPUs, only a "quasi-synchronous" implementation is possible (see next slide)
• Due to the inherent sequential updating and the storage/copying overhead, this implementation is slower than asynchronous updating on CPUs
• For a hardware-level implementation, synchronous updating is indeed faster, but such hardware was not available, so I put this beyond the scope of my dissertation (see p. 57, paragraph 1)

Questions and Answers (cont'd)
The three update schemes compared:
(1) Cyclic asynchronous: at step k, only component y_l with l = k mod N is recomputed from the current state; all other components are carried over.
(2) Truly synchronous (hardware implementation): every component y_i(k+1) is computed simultaneously from y(k).
(3) "Quasi-synchronous" (CPU emulation): every component y_i(k+1) is computed from a stored copy of y(k), then the whole vector is overwritten.
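The practical difference between the update schemes can be demonstrated on a toy two-node network (an illustrative example of my own, not from the dissertation): with a synchronous-style update the antagonistic pair oscillates forever, while the asynchronous update settles into a fixed point, which is the convergence property the dissertation relies on.

```python
import numpy as np

def async_step(W, b, y, i):
    """Asynchronous update: node i reads the CURRENT state of all nodes."""
    y = y.copy()
    y[i] = 1.0 if W[i] @ y - b[i] >= 0 else -1.0
    return y

def quasi_sync_step(W, b, y):
    """Quasi-synchronous update: every node reads the OLD state y(k) and
    all results are written into a fresh copy y(k+1) (schemes (2)/(3))."""
    s = W @ y - b
    return np.where(s >= 0, 1.0, -1.0)

# Antagonistic pair: each node wants the opposite sign of the other.
W = np.array([[0.0, -1.0], [-1.0, 0.0]])
b = np.zeros(2)
```

With y(0) = (+1, +1), the synchronous update flips both nodes at once and cycles between (+1, +1) and (-1, -1); updating one node at a time breaks the symmetry and reaches the stable state (-1, +1).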