Part 1: Literature Review

This part of the study has two sections. The first section defines machine learning (ML) and provides an extensive explanation of ML applications in finance. The second section first addresses the research question of how ML can be used for stock selection. Then, after identifying the ML algorithms best suited to stock selection, it explains the deep learning algorithm in detail.

Section 1: Machine Learning

This section aims to define machine learning, investigate ML applications and explain the relationship between data and algorithm types.

Finance Applications of ML

In this section we define ML and then explain the theoretical background of applications categorized under supervised and unsupervised learning algorithms. We also provide information about reinforcement learning and deep learning.

What Is ML?

Like statistical methods, ML techniques analyse observations to reveal an underlying process, yet the two approaches diverge in their assumptions, terminology and techniques. Statistical approaches rely on foundational assumptions and explicit models of structure, and they can fail when these a priori assumptions are too restrictive. ML, by contrast, uses large amounts of data to extract information without imposing such restrictive assumptions. By exploring the underlying structure of data and learning from known examples, ML automates the decision-making process (DeRose & Le Lannou, 2020).

ML in finance has three categories: supervised, unsupervised and reinforcement learning. Unsupervised ML analyses the inputs (X's) without using target (Y) data; in other words, it does not require a labelled target or dependent variable (DeRose & Le Lannou, 2020). Principal component analysis (PCA), a dimension-reduction technique classified under unsupervised ML, is used for portfolio selection (Dixon & Halperin, 2019). Supervised learning, by contrast, is used with labelled data.
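To make the unsupervised case concrete, the following is a minimal Python sketch (assuming NumPy is available) of principal component analysis on a matrix of asset returns. Only the inputs are used; there is no target (Y). The returns are simulated from a hypothetical common factor plus noise, the function name `principal_components` is illustrative, and the sketch does not reproduce the portfolio-selection procedure of Dixon & Halperin (2019).

```python
import numpy as np


def principal_components(returns: np.ndarray):
    """PCA of a (T x N) matrix of asset returns (rows = dates, columns = assets).

    Returns eigenvalues in descending order, the matching eigenvectors
    (one column per component) and the share of variance each explains.
    """
    # Centre each asset's returns on its mean: PCA works on deviations.
    centred = returns - returns.mean(axis=0)
    # Sample covariance matrix of the N assets (columns are the variables).
    cov = np.cov(centred, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # re-sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, explained


rng = np.random.default_rng(0)
T, N = 500, 5
market = rng.normal(0.0, 0.01, size=T)         # hypothetical common factor
noise = rng.normal(0.0, 0.005, size=(T, N))    # hypothetical asset-specific noise
returns = market[:, None] + noise              # unlabelled simulated returns

eigvals, eigvecs, explained = principal_components(returns)
print("Explained variance ratio:", np.round(explained, 3))

# Scores: each day's (centred) returns projected onto the first component.
pc1_scores = (returns - returns.mean(axis=0)) @ eigvecs[:, 0]
```

Because the simulated assets share one common factor, the first component should explain most of the variance; in real equity data the first component is often interpreted as a market factor.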
In supervised learning, both the dependent variable, also called the target variable (Y), and the independent variables (X's), also called features, are used. Figure 1 depicts the process of supervised learning.

Figure 1: Overview of supervised learning (Source: DeRose & Le Lannou, 2020)

Under supervised learning, both the target (Y) and feature (X) data are split into two groups: training data and test data. After the model is trained on the training data set, the test data are fed into the ML model so that the model's predictions (Ypredict) can be compared with the actual values (Yactual) to evaluate model fit (DeRose & Le Lannou, 2020). Regression and classification are the main groups of supervised learning and are often used to build predictive models. Linear regression, decision trees and artificial neural networks (ANNs) are examples of regression models that predict continuous variables. Support vector machines (SVMs), logistic regression and k-nearest neighbours (KNN) are classification techniques used with discrete (categorical) targets (Emerson et al., 2019).

In the AI field, some prominent algorithms cannot be categorised as either supervised or unsupervised learning. Deep learning uses complex algorithms to tackle sophisticated problems such as image classification, face recognition, speech recognition and natural language processing. Whereas deep learning learns from training data, in reinforcement learning an agent interacts with its environment and adjusts its actions to maximize a reward (DeRose & Le Lannou, 2020). As a generalization of dynamic programming, reinforcement learning is complex but is the most impactful method for trading and investment management; its main application areas are derivative pricing, optimal hedging, Merton's portfolio problem and optimal trade execution (Dixon & Halperin, 2019). ANNs, also called neural networks (NNs), are the algorithms on which both deep learning and reinforcement learning are built. They are also regularly used for regression and classification.
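The train/test workflow of supervised learning can be sketched in Python with NumPy, using ordinary least squares as the supervised regression model. The data set, the 80/20 split ratio and the use of root-mean-square error (RMSE) as the fit measure are illustrative assumptions, not taken from DeRose & Le Lannou (2020).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical labelled data set: 200 observations, 3 features (X) and a target (Y).
n, k = 200, 3
X = rng.normal(size=(n, k))
true_beta = np.array([0.5, -0.2, 0.1])
y = 0.01 + X @ true_beta + rng.normal(scale=0.1, size=n)

# 1. Split both X and Y into training and test sets (80/20, after shuffling).
idx = rng.permutation(n)
split = int(0.8 * n)
train, test = idx[:split], idx[split:]

# 2. Train: fit ordinary least squares on the training rows only.
X_train = np.column_stack([np.ones(split), X[train]])   # intercept column + features
beta_hat, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

# 3. Test: predict Y for observations the model has never seen.
X_test = np.column_stack([np.ones(n - split), X[test]])
y_predict = X_test @ beta_hat
y_actual = y[test]

# 4. Evaluate model fit by comparing Ypredict with Yactual.
rmse = np.sqrt(np.mean((y_predict - y_actual) ** 2))
print(f"Estimated coefficients: {np.round(beta_hat, 3)}")
print(f"Test RMSE: {rmse:.4f}")
```

Fitting on the training rows only and scoring on the held-out rows is what allows the comparison of Ypredict with Yactual to expose overfitting; any of the regression models listed above could replace the least-squares step.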
ANNs which have …

Diversification is one of the first rules of investing, as long as the assets you invest in are not highly correlated with each other. The notion that a portfolio split between stocks and bonds lowers risk collides with the empirical evidence from the markets, and this holds even more true since the market collapse caused by the coronavirus pandemic. A financial portfolio should not be judged only by its performance but also by its balance between risk and return over the long term. That balance is by nature precarious and changeable, and must therefore be restored periodically through rebalancing.

Looking at the tables below, you can easily see that correlations between Morningstar equity and bond fund categories have generally increased over the last year. There are no longer any negative correlations, nor any close to zero (the lowest, 0.12, is between euro government-bond funds and Japan large-cap equities). USD diversified bond funds, and to some extent euro government-bond funds, are the only categories that have maintained low correlations with equity categories over the last 12 months, although these correlations are decidedly higher than those recorded over three and five years. These two categories have also seen their correlation with other fixed-income fund categories increase significantly: the correlation between euro government bonds and USD high-yield bonds, for example, rose from 0.30 over five years to 0.68 over the last year.

Calculating your portfolio's correlation coefficient is a rather complex exercise. To get a general idea, we calculated the correlation coefficients of the 15 main Morningstar categories over one, three and five years to September 30. The correlation coefficient measures the degree to which the returns of two instruments move together; it ranges from -1 to +1.
A coefficient of 0 indicates that there is no (linear) correlation between the two funds. A coefficient of +1 indicates perfect positive correlation, meaning that the two instruments move together in an exact linear relationship: when one rises, the other rises too (though not necessarily by the same percentage), and vice versa. In the case of perfect negative correlation (-1), the relationship is inverse: when the first rises, the second falls in exact linear proportion.
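The correlation coefficient and its effect on diversification can be illustrated with a short, standard-library-only Python sketch. It computes the Pearson correlation of two return series and the volatility of a two-asset portfolio from the standard formula σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2. The fund return series and the 15% volatilities are hypothetical; the correlation values 0.68, 0.30 and 0.12 are borrowed from the text purely to show how portfolio volatility changes with ρ.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long return series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def two_asset_volatility(w1: float, sigma1: float, sigma2: float, rho: float) -> float:
    """Volatility of a portfolio holding weight w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical monthly returns of three funds.
fund_a = [0.02, -0.01, 0.03, 0.01, -0.02]
fund_b = [0.04, -0.02, 0.06, 0.02, -0.04]   # always twice fund_a's return
fund_c = [-0.02, 0.01, -0.03, -0.01, 0.02]  # always the mirror image of fund_a

print(f"rho(a, b) = {pearson_correlation(fund_a, fund_b):+.2f}")
print(f"rho(a, c) = {pearson_correlation(fund_a, fund_c):+.2f}")

# 50/50 portfolio of two assets, each with a hypothetical 15% annual volatility.
for rho in (1.0, 0.68, 0.30, 0.12, 0.0, -1.0):
    vol = two_asset_volatility(0.5, 0.15, 0.15, rho)
    print(f"rho = {rho:+.2f} -> portfolio volatility = {vol:.2%}")
```

Note that fund_b moves twice as much as fund_a in every period, yet their correlation is still +1: correlation captures how consistently two series move together, not the size of their moves. Portfolio volatility falls as ρ falls, which is why rising correlations between equity and bond categories weaken the diversification benefit of combining them.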