PLAYING CARTPOLE GAME TO TRADING STOCKS
19I603 ARTIFICIAL INTELLIGENCE
NITHYAPRIYAA V (19I236)
PAVITHRA SHRI S (19I240)
RUBASHREE R (19I250)
SANDIYAA B (19I252)
VIVITHA L E (19I261)
BACHELOR OF TECHNOLOGY
Branch: INFORMATION TECHNOLOGY
Of Anna University
MAY 2022
DEPARTMENT OF INFORMATION TECHNOLOGY
PSG COLLEGE OF TECHNOLOGY (Autonomous Institution)
COIMBATORE – 641 004

LIST OF FIGURES
Fig 1 Flow Chart of Proposed Work
Fig 2 Reinforcement Learning Environment
Fig 3 Training Output at the End of the First Episode
Fig 4 Sample Output
Fig 5 Profit Over Training

CONTENTS
Abstract
Introduction
Contribution Made
Literature Survey
Objective
Proposed Methodology
Block Diagram
Experimental Results with Tabulations and Visualizations
Inference
Conclusion and Future Work
References

Abstract: Stock trading is the buying and selling of shares in a specific company; if you own a company's stock, you own a piece of that company. While trading individual stocks can bring quick profits to those who time the market correctly, it also carries the risk of large losses: a single company's fortunes can rise faster than the market as a whole, but they can also fall just as quickly. An active trader is one who makes 10 or more trades per month. Active traders typically employ a strategy that relies heavily on market timing, attempting to profit from short-term events (at the company level or driven by market fluctuations) in the coming weeks or months. Day trading is a strategy in which investors buy, sell and close positions in the same stock on the same trading day, with little regard for the underlying businesses. (A position is how much of a particular stock or fund an investor owns.) The goal of a day trader is to make a few dollars within the next few minutes or hours based on intraday price fluctuations.
Keywords: Reinforcement learning, DQN algorithm, stock market prediction.
Introduction: The stock market is an aggregation of the buyers and sellers of stock. A stock (more commonly called shares) represents an ownership claim on a business held by an individual or a group of people. Stock market prediction is the attempt to determine the future value of a stock or of the market as a whole. Machine learning (ML) and deep learning (DL) techniques have been popular among researchers and organizations for complex tasks such as automating stock market prediction and investment. However, because of the dynamic nature of the market, these algorithms have not been able to predict with high accuracy in the real world. Researchers therefore started applying reinforcement learning (RL) techniques to stock market prediction. Traditional RL algorithms explore an unknown environment and learn to make optimal decisions by trial and error; through this self-learning, RL agents have reached human-level performance on some tasks. An RL agent always tries to maximize its future reward by applying actions to the given environment, and RL algorithms applied to stock market prediction problems have shown promising results. The learning task is nevertheless challenging: because the market is volatile, learning only from historical data is not enough, and the model ideally needs to keep learning. In this report, we implement a deep Q-learning (DQN) agent, develop it on the CartPole game, and then apply the same approach to a financial trading environment built on historical EUR/USD exchange-rate data. Reinforcement learning is a branch of machine learning that studies how software agents should act in a given environment in order to maximize a cumulative reward.
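The cumulative reward that an RL agent maximizes is usually defined as a discounted sum of future rewards, in which a discount factor γ between 0 and 1 weights delayed rewards less than immediate ones. The short Python sketch below computes this return for a list of rewards; the function name `discounted_return` and the default γ = 0.95 are illustrative assumptions, not values taken from this report.

```python
def discounted_return(rewards, gamma=0.95):
    """Return the cumulative discounted reward G = sum_k gamma**k * r_k.

    `rewards` lists the rewards from the first step onward; the discount
    factor `gamma` is an assumed value in [0, 1].
    """
    total = 0.0
    # Work backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# In CartPole the agent receives a reward of 1 for every step the pole stays up.
print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```

With γ = 1 the function reduces to the plain sum of rewards, which is the total reward per episode that is tracked for CartPole in the methodology.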
Along with supervised and unsupervised learning, reinforcement learning is one of the three basic machine learning paradigms. The environment defines the problem at hand; it could be a computer game or a financial market in which to trade. A state is a vector that contains all the parameters that describe the environment at a given point in time. In a computer game, this might be the entire screen with all of its pixels; in a financial market, it could comprise current and historical price levels, financial indicators such as moving averages, macroeconomic information, and so on. The agent comprises all the parts of the RL algorithm that interact with the environment and learn from it. In a gaming context, the agent could represent a player; in a financial context, it could represent a trader (trading bot) betting on rising or falling markets. At each step, the agent chooses a single action from a (limited) set of options: in a computer game the permissible actions may be moving left or right, whereas in a financial market they may be going long or short. Depending on the action taken, the agent receives a reward (or penalty). In computer games, points are a common reward; in the financial world, profit (or loss) is the basic reward.

Contribution Made:
Nithyapriyaa V: Creating the agent
Pavithra Shri S: Evaluation of the model
Rubashree R: Training the agent and report
Sandiyaa B: Dataset collection and report
Vivitha L E: Implementation and report

Literature Survey: Each entry gives the title, the authors and publication year, the proposed system, and the algorithm used.

1. Stock Market Prediction and Investment using Deep Reinforcement Learning - a Continuous Training Pipeline
Amritha Sharma R, Debjyoti Guha, Hitesh Agarwal, Kothiya Meetkumar Harshadbhai [2020]
Proposed system: This paper proposes an agent-based Deep Deterministic Policy Gradient (DDPG) system that emulates professional trading methods, described by the authors as a state-of-the-art framework that can predict prices and make high-return investments of customers' money. The architecture is built as a continuous training pipeline, so that the saved model stays up to date with current market patterns, which improves prediction accuracy.
Algorithm used: Deep Reinforcement Learning, Artificial Neural Network

2. Stock Price Prediction using Reinforcement Learning and Feature Extraction
R. Sathya, Prateek Kulkarni, Momin Nawaf Khalil, Shishir Chandra Nigam [2020]
Proposed system: This project aims to create a new method for predicting stock value using reinforcement learning and sentiment analysis of social media. The authors analyse a method for predicting stock movement from both historical and current data.
Algorithm used: Reinforcement learning, Sentiment analysis

3. Stock Price Prediction Using Reinforcement Learning
Jae Won Lee [2001]
Proposed system: This paper presents a strategy for predicting stock prices with reinforcement learning, which is suited to modelling and learning many kinds of interactions in real-world scenarios. Stock price prediction is modelled as a Markov process that can be optimized with a reinforcement learning approach.
Algorithm used: Reinforcement learning algorithm

4. Stock Trading Strategies Based on Deep Reinforcement Learning
Yawei Li, Peipei Liu, Ze Wang [2022]
Proposed system: This paper presents a deep reinforcement learning model for stock trading that analyses the stock market using stock data, technical indicators and candlestick charts, and learns dynamic trading strategies. A deep neural network extracts features from the different data sources, which serve as the state of the stock market, and the reinforcement learning agent makes its trading decisions based on these features.
Algorithm used: Deep neural network

5. A Deep Reinforcement Learning Approach to Stock Trading
Gran, Petter Kowalik; Holm, August Jacob Kjellevold; Søgård, Stian Gropen [2019]
Proposed system: This paper investigates the feasibility of applying state-of-the-art deep reinforcement learning, specifically the Deep Deterministic Policy Gradient (DDPG), to stock trading. DDPG agents that use historical log return (R) and trading volume (TV) as predictors perform best. In terms of mean return, the models exceed a buy-and-hold benchmark across all markets, and the DDPG agent consistently outperforms linear regression.
Algorithm used: Deep Reinforcement Learning, Deep Deterministic Policy Gradient

6. Stock Market Prediction Using an Improved Training Algorithm of Neural Network
Mustain Billah, Sajjad Waheed, Abu Hanifa [2016]
Proposed system: This paper proposes an enhanced Levenberg-Marquardt (LM) training technique for artificial neural networks. Using historical data from the Dhaka Stock Exchange, such as opening price, highest price, lowest price and total shares traded, the improved LM neural network predicts the likely day-end closing price with less memory and time.
Algorithm used: Neural network with an improved training algorithm

7. Stock Market Prediction Using Machine Learning Algorithms
K. Hiba Sadia, Aditya Sharma, Adarrsh Paul, Sarmistha Padhi, Saurav Sanyal [2019]
Proposed system: This paper focuses on preprocessing the dataset and uses machine learning methods such as Random Forest and Support Vector Machines to estimate stock values. The authors propose a "Stock market price prediction" system that uses the Random Forest algorithm to predict the stock market price.
Algorithm used: Random Forest, Support Vector Machines

8. Stock Market Prediction: Using Historical Data Analysis
Vivek Kanade, Bhausaheb Devikar, Sayali Phadatare, Pranali Munde, Shubhangi Sonone [2017]
Proposed system: This study considers both fundamental and technical analysis. Fundamental analysis is performed by applying sentiment analysis to social media data, which today has a greater impact than ever before and can be useful for predicting stock market trends. Technical analysis is performed by applying machine learning algorithms to historical stock price data. The association between these attitudes and stock prices is then examined.
Algorithm used: Machine learning algorithm

9. Reinforcement Learning in Financial Markets
Meng, T.L.; Khushi, M. [2019]
Proposed system: The authors rigorously review recent stock and forex prediction or trading publications that use reinforcement learning as their principal machine learning method. They find that transaction costs have a considerable impact on the profitability of the reinforcement learning algorithms studied.
Algorithm used: Reinforcement learning (review)

10. A Survey on Stock Market Prediction Using SVM
Sachin Sampat Patil, Prof. Kailash Patidar, Asst. Prof. Megha Jain [2016]
Proposed system: The authors provide a theoretical framework for predicting the stock market using Support Vector Machines. For the multivariate analysis, four company-specific and six macroeconomic factors that may influence stock movement are first chosen; a Support Vector Machine is then used to examine the relationship between these variables and to forecast stock performance. The findings suggest that SVM is a useful technique for predicting stock prices in the financial market.
Algorithm used: Support Vector Machines

11. Machine Learning Approach In Stock Market Prediction
Raut Sushrut Deepak, Shinde Isha Uday, Dr. D. Malathi [2017]
Proposed system: This paper presents a machine learning approach that is trained on publicly available stock data, gains intelligence from it, and then uses that intelligence to make accurate predictions. After a thorough examination of numerous algorithms and their suitability for various problem areas, an Artificial Neural Network (ANN) was found to be the most practical choice. The approach was tested on the Bombay Stock Exchange (BSE) index data set.
Algorithm used: Machine learning algorithm, Artificial Neural Network

12. Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests
Loke K.S. [2017]
Proposed system: A stock movement prediction approach is presented using quarterly financial ratio data from Hong Kong corporations from 2011 to 2014. Over numerous quarters, the accuracy of the Random Forest price-movement forecasts was fairly low; however, the fourth quarter of 2014 was predicted with high accuracy, which was not achieved in the earlier years.
Algorithm used: Random Forest

Objective: The objective is to develop a trading bot that predicts whether the market will go up or down, by treating trading as a game in the same way as the CartPole game.

Proposed Methodology: OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms; it lets you train agents in a range of simulated environments (Atari games, board games, 2D and 3D physical simulations, and so on). We first import the Python libraries needed to model the neural network layers, together with NumPy and time for basic operations, to create the reinforcement learning model. We then import gym; the observation space of the CartPole environment is defined by low and high values (lower and upper bounds) for each state variable.
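To make the observation space and the randomized reset concrete, the following pure-Python sketch mimics them without importing gym. The bounds are modelled on Gym's CartPole environment as we understand it (twice the failure thresholds of 2.4 for the cart position and 12 degrees for the pole angle, with the velocities effectively unbounded), and drawing each initial value uniformly from [-0.05, 0.05] is likewise an assumption about Gym's reset; `NAMES`, `LOW`, `HIGH`, `contains` and `reset` are hypothetical names.

```python
import math
import random

# Observation bounds modelled on Gym's CartPole (assumption): twice the failure
# thresholds for cart position (2.4) and pole angle (12 degrees), with the two
# velocities effectively unbounded.
NAMES = ["cart position", "cart velocity", "pole angle", "pole angular velocity"]
HIGH = [2 * 2.4, math.inf, 2 * math.radians(12), math.inf]
LOW = [-h for h in HIGH]


def contains(state):
    """Return True if every state variable lies within its [low, high] bound."""
    return all(lo <= x <= hi for x, lo, hi in zip(state, LOW, HIGH))


def reset(rng=random):
    """Return a randomized initial state (assumed uniform in [-0.05, 0.05])."""
    return [rng.uniform(-0.05, 0.05) for _ in NAMES]


# Each reset yields a different initial state.
for seed in (1, 2):
    state = reset(random.Random(seed))
    print(", ".join(f"{n}: {v:+.4f}" for n, v in zip(NAMES, state)), contains(state))
```

Passing a seeded `random.Random` instance makes a reset reproducible, while calling `reset()` without arguments gives a new random starting state each time, as the environment does.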
When we reset the environment, we get its initial state, which consists of the cart position, cart velocity, pole angle and pole angular velocity. Each reset gives different values, because the initial state is randomized. There are only two actions, 0 and 1 (push the cart to the left or to the right). Epsilon (ε) sets the balance between exploration and exploitation: with ε = 1 the agent only explores, that is, it always takes a random action, so ε is initially set to 1. The learning rate used during replay is left at its default value of 0.001. The memory stores the agent's experiences, and the maximum total reward is also stored to track progress. A neural network approximates the optimal action-value function (Q-function), from which the optimal policy is derived; it has two dense layers with 24 hidden units, and together these parts constitute the deep Q-learning (DQL) agent. Whenever a random number drawn between 0 and 1 is below ε, the agent takes a random action; otherwise it relies on the DQL network. The network is trained repeatedly: the replay step samples a batch of 32 experiences from the memory. In the Q-learning update, the target adds the immediate reward to the discounted delayed (future) reward; this relies on the Bellman equation, which leads to optimality. After each replay, ε decays, so the agent gradually shifts from exploration to exploitation. The main method, learn, iterates over a number of episodes. At the start of every episode it resets the environment and reshapes the state; at each step it takes an action, moves the environment one step forward, obtains the next state and appends the experience to the memory. When an episode is done, it calculates the total reward and tracks the average total reward as a performance measure; the target is an average score above 195, the threshold commonly used to consider CartPole solved. The test function is used only to measure the performance of the trained agent. For trading, we use helper classes that mimic the behaviour of OpenAI Gym by providing an observation space and an action space, together with a Finance class that relies on a particular dataset, the Eikon end-of-day data (aiif_eikon_eod_data).
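The core of the DQL agent described above (ε-greedy action selection, a replay memory, Bellman targets for a sampled batch of 32, and ε decay) can be sketched in plain Python as follows. This is a minimal sketch, not the report's code: the neural network is replaced by a hypothetical function `q_values(state)` that returns one Q-value per action, and the discount factor (0.95), minimum ε (0.01), decay rate (0.995) and memory size (2000) are assumed values. In a full implementation, each (state, target) pair would be used to fit the network with its 24-unit dense layers and learning rate of 0.001.

```python
import random
from collections import deque

# Assumed hyperparameters: only epsilon = 1.0 and a batch of 32 come from the text.
GAMMA = 0.95           # discount factor applied to the delayed (future) reward
EPSILON_MIN = 0.01     # exploration never decays below this value
EPSILON_DECAY = 0.995  # multiplicative decay applied after each replay
BATCH_SIZE = 32


class DQLCore:
    """Epsilon-greedy acting, replay memory and Bellman targets for a DQL agent.

    `q_values(state)` is a hypothetical stand-in for the neural network and
    must return one estimated Q-value per action.
    """

    def __init__(self, q_values, n_actions=2, seed=None):
        self.q_values = q_values
        self.n_actions = n_actions
        self.epsilon = 1.0                # start with pure exploration
        self.memory = deque(maxlen=2000)  # oldest experiences are dropped first
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore: with probability epsilon, take a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        # Exploit: take the action with the highest estimated Q-value.
        q = self.q_values(state)
        return max(range(self.n_actions), key=lambda a: q[a])

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def replay_targets(self):
        """Sample a batch and return (state, target Q-values) training pairs."""
        if len(self.memory) < BATCH_SIZE:
            return []  # not enough experience collected yet
        batch = self.rng.sample(list(self.memory), BATCH_SIZE)
        pairs = []
        for state, action, reward, next_state, done in batch:
            # Bellman target: immediate reward plus discounted best future value.
            target_value = reward
            if not done:
                target_value += GAMMA * max(self.q_values(next_state))
            target = list(self.q_values(state))
            target[action] = target_value  # only the action taken gets a new target
            pairs.append((state, target))
        # Gradually shift from exploration to exploitation.
        if self.epsilon > EPSILON_MIN:
            self.epsilon *= EPSILON_DECAY
        return pairs


# Usage with a dummy Q-function that always prefers action 1.
agent = DQLCore(lambda s: [0.0, 1.0], seed=0)
for t in range(40):
    agent.remember((t,), t % 2, 1.0, (t + 1,), t == 39)
pairs = agent.replay_targets()
print(len(pairs), agent.epsilon)  # 32 0.995
```

Only the Q-value of the action actually taken is replaced by the Bellman target, so fitting the network on these pairs changes its estimate for that action while leaving its other outputs as they were.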
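The learn method can be sketched as a function that plays episodes against any environment with gym's classic `reset()`/`step()` interface, in which `step` returns the next state, reward, done flag and info (as in older gym releases; newer releases return five values). The 25-episode averaging window and the `FixedLengthEnv` stand-in, which simply ends every episode after a fixed number of steps, are hypothetical; storing experiences and calling replay are marked by comments rather than implemented.

```python
class FixedLengthEnv:
    """Hypothetical stand-in for CartPole: every episode lasts `length` steps."""

    def __init__(self, length=200):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        # Classic gym API: (next_state, reward, done, info); reward 1 per step.
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


def learn(env, act, episodes, window=25, target=195, max_steps=5000):
    """Play episodes, tracking the total reward and its moving average.

    Stops early once at least `window` episodes have been played and their
    average total reward exceeds `target`. Returns one total per episode.
    """
    trewards = []
    for episode in range(1, episodes + 1):
        state = env.reset()
        treward = 0.0
        for _ in range(max_steps):
            action = act(state)
            next_state, reward, done, _info = env.step(action)
            # A full agent would append (state, action, reward, next_state, done)
            # to its replay memory here.
            treward += reward
            state = next_state
            if done:
                break
        trewards.append(treward)
        recent = trewards[-window:]
        average = sum(recent) / len(recent)
        print(f"episode {episode:4d}/{episodes} | total reward {treward:6.1f} | average {average:6.1f}")
        if len(trewards) >= window and average > target:
            break
        # A full agent would call its replay step here once memory holds a batch.
    return trewards


history = learn(FixedLengthEnv(), act=lambda s: 0, episodes=100)
print(len(history))  # 25
```

Averaging over a window of recent episodes, rather than judging a single episode, keeps one lucky run from ending training too early.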
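The helper classes and the Finance environment can be sketched in the same gym-like style. In this hypothetical version, the state is the last four log returns of an in-memory price list (instead of the Eikon dataset), the reward is 1 when the predicted direction matches the market direction and 0 otherwise, and the accuracy is the total reward divided by the number of trades, following the Inference section. The termination rule (end of data, or accuracy below 50% after more than 10 trades) is an assumption based on that section, and `ObservationSpace`, `ActionSpace`, `Finance` and `run_episode` are illustrative names.

```python
import math
import random


class ObservationSpace:
    """Mimics gym's observation_space; only the `shape` attribute is provided."""

    def __init__(self, n):
        self.shape = (n,)


class ActionSpace:
    """Mimics gym's action_space with `n` discrete actions (0 = down, 1 = up)."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class Finance:
    """Hypothetical gym-like trading environment over an in-memory price list.

    State: the last `n_features` log returns.
    Action: the predicted direction of the next return (1 = up, 0 = down).
    Reward: 1 if the prediction matches the market direction, otherwise 0.
    """

    def __init__(self, prices, n_features=4, min_accuracy=0.5):
        self.returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
        self.observation_space = ObservationSpace(n_features)
        self.action_space = ActionSpace(2)
        self.osn = n_features
        self.min_accuracy = min_accuracy

    def _get_state(self):
        return self.returns[self.bar - self.osn:self.bar]

    def reset(self):
        self.treward = 0
        self.accuracy = 0.0
        self.bar = self.osn  # first bar preceded by a full window of returns
        return self._get_state()

    def step(self, action):
        direction = 1 if self.returns[self.bar] > 0 else 0
        reward = 1 if action == direction else 0
        self.treward += reward
        self.bar += 1
        trades = self.bar - self.osn
        self.accuracy = self.treward / trades
        # Assumed termination rule: stop at the end of the data, or once the
        # accuracy is below the minimum after more than 10 trades.
        done = self.bar >= len(self.returns) or (
            trades > 10 and self.accuracy < self.min_accuracy)
        return self._get_state(), reward, done, {}


def run_episode(env, policy):
    """Play one episode with `policy(state) -> action`; return the rewards."""
    state, done, rewards = env.reset(), False, []
    while not done:
        state, reward, done, _ = env.step(policy(state))
        rewards.append(reward)
    return rewards


# Usage: a tiny made-up price series and a policy that always predicts "up".
env = Finance([1.10, 1.11, 1.12, 1.11, 1.13, 1.14, 1.12, 1.15])
rewards = run_episode(env, lambda state: 1)
print(rewards, round(env.accuracy, 3))  # [1, 0, 1] 0.667
```

Because the state is a window of past returns, the agent only sees information available before the bar it predicts, which avoids look-ahead bias.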
The dataset contains the euro/US dollar (EUR/USD) exchange rate.

Block Diagram:
Fig 1 Flowchart of Proposed Work
Fig 2 Reinforcement Learning Environment

Experimental Results with Tabulations and Visualizations:
Fig 3 CartPole
Fig 4 CartPole visualization
Fig 5 Finance environment

Inference: The agent is required to reach a minimum accuracy of 50%; for a binary up/down prediction this is the level of random guessing, so an agent that does not reach it is not considered intelligent. The data returned by the environment is the price, which is used as the feature set, and from it the agent learns whether the market goes up or down. The environment class provides the state: on reset, the total reward and the accuracy are set to 0, and the step method returns the next state. The reward is 1 when the agent's action matches the market direction (for example, the agent predicts up and the market goes up) and 0 otherwise, and the accuracy is updated after every step. Whether an episode continues depends on this reward and on the running accuracy: after the first 10 trades, the trading bot is expected to maintain a minimum accuracy of 50%. After 20 trades, its accuracy falls to 45%.

Conclusion and Future Work: We have successfully implemented the trading bot for the EUR/USD exchange rate. Future work includes training the trading bot on a larger dataset with the help of a GPU.

References
[1] Amritha Sharma R, Debjyoti Guha, Hitesh Agarwal, Kothiya Meetkumar Harshadbhai, "Stock Market Prediction and Investment using Deep Reinforcement Learning - a Continuous Training Pipeline", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958 (Online), Volume-10, Issue-2, December 2020.
[2] R. Sathya, Prateek Kulkarni, Momin Nawaf Khalil, Shishir Chandra Nigam, "Stock Price Prediction using Reinforcement Learning and Feature Extraction", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878 (Online), Volume-8, Issue-6, March 2020.
[3] Jae Won Lee, "Stock Price Prediction Using Reinforcement Learning", ISIE 2001, Pusan, Korea, 2001.
[4] Yawei Li, Peipei Liu, Ze Wang, "Stock Trading Strategies Based on Deep Reinforcement Learning", Scientific Programming, vol. 2022, Article ID 4698656, 15 pages, 2022.
[5] Petter Kowalik Gran, August Jacob Kjellevold Holm, Stian Gropen Søgård, "A Deep Reinforcement Learning Approach to Stock Trading", Norwegian University of Science and Technology, 2019.
[6] Mustain Billah, Sajjad Waheed, Abu Hanifa, "Stock Market Prediction Using an Improved Training Algorithm of Neural Network", 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), 8-10 December 2016.
[7] K. Hiba Sadia, Aditya Sharma, Adarrsh Paul, Sarmistha Padhi, Saurav Sanyal, "Stock Market Prediction Using Machine Learning Algorithms", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume-8, Issue-4, April 2019.
[8] Vivek Kanade, Bhausaheb Devikar, Sayali Phadatare, Pranali Munde, Shubhangi Sonone, "Stock Market Prediction: Using Historical Data Analysis", IJARCSSE, 2017.
[9] T. L. Meng, M. Khushi, "Reinforcement Learning in Financial Markets", Data, vol. 4, 110, 2019.
[10] Sachin Sampat Patil, Prof. Kailash Patidar, Asst. Prof. Megha Jain, "A Survey on Stock Market Prediction Using SVM", International Journal of Current Trends in Engineering & Technology, Volume 02, Issue 01, Jan-Feb 2016.
[11] Raut Sushrut Deepak, Shinde Isha Uday, Dr. D. Malathi, "Machine Learning Approach In Stock Market Prediction", IJPAM, 2017.
[12] Loke K.S., "Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests", International Conference on Computer and Drone Applications (IConDA), 2017.

Dataset and Colab Link: