A NOVEL TIME SERIES FORECASTING METHOD USING FUZZY INFORMATION RETRIEVAL SYSTEM

A project report submitted in partial fulfillment of the requirements for B.Tech. Project

B.Tech.

by

Anuj Bhatt (2016IPG-017)
Prakhar Sharma (2016IPG-071)
Pranjal Srivastava (2016IPG-072)

ATAL BIHARI VAJPAYEE INDIAN INSTITUTE OF INFORMATION TECHNOLOGY AND MANAGEMENT-474 010

2019

CANDIDATES' DECLARATION

We hereby certify that the work presented in this report, entitled A Novel Time Series Forecasting Method Using Fuzzy Information Retrieval System, in partial fulfillment of the requirement for the award of the Degree of Bachelor of Technology and submitted to the institution, is an authentic record of our own work carried out during the period May 2019 to September 2019 under the supervision of Dr. W. Wilfred Godfrey and Dr. Jeevaraj S. We have also cited the sources of any text(s), figure(s) and table(s) taken from other works.

Date: Signatures of the Candidates

This is to certify that the above statement made by the candidates is correct to the best of my knowledge.

Date: Signatures of the Research Supervisors

Abstract

A time series is an efficient way to study existing trends and to make future decisions based on the results of the analysis. The stock market is one such platform where time series analysis is of utmost use and importance for predicting future market trends. Even though stock markets exhibit a high level of entropy and randomness, they are still partly time-driven and partly driven by the sentiment of market players. This makes modelling the stock market a difficult task, as it involves raw, random data as well as hidden market sentiment. To solve this problem, we convert the raw stock market time series of Open, High, Low and Close values into a fuzzy linguistic time series. Information Retrieval systems are used to find the most relevant documents for a query.
Fuzzy Information Retrieval Systems apply the same idea and find the most relevant document based on the tf-idf scores of the terms in the documents. The novelty of the approach followed here is that we add to our list of parameters the different kinds of candlestick patterns used to identify trend reversals in a market (for example, hanging man and kicking bullish) and the relative strength index (RSI) values, which signify the momentum of the market. Adding these factors to the fuzzy representation of trends in our documents makes the future trend prediction more accurate.

Keywords: RSI, tf-idf, candlestick, document, fuzzy inference system, hanging man, kicking bullish, fuzzy linguistic time series

ACKNOWLEDGEMENTS

We are highly indebted to Dr. W. Wilfred Godfrey and Dr. Jeevaraj S., and are obliged to them for giving us the autonomy to work and experiment with ideas. We would like to take this opportunity to express our profound gratitude to them, not only for their academic guidance but also for their personal interest in our project and their constant support, coupled with confidence-boosting and motivating sessions which proved very fruitful and were instrumental in instilling self-assurance and trust in us. The nurturing and blossoming of the present work is mainly due to their valuable guidance, suggestions, astute judgment, constructive criticism and eye for perfection. Our mentors always answered our myriad doubts with smiling graciousness and prodigious patience, never letting us feel like novices, always lending an ear to our views, appreciating and improving them, and giving us a free hand in our project. It is only because of their overwhelming interest and helpful attitude that the present work has attained its current stage. Finally, we are grateful to our Institution and colleagues, whose constant encouragement served to renew our spirit, refocus our attention and energy, and helped us carry out this work.
(Anuj Bhatt) (Prakhar Sharma) (Pranjal Srivastava)

Contents

List of Tables
List of Figures
1 Introduction and Literature Review
1.1 Introduction
1.1.1 Japanese Candlestick Theory
1.1.2 Candlestick Types (Trend Reversal Patterns)
1.1.3 Relative Strength Index
1.1.4 Fuzzy Logic Theory
1.1.5 Fuzzy Information Retrieval System
1.1.6 Literature Review
1.1.7 Objectives
2 Design Details and Implementation
2.1 Proposed Methodology
2.2 Data Collection
2.3 Rearrangement and Reformation of Data
2.3.1 RSI Calculation
2.3.2 RSI - Divergence
2.3.3 RSI - Swing Rejection
2.4 Cluster Classification and Function Definition
2.4.1 Fuzzification of candlestick properties
2.4.2 Previous Trend
2.4.3 Fuzzy Rules for Candlestick Classification
2.5 Document Formulation for Data
2.6 Document Formulation for Query
2.7 Document Matching and TF-IDF Score Calculation
3 Results and Discussion
3.1 Future Trend
3.2 Future Trend Prediction (Final Output)
4 Conclusion
Bibliography

List of Tables

3.1 BSE Sensex Data without RSI values
3.2 An example cluster of 5 days
3.3 BSE Sensex Data with RSI values
3.4 An example candlestick cluster of 5 days
3.5 An example fuzzy candlestick cluster of 5 days

List of Figures

1.1 A White Japanese Candlestick and a Black Japanese Candlestick
1.2 Marubozu Candlestick
1.3 Doji Candlestick
1.4 Umbrella Candlestick
1.5 Kicking Bearish Candlestick
1.6 Engulfing Bearish Candlestick
1.7 Bearish Harami Candlestick
1.8 Bearish Meeting Line Candlestick
1.9 Bearish Hanging Man Candlestick
1.10 Bearish One Black Crow Candlestick
1.11 Bearish Descending Hawk Candlestick
1.12 Bullish Kicking Candlestick
1.13 Bullish Engulfing Candlestick
1.14 Bullish Harami Candlestick
1.15 Bullish Meeting Line
1.16 Bullish Hammer Candlestick
1.17 Bullish Piercing Line Candlestick
1.18 Bullish Homing Pigeon Candlestick
1.19 Bullish One White Soldier Candlestick
2.1 Bullish Divergence
2.2 Bullish Swing Rejection
2.3 Membership function for US(k) & LS(k)
2.4 Membership function for BL(k)
2.5 Membership function for gap(k)
2.6 Membership function for trend(k)
2.7 Membership function for difclose(k), difopen(k) & difcentral(k)
3.1 Candlesticks for 2-July-2019 to 8-July-2019
3.2 RSI graph for 2-July-2019 to 8-July-2019

Chapter 1 Introduction and Literature Review

This chapter covers time series and fuzzy information retrieval systems.

1.1 Introduction

The amount of information in the world is growing every day in both volume and complexity.
The large volume of data available to us is pushing beyond the ability of existing search technologies and information retrieval systems to provide the required information precisely and in a timely manner. To address this problem, we have developed a novel fuzzy information retrieval system and use it for time series forecasting over a given dataset. A new aggregation operator is used in fuzzy information retrieval to overcome the drawbacks of the existing methods. A complete model is being developed in Python to demonstrate the concepts we propose.

1.1.1 Japanese Candlestick Theory

Traditional statistical inference systems lack robustness when dealing with complex real-world time series because they are based on strict assumptions, while computational inference systems ignore the dependency structure of time series observations. To overcome the shortcomings of both, a fuzzy information retrieval system is proposed for use as the inference system.

1.1.2 Candlestick Types (Trend Reversal Patterns)

• Basic Candlestick Types
Some basic candlestick types are defined below:
– Normal Candlestick: a candlestick with a significant body length and a significant shadow length.
– Marubozu: a candlestick with negligible shadow lengths.
– Doji: a candlestick with negligible body length and significant shadow lengths.
– Umbrella: a candlestick with a very small body, where one shadow is negligible while the other is significant.

Figure 1.1: A White Japanese Candlestick and a Black Japanese Candlestick
Figure 1.2: Marubozu Candlestick
Figure 1.3: Doji Candlestick

• Trend Reversal Patterns
Trend reversal patterns are characteristic candlestick formations that signal a reversal of the current market trend.
These patterns are helpful in forecasting the future trend of the market. They fall into two groups:
– Bearish patterns, i.e. patterns which signal a downward future trend
– Bullish patterns, i.e. patterns which signal an upward future trend

Figure 1.4: Umbrella Candlestick

Bearish Patterns
Bearish candlestick patterns reveal a bearish trend in the market. They can be of numerous types:
– Kicking Bearish
This pattern is characterised by a white marubozu immediately followed by a black marubozu. The new session opens below the opening of the previous candlestick, so there is a gap between the two candlesticks.
Figure 1.5: Kicking Bearish Candlestick
– Engulfing Bearish
The market is in an upward trend. The final white body is followed by a black body that entirely engulfs it.
Figure 1.6: Engulfing Bearish Candlestick
– Bearish Harami
The prevailing trend in the market is upward. The final white body is followed by a black body which is completely engulfed by the white body formed on the previous day.
Figure 1.7: Bearish Harami Candlestick
– Bearish Meeting Line
This pattern consists of a white candlestick on the first day followed by a black candlestick on the next day. The black candlestick opens at a sharply higher level than the white candle but closes at the same level as the previous session's close.
Figure 1.8: Bearish Meeting Line Candlestick
– Bearish Hanging Man
The market trend before this pattern is an uptrend. It is characterised by the formation of a small black body at the top of the daily trading range, with a lower shadow twice the size of the body. It resembles a hanging man, which is how the pattern got its name.
Figure 1.9: Bearish Hanging Man Candlestick
– Bearish One Black Crow
The market is characterised by an upward trend.
A white candlestick is formed on the first day and is followed by a black candlestick. The black candlestick opens lower than the previous day's close and closes below the open of the previous candlestick.
Figure 1.10: Bearish One Black Crow Candlestick
– Bearish Descending Hawk
This pattern is observed when a big white body on the first day engulfs a small white body formed on the next day. It is similar to the harami pattern, except that both candlesticks are white.
Figure 1.11: Bearish Descending Hawk Candlestick

Bullish Patterns
Bullish candlestick patterns reveal a bullish trend in the market. They can be of numerous types:
• Kicking Bullish
This pattern is characterised by a black marubozu immediately followed by a white marubozu. The new session opens above the previous day's opening.
Figure 1.12: Bullish Kicking Candlestick
• Engulfing Bullish
The market is in a downward trend, and a black body is observed on the first day. On the second day a white body is formed which engulfs the previous day's black body.
Figure 1.13: Bullish Engulfing Candlestick
• Bullish Harami
The prevailing trend in the market is downward. The black body formed on the first day is followed by a white body which is engulfed by the black body.
Figure 1.14: Bullish Harami Candlestick
• Bullish Meeting Line
This pattern consists of a black candlestick on the first day followed by a white candlestick on the next day. The white candlestick closes at the same level as the previous session's close.
Figure 1.15: Bullish Meeting Line
• Bullish Hammer
The market trend before this pattern is a downtrend. It is characterised by the formation of a small body, either black or white, at the top of the daily trading range, with a lower shadow twice the size of the body. It resembles a hammer, which is how the pattern got its name.
Figure 1.16: Bullish Hammer Candlestick
• Bullish Piercing Line
The market is characterised by a downward trend. A black candlestick is formed on the first day and is followed by a white candlestick. The white candlestick opens with a gap down and closes more than halfway into the body of the black candlestick, but not above it.
Figure 1.17: Bullish Piercing Line Candlestick
• Bullish Homing Pigeon
This pattern is observed when a big black body on the first day engulfs a small black body formed on the next day. It is similar to the harami pattern, except that both candlesticks are black.
Figure 1.18: Bullish Homing Pigeon Candlestick
• Bullish One White Soldier
The prevailing trend in the market is downward, and the pattern consists of two candlesticks. The candlestick on the first day has a black body and is followed by a white body. The white candlestick opens above the previous day's close and closes above the previous day's open.
Figure 1.19: Bullish One White Soldier Candlestick

1.1.3 Relative Strength Index

The relative strength index (RSI) is a momentum indicator that measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock or other asset.

1.1.4 Fuzzy Logic Theory

L.A. Zadeh introduced fuzzy set theory, in which a fuzzy set F defined over a universe of discourse U is a set of pairs:
$$F = \{(x, \mu_F(x)) : x \in U,\ \mu_F(x) \in [0, 1]\}$$
where $\mu_F(x)$ is called the membership degree of the element x in the fuzzy set F. In the method proposed in this report, fuzzy logic is used to represent the approximate nature of the candlestick time series and its properties in terms of linguistic variables, which are saved as a collection of documents.

1.1.5 Fuzzy Information Retrieval System

Information retrieval systems are designed to obtain, from a collection of information resources, the resource most relevant to a given query.
These are broadly classified as:
• Algebraic models, which represent queries and documents as mathematical objects such as vectors or matrices. Examples:
– Vector space model
– Extended Boolean model
– Latent semantic indexing model
• Set-theoretic models, which represent queries and documents as sets of terms and derive similarities using set theory. Examples:
– Boolean model
– Fuzzy retrieval model

Boolean models use a Boolean indexing process and are therefore intolerant of any imprecision in the information. Fuzzy retrieval systems were developed to overcome this limitation. Fuzzification can effectively handle the vagueness that users introduce into queries and is also very effective in estimating the partial relevance of documents to a query. In the method proposed in this report, fuzzy information retrieval is used to fetch the document most relevant to the query, which is then used to predict the future trend.

1.1.6 Literature Review

According to Fama [3], stock market prices follow a random walk, which limits their predictability. As per Bagheri [2], there are mainly two kinds of tools for predicting stock market trends: fundamental analysis and technical analysis. Fundamental analysis uses knowledge of the company's structure and of the market in which it operates. Technical analysis uses data mining techniques to find association rules in the dataset. The method proposed in this report is based on technical analysis. Zhang [6] used neural networks with Improved Bacterial Chemotaxis Optimization (IBCO) to predict stock market values. L. Wang proposed converting a normal time series into a fuzzy time series and used it for stock market prediction; in that approach, the data was fuzzified to cluster centres. W. Zhang suggested methods for indexing and classifying text.
However, the assessment of the semantic and statistical qualities of text is still not standardised. Attia [4] proposed a linguistic fuzzy information retrieval model. Gupta showed that performance can be increased by using fuzzy logic. Korol [7] proposed a system that works with fuzzy rules stored in a knowledge base. Partha [10] designed a time series forecasting method using document retrieval and a modified tf-idf scheme. Gupta [9] proposed a ranking function that finds the document most relevant to a query based on term weights; fuzzy logic was used to implement ranking on two levels, which increased the total number of fuzzy rules and the accuracy of the output. Zadrozny [8] proposed a new Information Retrieval (IR) system based on Zadeh's calculus of linguistic statements, which extends the usual approach to information retrieval of finding the most relevant document in the pool. Hong [11] compared the various power-mean averaging operators currently used to retrieve relevant documents in information retrieval systems, and proposed weighted power-mean averaging operators which find the most relevant document based on the cumulative weights of each term in the query and the document base. Naranjo [14] proposed a way to identify candlestick patterns in the stock market using fuzzy logic, which made it easier to quantify market uncertainty. The performance was tested on two different stock markets, Nasdaq-100 and Eurostoxx. Using fuzzy rules and candlesticks, they improved results: the approach was less risky and showed stable behaviour in the markets on which it was tested.

The literature review helped in formulating the following observations:
• Forecasting is a very complex process, and the complexity increases further when the time series data involves a financial aspect.
• In most approaches, information retrieval systems have a very limited purpose, i.e. assigning relevance scores to documents and returning the document with the highest score for the query.
• A time series contains a lot of information, and a forecasting algorithm can give accurate results only to the extent that it extracts this information.

1.1.7 Objectives

The main objective of this project is to design a time series forecasting system using fuzzy logic. The sub-objectives are as follows:
• To convert the given time series data into linguistic fuzzy time series data by fuzzifying all the relevant properties of the time series.
• To convert the fuzzified time series into documents.
• To design a fuzzy information retrieval system that uses these documents to predict future trends.
• To improve the existing ranking functions used to calculate the relevance of a document to a query.

Chapter 2 Design Details and Implementation

This chapter covers the design and implementation details of our project.

2.1 Proposed Methodology

The methodology used in this report is presented in Fig 2.1. The historical stock market data contains the Open, High, Low and Close values of each day. This data is represented as Japanese candlesticks. The properties extracted from these candlesticks are then fuzzified, and the fuzzified data is used to form the rule base of the model. This fuzzy rule base is saved as a document corpus, which serves as the input to the information retrieval system. Simultaneously, we develop fuzzy queries which are used in the information retrieval process. We use the tf-idf scheme to perform the fuzzy query processing, and the result of this process gives the forecasted trend.

2.2 Data Collection

We collected S&P BSE SENSEX Index data from BSE India's website.
This data contains the opening, closing, high and low values for each trading session from 1st January 1991 to 31st May 2019, and is stored as a CSV file.

2.3 Rearrangement and Reformation of Data

In this section we rearrange the data into clusters of 5 days:
1. (-> Monday -> Tuesday -> Wednesday -> Thursday -> Friday)
2. (-> Tuesday -> Wednesday -> Thursday -> Friday -> Saturday)
3. (-> Wednesday -> Thursday -> Friday -> Saturday -> Sunday)
4. (-> Thursday -> Friday -> Saturday -> Sunday -> Monday)
5. (-> Friday -> Saturday -> Sunday -> Monday -> Tuesday)
A new cluster starts on each consecutive day, so consecutive clusters share four days; arranging the clusters this way yields more clusters from the available time period.

2.3.1 RSI Calculation

Let $X_k^j$ represent the $j$ value (Open, High, Low or Close, $j \in \{O, H, L, C\}$) on the $k$th day. We define change as the difference between the closing values on the $k$th and $(k-1)$th days, i.e.
$$\text{change} = X_k^C - X_{k-1}^C$$
Let $U_k$ and $D_k$ represent the upward movement and downward movement respectively on the $k$th day. Then
$$U_k = \begin{cases} \text{change}, & \text{change} > 0 \\ 0, & \text{change} \le 0 \end{cases} \qquad D_k = \begin{cases} -\text{change}, & \text{change} < 0 \\ 0, & \text{change} \ge 0 \end{cases}$$
so that both $U_k$ and $D_k$ are non-negative. Let $\bar{U}_k$ and $\bar{D}_k$ be the average upward movement and average downward movement from the $(k-4)$th day to the $k$th day respectively:
$$\bar{U}_k = \frac{1}{5}\sum_{i=k-4}^{k} U_i, \qquad \bar{D}_k = \frac{1}{5}\sum_{i=k-4}^{k} D_i$$
Relative Strength (RS) is defined as the average upward movement divided by the average downward movement, i.e.
$$RS_k = \frac{\bar{U}_k}{\bar{D}_k}$$
and the relative strength index (RSI) is formulated as
$$RSI_k = 100 - \frac{100}{1 + RS_k}$$

2.3.2 RSI - Divergence

Divergence is an indicator of an upcoming trend reversal in the stock market. It may be bullish or bearish in nature.
• Bullish divergence occurs when the market is in a downtrend and the trend is going to reverse. Bullish divergence is characterised by the RSI dropping into the oversold category, i.e.
less than 30; after that, the RSI forms a higher low while the price makes a correspondingly lower low. The RSI reading stays for some time in the oversold region, gaining strength and signifying an upcoming reversal to a bullish trend.
• Bearish divergence occurs when the market is in an uptrend and the trend is going to reverse. Bearish divergence is characterised by the RSI floating in the overbought region, i.e. greater than 70; after that, the RSI forms a lower high while the price makes a correspondingly higher high. The RSI reading stays for some time in the overbought region, losing strength and signifying an upcoming reversal to a bearish trend.

Figure 2.1: Bullish Divergence

2.3.3 RSI - Swing Rejection

Swing rejection is another indicator of an upcoming trend reversal in the stock market. It may also be bullish or bearish in nature. Bullish swing rejection is characterised by a bullish trend after a downtrend in the market, and bearish swing rejection by a bearish trend after an uptrend.
• Bullish swing rejection occurs when the market is in a downtrend and the RSI falls into the oversold territory (<30). The RSI then climbs back out of the oversold territory and keeps climbing. After some time the RSI dips without falling back into the oversold territory, and then climbs above its previous high, signifying an uptrend, i.e. a trend reversal.
• Bearish swing rejection occurs when the market is in an uptrend and the RSI climbs into the overbought territory (>70). The RSI then falls back out of the overbought territory and keeps falling. After some time the RSI rises without climbing back into the overbought territory, and then falls below its previous low, signifying a downtrend, i.e. a trend reversal.
The steps of a bullish swing rejection are therefore:
– RSI falls into oversold territory.
– RSI crosses back above 30.
– RSI forms another dip without crossing back into oversold territory.
– RSI then breaks its most recent high.

Figure 2.2: Bullish Swing Rejection

2.4 Cluster Classification and Function Definition

Let high(k), low(k), open(k) and close(k) be the highest, lowest, opening and closing values, and let US(k), LS(k) and BL(k) be the upper shadow, lower shadow and body length respectively, for a trading day k. Then
$$US(k) = 100 \cdot \frac{high(k) - \max(open(k), close(k))}{open(k)}$$
$$LS(k) = 100 \cdot \frac{\min(open(k), close(k)) - low(k)}{open(k)}$$
$$BL(k) = 100 \cdot \frac{close(k) - open(k)}{close(k)}$$
BL(k) is positive for a white candlestick and negative for a black one.

To gain more insight into this time series data, we define the following fuzzy variables:
• gap: It is non-zero if and only if the highest value of the preceding day is less than today's lowest value. It is the gap size expressed as a percentage of today's low, i.e.
$$gap(k) = \begin{cases} 0, & low(k) \le high(k-1) \\ 100 \cdot \dfrac{low(k) - high(k-1)}{low(k)}, & \text{otherwise} \end{cases}$$
• trend: It represents the trend of the last two candlesticks, i.e. whether they form a bullish or a bearish move:
$$trend(k) = 100 \cdot \frac{close(k) - close(k-1)}{close(k)}$$
• open-difference: It is defined as the percentage difference between low(k−1) and open(k), i.e.
$$dopen(k) = \begin{cases} 0, & low(k-1) \le open(k) \\ 100 \cdot \dfrac{low(k-1) - open(k)}{low(k)}, & \text{otherwise} \end{cases}$$
• central-difference: It is defined as the percentage difference between the closing value on a day and the average of the open and close values of the preceding day, i.e.
$$dcentral(k) = \begin{cases} 0, & close(k) \le \dfrac{open(k-1) + close(k-1)}{2} \\ 100 \cdot \dfrac{close(k) - \frac{open(k-1) + close(k-1)}{2}}{close(k)}, & \text{otherwise} \end{cases}$$
• closing-difference: It is defined as the percentage difference between high(k−1) and close(k), i.e.
$$dclose(k) = \begin{cases} 0, & close(k) \le high(k-1) \\ 100 \cdot \dfrac{close(k) - high(k-1)}{close(k)}, & \text{otherwise} \end{cases}$$

2.4.1 Fuzzification of candlestick properties

Fig 2.3 shows the fuzzy membership function used to fuzzify the upper and lower shadow lengths of a candlestick. It converts a crisp value into one of four linguistic values: NULL, SHORT, MIDDLE and LONG. Similarly, Figs 2.4 to 2.7 show the membership functions used to fuzzify the other crisp variables defined above. For body length, there are 7 possible labels: BLACK_LONG, BLACK_MIDDLE, BLACK_SHORT, EQUAL, WHITE_SHORT, WHITE_MIDDLE and WHITE_LONG. For gap, close-difference, central-difference and open-difference, there are 4 possible values: NULL, SHORT, MIDDLE and LONG. For trend, there are 7 possible values: LONG_BEARISH, MIDDLE_BEARISH, SHORT_BEARISH, NULL, SHORT_BULLISH, MIDDLE_BULLISH and LONG_BULLISH.

Figure 2.3: Membership function for US(k) & LS(k)
Figure 2.4: Membership function for BL(k)
Figure 2.5: Membership function for gap(k)
Figure 2.6: Membership function for trend(k)
Figure 2.7: Membership function for difclose(k), difopen(k) & difcentral(k)

2.4.2 Previous Trend

The crisp value of the previous trend of a cluster is the average of the crisp trend values (given by the trend function) of its first three days, i.e.
$$PrevTrend(cluster) = \frac{trend(day_1) + trend(day_2) + trend(day_3)}{3}$$
The fuzzy value of the previous trend is obtained using the membership function shown in Figure 2.6.
• If the fuzzy value is one of LONG_BEARISH, MIDDLE_BEARISH or SHORT_BEARISH, the previous trend is BEARISH.
• If the fuzzy value is NULL, the previous trend is NEUTRAL.
• If the fuzzy value is one of LONG_BULLISH, MIDDLE_BULLISH or SHORT_BULLISH, the previous trend is BULLISH.
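The RSI computation of Section 2.3.1 can be sketched in Python as follows. This is a minimal illustration, not the project's code: the function name `rsi_series` is hypothetical, the window is fixed to the 5-day simple averages used in the formulas above, $D_k$ is taken as a non-negative magnitude, and a window with no downward movement is assigned RSI = 100 (an assumed convention, since RS is undefined there).

```python
from typing import List, Optional


def rsi_series(closes: List[float], window: int = 5) -> List[Optional[float]]:
    """RSI_k from simple `window`-day averages of U_k and D_k (Section 2.3.1).

    Returns None for days that do not yet have `window` daily price changes.
    """
    # U_k and D_k for every day; day 0 has no previous close, so both are 0.
    ups: List[float] = [0.0]
    downs: List[float] = [0.0]
    for k in range(1, len(closes)):
        change = closes[k] - closes[k - 1]
        ups.append(change if change > 0 else 0.0)
        downs.append(-change if change < 0 else 0.0)  # magnitude of the fall

    rsi: List[Optional[float]] = []
    for k in range(len(closes)):
        if k < window:
            rsi.append(None)  # not enough changes for days k-4 .. k yet
            continue
        u_bar = sum(ups[k - window + 1 : k + 1]) / window
        d_bar = sum(downs[k - window + 1 : k + 1]) / window
        if d_bar == 0:
            # Assumed convention: no downward movement means RSI = 100.
            rsi.append(100.0)
        else:
            rs = u_bar / d_bar
            rsi.append(100.0 - 100.0 / (1.0 + rs))
    return rsi
```

For the closes `[10, 11, 12, 11, 12, 13]`, the five changes are +1, +1, −1, +1, +1, so $\bar{U} = 0.8$, $\bar{D} = 0.2$, RS = 4 and the last RSI value is 80. Because this follows the report's 5-day simple average rather than Wilder's 14-day smoothing, its values will differ from the RSI shown on most charting platforms.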
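The crisp candlestick properties of Section 2.4 translate directly into code. The sketch below assumes a hypothetical `Candle` record holding one day's Open, High, Low and Close values; the function names are illustrative, and each formula follows the definitions of US(k), LS(k), BL(k), gap(k), trend(k), dopen(k), dcentral(k) and dclose(k) given above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candle:
    """One trading day (hypothetical record type for this sketch)."""
    open: float
    high: float
    low: float
    close: float


def upper_shadow(c: Candle) -> float:
    """US(k): upper shadow as a percentage of the open."""
    return 100.0 * (c.high - max(c.open, c.close)) / c.open


def lower_shadow(c: Candle) -> float:
    """LS(k): lower shadow as a percentage of the open."""
    return 100.0 * (min(c.open, c.close) - c.low) / c.open


def body_length(c: Candle) -> float:
    """BL(k): signed body length; positive = white, negative = black."""
    return 100.0 * (c.close - c.open) / c.close


def gap(prev: Candle, cur: Candle) -> float:
    """gap(k): non-zero only when today's low is above yesterday's high."""
    if cur.low <= prev.high:
        return 0.0
    return 100.0 * (cur.low - prev.high) / cur.low


def trend(prev: Candle, cur: Candle) -> float:
    """trend(k): change of the close, as a percentage of today's close."""
    return 100.0 * (cur.close - prev.close) / cur.close


def open_difference(prev: Candle, cur: Candle) -> float:
    """dopen(k): how far today's open lies below yesterday's low."""
    if prev.low <= cur.open:
        return 0.0
    return 100.0 * (prev.low - cur.open) / cur.low


def central_difference(prev: Candle, cur: Candle) -> float:
    """dcentral(k): close above the midpoint of yesterday's open and close."""
    mid = (prev.open + prev.close) / 2.0
    if cur.close <= mid:
        return 0.0
    return 100.0 * (cur.close - mid) / cur.close


def close_difference(prev: Candle, cur: Candle) -> float:
    """dclose(k): close above yesterday's high."""
    if cur.close <= prev.high:
        return 0.0
    return 100.0 * (cur.close - prev.high) / cur.close


def crisp_properties(prev: Candle, cur: Candle) -> Dict[str, float]:
    """All crisp properties of day k, given days k-1 and k."""
    return {
        "US": upper_shadow(cur),
        "LS": lower_shadow(cur),
        "BL": body_length(cur),
        "gap": gap(prev, cur),
        "trend": trend(prev, cur),
        "dopen": open_difference(prev, cur),
        "dcentral": central_difference(prev, cur),
        "dclose": close_difference(prev, cur),
    }
```

For example, `crisp_properties(Candle(100, 110, 95, 105), Candle(112, 120, 111, 118))` gives a positive gap, because the second day's low (111) lies above the first day's high (110), while dopen is 0 because the second day opens above the first day's low.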
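The fuzzification step of Section 2.4.1 and the previous-trend rule of Section 2.4.2 can be sketched as below. The report gives its membership functions only graphically (Figures 2.3 to 2.7), so the triangular shapes and the peak positions in `TREND_PEAKS` are hypothetical placeholders, not the values used in this project. The choice to map a crisp value to the label with the highest membership degree (ties going to the label listed first) is also an assumption of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL peak positions (in percent) for the seven trend labels.
# The real membership functions are shown only in Figure 2.6.
TREND_PEAKS: List[Tuple[str, float]] = [
    ("LONG_BEARISH", -3.0),
    ("MIDDLE_BEARISH", -2.0),
    ("SHORT_BEARISH", -1.0),
    ("NULL", 0.0),
    ("SHORT_BULLISH", 1.0),
    ("MIDDLE_BULLISH", 2.0),
    ("LONG_BULLISH", 3.0),
]


def memberships(x: float, peaks: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Triangular membership degrees between neighbouring peaks.

    The first and last labels are open shoulders: their membership stays 1
    beyond their peak.
    """
    degrees: Dict[str, float] = {}
    for i, (label, peak) in enumerate(peaks):
        left = peaks[i - 1][1] if i > 0 else None
        right = peaks[i + 1][1] if i + 1 < len(peaks) else None
        if x <= peak:
            mu = 1.0 if left is None else max(0.0, (x - left) / (peak - left))
        else:
            mu = 1.0 if right is None else max(0.0, (right - x) / (right - peak))
        degrees[label] = mu
    return degrees


def fuzzify(x: float, peaks: Sequence[Tuple[str, float]]) -> str:
    """Linguistic label with the highest membership (first label wins ties)."""
    degrees = memberships(x, peaks)
    return max(degrees, key=degrees.__getitem__)


def previous_trend(first_three_trends: Sequence[float]) -> str:
    """PrevTrend: average the crisp trend of days 1-3, fuzzify, then collapse."""
    if len(first_three_trends) != 3:
        raise ValueError("expected trend(day1), trend(day2), trend(day3)")
    crisp = sum(first_three_trends) / 3.0
    label = fuzzify(crisp, TREND_PEAKS)
    if label.endswith("_BEARISH"):
        return "BEARISH"
    if label.endswith("_BULLISH"):
        return "BULLISH"
    return "NEUTRAL"  # the NULL label
```

With these placeholder peaks, `previous_trend([-4.0, -1.0, -1.0])` averages to −2, which fuzzifies to MIDDLE_BEARISH and is therefore reported as BEARISH. Replacing `TREND_PEAKS` with breakpoints read off Figure 2.6, or with analogous tables for the other figures, would reuse `memberships` and `fuzzify` unchanged.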
2.4.3 Fuzzy Rules for Candlestick Classification

Kicking Bullish: This pattern is observed if:
• value of fourth fuzzified candlestick day variable FuzzyUpper is NULL AND
• value of fourth fuzzified candlestick day variable FuzzyLower is NULL AND
• value of fifth fuzzified candlestick day variable FuzzyUpper is NULL AND
• value of fifth fuzzified candlestick day variable FuzzyLower is NULL AND
• value of fifth day variable of Low > value of fourth day variable of High AND
• value of fourth candlestick day variable of Body < -0.5 AND
• value of fifth candlestick day variable of Body > 0.5

Piercing Line: This pattern is observed if:
• value of fourth candlestick day variable Body < -0.5 AND
• value of fifth candlestick day variable Body > 0.5 AND
• value of fifth day variable of Open < value of fourth day variable of Low AND
• value of fifth day variable of Close > value of fourth day variable of Body/2 AND
• value of fifth day variable of Close < value of fourth day variable of Open AND
• (value of fourth fuzzified candlestick day variable FuzzyBody is BLACK_MIDDLE OR value of fourth fuzzified candlestick day variable FuzzyBody is BLACK_LONG)

Engulfing: This pattern is observed if:
CLUSTER CLASSIFICATION AND FUNCTION DEFINITION 25 • fourth candlestick day variable Body < -0.5 AND • fifth candlestick day variable Body > 0.5 AND • value of fourth day variable of High <= value of fifth day variable of Close AND • value of fourth day variable of Low >= value of fifth day variable of Open Harami : This candlestick is observed if: • fourth candlestick day variable Body <-0.5 AND • fifth candlestick day variable Body > 0.5 AND • value of fourth day variable of Open >= value of fifth day variable of High AND • value of fourth day variable of Close <= value of fifth day variable of Low Inverted Hammer : This candlestick is observed if: • value of fifth day variable of Low < value of fourth day variable of Low AND • value of fifth fuzzified candlestick day variable FuzzyLower is NULL AND • value of fourth day variable of Body <-0.5 AND • (value of fifth day variable of Low - MIN (value of fifth day variable of Open, value of fifth day variable of Close)) < value of fifth day variable of Body/5 AND • value of fifth day variable of High - MAX(value of fifth day variable of Open , value of fifth day variable of Close)> 2*ABS(value of fifth day variable of Open - value of fifth day variable of Close) One White Soldier : This candlestick is observed if: • value of fourth day variable of Body < -0.5 AND • value of fifth day variable of Body > 0.5 AND • value of fifth day variable of Open > value of fourth day variable of Close AND • value of fifth day variable of Close > value of fourth day variable of Open AND • (value of fourth fuzzified candlestick day variable FuzzyBody is BLACKMIDDLE or value of fourth fuzzified candlestick day variable FuzzyBody is BLACKLONG)) Homing Pigeon : This candlestick is observed if: • value of fourth day variable of Body < -0.5 AND • value of fifth day variable of Body < -0.5 AND • value of fourth day variable of High > value of fifth day variable of High AND • value of fourth day variable of Low < value of fifth day variable 
of Low) 26 CHAPTER 2. DESIGN DETAILS AND IMPLEMENTATION Meeting Line : This candlestick is observed if: • value of fourth day variable of Body < -0.5 AND • value of fifth day variable of Body > 0.5 AND • ((value of fourth day variable of Close-value of fifth day variable of Close])/value of fourth day variable of Close) <= 0.5 AND • ((value of fourth day variable of Close-value of fifth day variable of Close])/value of fourth day variable of Close) >= 0) Kicking Bearish : This candlestick is observed if: • value of fourth fuzzified candlestick day variable FuzzyUpperis NULL AND • value of fourth fuzzified candlestick day variable FuzzyLoweris NULL AND • value of fifth fuzzified candlestick day variable FuzzyUpperis NULL AND • value of fifth fuzzified candlestick day variable FuzzyLower is NULL AND • value of fourth day variable Low > value of fifth day variable High AND • value of fourth candlestick day variable of Body > 0.5 AND • value of fifth candlestick day variable of Body < -0.5 Engulfing : This candlestick is observed if: • value of fourth day variable of Body > 0.5 AND • value of fifth day variable of Body <- 0.5 AND • value of fourth day variable of High <= value of fifth day variable of Open AND • value of fourth day variable of Low >= value of fifth day variable of Close Harami : This candlestick is observed if: • value of fourth candlestick day variable Body >0.5 AND • value of fifth candlestick day variable Body < -0.5 AND • value of fourth day variable of Close >= value of fifth day variable of High AND • value of fourth day variable of Open <= value of fifth day variable of Low Meeting Line : This candlestick is observed if: • value of fourth day variable Body > 0.5 AND • value of fifth day variable of Body < -0.5 AND 2.4. 
CLUSTER CLASSIFICATION AND FUNCTION DEFINITION 27 • ((value of fifth day variable of Close-value of fourth day variable of Close)/value of five day variable of Close) <= 0.5 AND • ((value of fifth day variable of Close-value of fourth day variable of Close)/value of five day variable of Close) >= 0) Hanging Man : This candlestick is observed if: • value of fifth day variable High > value of fourth day variable High AND • value of fifth fuzzified candlestick day variable FuzzyUpperis NULL AND • (value of fifth day variable High - MAX(value of fifth day variable Open, value of fifth day variable Close) < value of fifth day variable Body/5) AND • MIN((value of fifth day variable Open, value of fifth day variable Close) - value of fifth day variable Low > 2*ABS(value of fifth day variable Open - value of fifth day variable Close)) Descending Hawk : This candlestick is observed if: • value of fifth day variable Body > 0.5 AND • value of fifth day variable Body > 0.5 AND • value of fourth day variable Close > value of fifth day variable High AND • value of fourth day variable Open < value of fifth day variable Low) One Black Crow : This candlestick is observed if: • value of fourth day variable Body > 0.5 AND • value of fifth day variable Body < -0.5 AND • (value of fourth fuzzified candlestick day variable FuzzyBody is WHITEMIDDLE OR value of fourth fuzzified candlestick day variable FuzzyBody is WHITELONG) AND • value of fifth day variable Close < value of fourth day variable Low AND • value of fifth day variable Open > value of fourth day variable Body/2) Dark Cloud Clover : This candlestick is observed if: • value of fourth day variable Body > 0.5 AND • value of fifth day variable Body < -0.5 AND • (value of fourth fuzzified candlestick day variable FuzzyBody is WHITEMIDDLE OR value of fourth fuzzified candlestick day variable FuzzyBody is WHITELONG) AND • value of fifth day variable Open > value of fourth day variable Close AND • value of fifth day variable Close > 
value of fourth day variable Open 28 CHAPTER 2. DESIGN DETAILS AND IMPLEMENTATION 2.5 Document Formulation for Data In our method, we have categorised the data of each fuzzified cluster into 3 documents: representing bearish, neutral and bullish previous trend. A document is formulated as a string of fuzzified values of : Previous Candlestick Trend + Identified Candlestick Cluster+ Previous RSI Trend+ Divergence + Swing Rejection. Each of these documents is appended in one of three documents mentioned above as per their future trends. For example Bullish + Hammer + Overbought + Bearish Divergence + No Swing Rejection 2.6 Document Formulation for Query Query is formulated the same way as the previous trends are defined. For each OHLC, RSI values in query doc the corresponding fuzzified candlestick clusters and candlesticks are identified. Thus, a document is formed for query just like before. 2.7 Document Matching and TF-IDF Score Calculation To find the most relevant document, we use the tf-idf scheme in our method. tf-idf scheme ranks the documents by their tf*idf value, where tf is the term frequency i.e. number of occurrences of term in the document; while idf is the inverted document frequency i.e. total number of occurrences of these terms in the documents. In our method, we have normalized the values of tf and idf before calculating the final score. Let tf and idf be the term frequency and inverted document frequency for any term. Then, we define t flog = log10 (t f ) id flog = log10 (id f ) We define the normalized tf and idf scored as tf-norm and idf-norm respectively. Let k be the total number of documents (tf−norm)i = (t flog )i Pk , then i=1 (t flog )i (idf−norm)i = (id flog )i Pk i=1 (id flog )i For each term in the query, we calculate its tf-idf value as (tf−idf)term = (t fnorm )term ∗ (id fterm )term We calculate relevance scores corresponding to each future trend document (BR, BL, NT). 
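A minimal Python sketch of this matching step is given below, including the summation over query terms and the choice of the highest-scoring trend document. It follows the formulas above with two labelled departures: it uses log10(1 + tf) so that a term absent from a document contributes 0 instead of an undefined logarithm, and, because the idf-norm above is indexed per document and its exact definition is only partly specified here, it substitutes the standard inverse document frequency log10(N / n_t), computed over the N individual cluster documents of which n_t contain the term. Each cluster document is assumed to be a tuple of the five fields of Section 2.5, with each field (for example "No Swing Rejection") treated as one term. The function names and the toy documents are hypothetical, not the report's data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# One cluster document (assumed layout): (previous trend, candlestick,
# RSI trend, divergence, swing rejection); each field is a single term.
Record = Tuple[str, ...]


def relevance_scores(
    trend_docs: Dict[str, List[Record]], query: Sequence[str]
) -> Dict[str, float]:
    """Sum over query terms of normalised log tf times idf, per trend document."""
    labels = list(trend_docs)
    counts = {lab: Counter(t for rec in trend_docs[lab] for t in rec) for lab in labels}
    records = [rec for lab in labels for rec in trend_docs[lab]]
    scores = {lab: 0.0 for lab in labels}
    for term in query:
        # ASSUMPTION: log10(1 + tf) instead of log10(tf), so absent terms give 0.
        tf_log = {lab: math.log10(1 + counts[lab][term]) for lab in labels}
        total = sum(tf_log.values())
        n_t = sum(1 for rec in records if term in rec)
        if total == 0 or n_t == 0:
            continue  # term absent from every document: contributes nothing
        # ASSUMPTION: standard idf over individual cluster documents.
        idf = math.log10(len(records) / n_t)
        for lab in labels:
            scores[lab] += (tf_log[lab] / total) * idf  # tf-norm x idf
    return scores


def predict_trend(trend_docs: Dict[str, List[Record]], query: Sequence[str]) -> str:
    """Return the trend document label with the highest relevance score."""
    scores = relevance_scores(trend_docs, query)
    return max(scores, key=scores.get)


# Hypothetical toy documents, for illustration only.
DOCS: Dict[str, List[Record]] = {
    "BR": [
        ("Bearish", "Inverted Hammer", "Overbought", "No Divergence", "No Swing Rejection"),
        ("Bearish", "Hanging Man", "Overbought", "Bearish Divergence", "No Swing Rejection"),
    ],
    "BL": [
        ("Bullish", "Hammer", "Oversold", "Bullish Divergence", "No Swing Rejection"),
        ("Bearish", "Inverted Hammer", "Oversold", "No Divergence", "Swing Rejection"),
    ],
    "NT": [
        ("Neutral", "Harami", "Overbought", "No Divergence", "No Swing Rejection"),
    ],
}
QUERY = ("Bearish", "Inverted Hammer", "Overbought", "No Divergence", "No Swing Rejection")

if __name__ == "__main__":
    print(relevance_scores(DOCS, QUERY))
    print(predict_trend(DOCS, QUERY))
```

On these toy documents the query (the document string of the Section 3.2 example) scores roughly 0.59 against BR, 0.39 against BL and 0.19 against NT, so BR is returned. A term shared evenly by all trend documents, such as "No Divergence" here, adds the same amount to every score and so does not affect the ranking.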
After that, the relevance score of each future trend document is calculated as the sum of the tf-idf values of all the terms in the query, and the document with the maximum relevance score defines the predicted future trend.

Chapter 3 Results and Discussion
3.1 Future Trend
The future trend of a cluster is defined as the fuzzy trend of the first three days of the next cluster. It is calculated exactly as the previous trend is calculated (Section 2.4.2), and can be BEARISH, BULLISH or NEUTRAL. Each cluster is expressed as 'Previous Candlestick Trend + Identified Candlestick Cluster + Previous RSI Trend + Divergence + Swing Rejection' and saved in one of three documents according to its future trend: BL (bullish), BR (bearish) or NT (neutral).
3.2 Future Trend Prediction (Final Output)
• As an example of future trend prediction, consider the dates 16 August 2019 to 20 August 2019. Our document base includes BSE data only up to 31 May 2019, so the trends being predicted cannot be matched exactly against data already in the documents.
Table 3.1: BSE Sensex Data without RSI values
Date       Open      High      Low       Close
02-Jul-19  39811.68  39838.49  39499.19  39816.48
03-Jul-19  39907.57  39934.99  39732.38  39839.25
04-Jul-19  39917.65  39979.1   39858.33  39908.06
05-Jul-19  39990.4   40032.41  39441.38  39513.39
08-Jul-19  39476.38  39476.38  38605.48  38720.57
09-Jul-19  38754.47  38814.23  38435.87  38730.82
10-Jul-19  38701.99  38854.85  38474.66  38557.04
11-Jul-19  38751.62  38892.5   38631.31  38823.11
12-Jul-19  38941.1   39021.84  38684.85  38736.23
15-Jul-19  39009.95  39023.97  38696.6   38896.71
16-Jul-19  38961.86  39173.89  38845.27  39131.04
17-Jul-19  39171.1   39284.73  39081.14  39215.64
18-Jul-19  39204.47  39204.47  38861.25  38897.46
19-Jul-19  39058.73  39058.73  38271.35  38337.01
• We take the data from 25-June-2019 to 19-July-2019 for query formulation.
• After arranging the data, we add the RSI values to it.
Table 3.3: BSE Sensex Data with RSI values
Date       Open      High      Low       Close     RSI
02-Jul-19  39811.68  39838.49  39499.19  39816.48  74.570
03-Jul-19  39907.57  39934.99  39732.38  39839.25  75.469
04-Jul-19  39917.65  39979.1   39858.33  39908.06  78.360
05-Jul-19  39990.4   40032.41  39441.38  39513.39  42.475
08-Jul-19  39476.38  39476.38  38605.48  38720.57  19.756
09-Jul-19  38754.47  38814.23  38435.87  38730.82  20.444
10-Jul-19  38701.99  38854.85  38474.66  38557.04  17.301
11-Jul-19  38751.62  38892.5   38631.31  38823.11  36.099
12-Jul-19  38941.1   39021.84  38684.85  38736.23  33.034
15-Jul-19  39009.95  39023.97  38696.6   38896.71  44.009
16-Jul-19  38961.86  39173.89  38845.27  39131.04  56.902
17-Jul-19  39171.1   39284.73  39081.14  39215.64  60.959
18-Jul-19  39204.47  39204.47  38861.25  38897.46  42.258
• We then take a cluster of 5 days, from which the fuzzy values are generated.
Table 3.2: An example cluster of 5 days
Date       Open      High      Low       Close     RSI
03-Jul-19  39907.57  39934.99  39732.38  39839.25  75.469
04-Jul-19  39917.65  39979.1   39858.33  39908.06  78.360
05-Jul-19  39990.4   40032.41  39441.38  39513.39  42.475
08-Jul-19  39476.38  39476.38  38605.48  38720.57  19.756
09-Jul-19  38754.47  38814.23  38435.87  38730.82  20.444
• From the discrete OHLC and RSI values, we derive the normalised data representing the various attributes.
• We then fuzzify the normalised data.
• Document created for this cluster: Bearish + Inverted Hammer + Overbought + No Divergence + No Swing Rejection
• Future trend: BR (the query matches document BR)
• The future trend predicted for 2 July is bearish, which matches the actual trend visible in the RSI graph (Figure 3.2): after an initial rise, the market turns downward.
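To illustrate how the candlestick rules of Section 2.4.3 can be evaluated, the sketch below transcribes the Inverted Hammer rule and the bullish Engulfing rule (black fourth day, white fifth day) as Python predicates and applies them to the fourth and fifth days of the Table 3.2 cluster (08-Jul-19 and 09-Jul-19). The normalised bodies (-1.951 and -0.061) are the fourth and fifth entries of the Body row of Table 3.4, and the FuzzyLower value NULL comes from Table 3.5; reading those columns as the Table 3.2 days is an assumption, supported by the RSI rows of Tables 3.2 and 3.4 being identical. The Day class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Day:
    open: float
    high: float
    low: float
    close: float
    body: float  # normalised body length: negative for a black (falling) day
    fuzzy_lower: str  # fuzzified lower-shadow label, e.g. "NULL"


def inverted_hammer(d4: Day, d5: Day) -> bool:
    """Inverted Hammer rule of Section 2.4.3, one condition per line."""
    return (
        d5.low < d4.low
        and d5.fuzzy_lower == "NULL"
        and d4.body < -0.5
        # As the rule is stated, a raw price difference is compared
        # with the normalised body of the fifth day.
        and (d5.low - min(d5.open, d5.close)) < d5.body / 5
        and d5.high - max(d5.open, d5.close) > 2 * abs(d5.open - d5.close)
    )


def bullish_engulfing(d4: Day, d5: Day) -> bool:
    """Bullish Engulfing rule of Section 2.4.3."""
    return (
        d4.body < -0.5
        and d5.body > 0.5
        and d4.high <= d5.close
        and d4.low >= d5.open
    )


# Fourth and fifth days of the Table 3.2 cluster; body values assumed to be
# the 4th and 5th Body entries of Table 3.4.
DAY4 = Day(39476.38, 39476.38, 38605.48, 38720.57, body=-1.951, fuzzy_lower="NULL")
DAY5 = Day(38754.47, 38814.23, 38435.87, 38730.82, body=-0.061, fuzzy_lower="NULL")

if __name__ == "__main__":
    print("Inverted Hammer:", inverted_hammer(DAY4, DAY5))
    print("Bullish Engulfing:", bullish_engulfing(DAY4, DAY5))
```

With these inputs the Inverted Hammer predicate holds (the fifth day's upper shadow, 59.76, exceeds twice its body, 47.30) and the engulfing predicate fails because the fifth day's body is not above 0.5, which is consistent with the Inverted Hammer named in the document created for this cluster.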
Table 3.4: An example candlestick cluster of 5 days
Date        02-Jul-19  03-Jul-19  04-Jul-19  05-Jul-19  08-Jul-19
Upper       0.0687     0.153      0.105      0.0        0.154
Lower       0.267      0.124      0.180      0.291      0.761
Body        -0.171     -0.024     -1.207     -1.951     -0.061
Gap         0          0          0          0          0
Trend       0.057      0.172      -0.998     -2.047     0.026
Difopen     0          0          0          0          0
Difclose    0.002      0          0          0          0
Difcentral  0.063      0.086      0          0          0
RSI         75.469     78.360     42.475     19.756     20.444

Table 3.5: An example fuzzy candlestick cluster of 5 days
Date             02-Jul-19        03-Jul-19        04-Jul-19        05-Jul-19        08-Jul-19
FuzzyLower       NULL             NULL             NULL             NULL             NULL
FuzzyUpper       NULL             NULL             NULL             NULL             NULL
FuzzyBody        BLACK_SHORT      BLACK_SHORT      BLACK_SHORT      BLACK_SHORT      BLACK_SHORT
FuzzyTrend       NULL             NULL             SHORT_BEARISH    MIDDLE_BEARISH   NULL
FuzzyGap         NULL             NULL             NULL             NULL             NULL
FuzzyDifopen     NULL             NULL             NULL             NULL             NULL
FuzzyDifclose    NULL             NULL             NULL             NULL             NULL
FuzzyDifcentral  NULL             NULL             NULL             NULL             NULL
RSI              VERYHIGHBEARISH  VERYHIGHBEARISH  LOWBEARISH       HIGHBEARISH      HIGHBEARISH

Figure 3.1: Candlesticks for 2-July-2019 to 8-July-2019
Figure 3.2: RSI graph for 2-July-2019 to 8-July-2019

Chapter 4 Conclusion
Our proposed model captures several characteristics of the share market. The Relative Strength Index (RSI) provides information about the momentum of the market. The time series we use is represented by candlesticks, through their Open, High, Low and Close values and the properties derived from them. Candlesticks present the discrete data in an understandable form, and certain candlestick patterns identify particular market behaviour, such as trend reversals, which helps predict future trends. We then defined fuzzy rules over clusters of days, each set of 5 consecutive days forming one cluster; this yields more clusters and expands our document base. We normalised the discrete values using functions that quantify the properties of candlesticks and trends.
On the basis of this data, we derived membership functions through which we created the fuzzified data. Using this fuzzy data, we then created fuzzy rules that recognise the different candlestick patterns found in the stock market. Together with the momentum of the market, we used all of these characteristics to create a document for each cluster. Through the fuzzy information retrieval system, each query term is weighted according to its relevance in the pool of documents representing the future trends. The future trend document with the highest relevance to the query gives the predicted trend of the market, and the relevance scores of the fuzzy documents indicate the strength of the prediction.
The motivation behind this work was to capture both the quantitative and the qualitative aspects of the time series. With the help of fuzzy logic, we preserve information that is otherwise lost in other methods; this yields a more accurate prediction which, rather than being a number, is a linguistically stable label. This motivates further work on finding trends in time series and predicting future trends.

Bibliography
[1] Zeinab E. Attia, Ahmed M. Gadallah, and Hesham Ahmed Hefny. An enhanced multi-view fuzzy information retrieval model based on linguistics. IERI Procedia, 7:90–95, 2014.
[2] Bagheri, Peyhani, and Akbari. Financial forecasting using ANFIS networks with quantum-behaved particle swarm optimization. Expert Systems with Applications, 41:6235–6350, 2014.
[3] Fama. Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25:383–417, 1970.
[4] Zeinab E. Attia, Ahmed M. Gadallah, and Hesham M. Hefny. An enhanced multi-view fuzzy information retrieval model based on linguistics. IERI Procedia, 7:90–95, 2014.
[5] O. Cordon, F. Moya, and C. Zarco. Automatic learning of multiple extended Boolean queries by multiobjective GA-P algorithms. Studies in Fuzziness and Soft Computing, Springer, 137:100–127, 2004.
[6] Ngai, Zhang, Hu, and Liu. Stock trading rule discovery with an evolutionary trend following model. Expert Systems with Applications, 42:212–222, 2015.
[7] Korol. A fuzzy logic model for forecasting exchange rates. Knowledge-Based Systems, 67:49–60, 2014.
[8] Sławomir Zadrożny and Katarzyna Nowacka. Fuzzy information retrieval model revisited. Fuzzy Sets and Systems, 160:2173–2191, 2009.
[9] Yogesh Gupta, Ashish Saini, and A.K. Saxena. A new fuzzy logic based ranking function for efficient information retrieval system. Expert Systems with Applications, 42:1223–1234, 2015.
[10] Partha Roy, Ramesh Kumar, and Sanjay Sharma. A novel fuzzy document based information retrieval model for forecasting. Fuzzy Information and Engineering, 9:137–159, 2017.
[11] Won-Sin Hong, Shi-Jay Chen, Li-Hui Wang, and Shyi-Ming Chen. A new approach for fuzzy information retrieval based on weighted power-mean averaging operators. Computers & Mathematics with Applications, 53:1800–1819, 2007.
[12] Yogesh Gupta, Ashish Saini, and A.K. Saxena. A new fuzzy logic based ranking function for efficient information retrieval system. Expert Systems with Applications, 42:1223–1234, 2015.
[13] Sławomir Zadrożny and Katarzyna Nowacka. Fuzzy information retrieval model revisited. Fuzzy Sets and Systems, 160:2173–2191, 2009.
[14] Rodrigo Naranjo, Javier Arroyo, and Matilde Santos. Fuzzy modeling of stock trading with fuzzy candlesticks. Expert Systems with Applications, 162:2173–2191, 2017.