Use of data mining techniques for better insights of iron making processes

Authors
Arunabh Bhattacharjee, Sr Manager IT, Tata Steel Ltd., Jamshedpur 831001, India, arunabh.b@tatasteel.com
Shambhu Tiwary, Head IT, Tata Steel Ltd., Jamshedpur 831001, India, sambhu@tatasteel.com
Abhijit Roy, Head By Products plant, Tata Steel Ltd., Jamshedpur 831001, India, abhijit.roy@tatasteel.com

ABSTRACT

Data mining is a powerful technique for extracting useful, previously unknown and actionable information from large volumes of data. Whether it is used to drive new business, reduce costs or gain a competitive edge, data mining is a valuable asset for every organization. By using data mining techniques to analyse the data that is accumulating in vast data warehouses, organizations can draw more insight from their large data stores and drive proactive decision making. With this technique, highly accurate predictive and descriptive models can be created so that the organization understands not only what has happened, but also what is likely to happen next.

Data mining has traditionally been used for market basket analysis, cross selling, reducing customer attrition in the banking industry, reducing fraud, detecting criminal activities and patterns related to terror networks, and so on. In addition to these well-known applications, it is now increasingly used to gain meaningful insights into very complex processes such as coke making, sinter making and iron making. The strength of the technique is that it uses a methodology that is tool independent and industry neutral. Data mining techniques can be used to analyse and predict CSR (coke strength after reaction) for stamp charge coke, reduce NH3 in clean coke oven gas at the coke by-product plant, study the effect of the sinter granulation index, reduce Si in hot metal, and so on. This paper presents in detail how data mining has been used to analyse NH3 in clean coke oven gas based on actual plant data.
Keywords: Data mining, Coke oven gas, Ammonia, ANN

INTRODUCTION TO DATA MINING

Data mining is the process of selecting, exploring and modelling large amounts of data to uncover previously unknown patterns with speed and scale. Because data mining technologies and predictive analytics bring value to all industries, these techniques are widely used around the world, and usage continues to grow. Whether it is used to drive new business, reduce costs or gain a competitive edge, data mining is a valuable asset for every organization. By using data mining techniques to analyse the data that is accumulating in vast data warehouses, organizations can draw more insight from their large data stores and drive proactive decision making.

Data mining differs from other analysis techniques in several ways. It can analyse large volumes of data in just a few seconds. It can generate visualizations that are not possible with conventional data analysis techniques. It presents the data in the way we want to look at it.

The life cycle of a data mining project consists of six phases, as shown in Figure 1.0. The sequence of the phases is not rigid, and moving back and forth between phases is always required. The outcome of each phase determines which phase, or which particular task of a phase, has to be performed next. The arrows indicate the most important and frequent dependencies between phases. The outer circle in the figure symbolizes the cyclical nature of data mining itself. Data mining does not end once a solution is deployed. The lessons learned during the process and from the deployed solution can trigger new, often more focused business questions. Subsequent data mining processes will benefit from the experiences of previous ones.
In the following, we briefly outline each phase:

1.1 Business understanding

This initial phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

Figure 1.0: Life cycle of a data mining project, consisting of six phases

1.2 Data understanding

The data understanding phase starts with initial data collection and proceeds with activities that enable you to become familiar with the data, identify data quality problems, discover first insights into the data, and/or detect interesting subsets to form hypotheses regarding hidden information.

1.3 Data preparation

The data preparation phase covers all activities needed to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Most of the time and effort in a project is invested in this stage. It includes data cleaning, removal of outliers, filling in missing values (for example with moving averages), and so on.

1.4 Modeling

In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of the data. Therefore, going back to the data preparation phase is often necessary.

1.5 Evaluation

At this stage in the project, a model (or models) has been built that appears to be of high quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to evaluate it thoroughly and review the steps executed to create it, to be certain that the model properly achieves the business objectives. A key objective is to determine whether there is some important business issue that has not been sufficiently considered.
At the end of this phase, a decision on the use of the data mining results should be reached.

1.6 Deployment

Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained needs to be organized and presented in a way that users can apply. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases it is the end user or the customer who carries out the deployment steps, so it is important that they understand up front what actions are needed to actually make use of the created models.

The following application case presents in detail how data mining has been used to control ammonia in coke oven gas at the coke by-product plant, based on actual plant data.

COKE BY PRODUCT PLANT PROCESS

The coke oven by-product plant is an integral part of the by-product coke making process. In the process of converting coal into coke using the by-product coke oven, the volatile matter in the coal is vaporized and driven off. This volatile matter leaves the coke oven chambers as hot, raw coke oven gas. After leaving the coke oven chambers, the raw coke oven gas is cooled, which results in a liquid condensate stream and a gas stream. The functions of the by-product plant are to take these two streams from the coke ovens, to process them to recover by-product coal chemicals, and to condition the gas so that it can be used as a fuel gas.

Figure 2.0: Gas generated from coke oven battery

Historically, the by-product chemicals were of high value in agriculture and in the chemical industry, and the profits made from their sale were often of greater importance than the coke produced. Nowadays, however, most of these same products can be manufactured more economically using other technologies, such as those of the oil industry.
Therefore, with some exceptions depending on local economics, the main emphasis of a modern coke by-product plant is to treat the coke oven gas sufficiently so that it can be used as a clean, environmentally friendly fuel. Because of the corrosive nature of ammonia, its removal is a priority in coke oven by-product plants. The presence of moisture further aggravates the corrosive action of the gas. The basic layout of a coke by-product plant is shown in Figure 3.0.

Figure 3.0: Schematic layout of by-product plant

The hot coke oven gas generated from the batteries is passed through a unit called primary cum deep cooling (PCDC). The purpose of the PCDC is to bring the temperature of the gas down from 80 °C to 22 °C so that impurities such as tar, naphthalene and ammoniacal vapours can be separated by condensation. The exhauster sucks the C.O. gas from the batteries and discharges it at a higher pressure, so that it can flow smoothly through the downstream units. In pre-scrubbing, the gas is scrubbed with flushing liquor to remove the tar and coal fines carried in the gas as well as to reduce its temperature. In the ammonia scrubber, the gas is scrubbed with circulating lean liquor, which absorbs ammonia from the gas. The resulting rich liquor is sent to the ammonia still to remove the ammonia. The ammonia vapor released from the ammonia still is then incinerated in the ammonia incinerator, as shown in the ammonia removal circuit in Figure 4.0.

Figure 4.0: Ammonia removal circuit

The key challenge in any coke by-product plant is therefore to control the level of impurities such as ammonia (NH3) and to minimize their effect as far as possible through better process control.

As a next step, we examine the scenarios where we might have to operate with only three temperatures and still aim to minimize ammonia in clean coke oven gas. For example, say we cannot control T (average PCDC temperature) and GT1 (gas scrubber 1 temperature), i.e. T + GT1 off.
Similarly, consider the following cases: Case 2: T + GT2; Case 3: T + GT3.

Figure 5.0: Ammonia scrubbing

Data preparation

This step involves identifying and selecting all relevant data that can be used for data mining. For a more comprehensive analysis, the following key parameters were considered:

  Gas scrubber temperatures (GT1, GT2, GT3)
  Gas temperature after PCDC (T)
  Stripped liquor flow (m³/hr)
  Stripped liquor conc. (mg/100cc)
  Stripped liquor temp. (ºC)
  Ammonia in clean C.O. gas (mg/Nm³/hr)

More than 2 years of data (FY13, FY15) have been used. The final subset of data was then treated for missing values, outliers, etc. Data preparation tasks were performed multiple times and not in any prescribed order. Most of the time and effort was spent at this stage. For this analysis we considered 10000 actual data points from the by-product plant.

Modeling

Data mining involves the execution of various data mining algorithms against the prepared data sets. Several (tens to hundreds) mining runs were completed for this data mining project. As mentioned at the beginning, our objective was to find the operating conditions under which the ammonia content in clean coke oven gas was lowest, so that those conditions can be replicated.

As a customary first step, we need to find the principal components. In principal component analysis, a few important factors are selected from the large number of factors impacting the process. Once the key parameters have been shortlisted, we can concentrate on these vital few to get the desired effect. One way to find the principal components is to compute the correlation matrix. Since the number of variables considered is not large, all the parameters were retained for further analysis instead of being shortlisted on the basis of the correlation matrix. From the correlation matrix it is quite evident that ammonia is highly dependent on the gas scrubber temperatures, i.e.
GT1, GT2 and GT3. Next, multiple prediction algorithms were used to predict the best operating range for ammonia but, given the variation in the data, neural networks were considered the most appropriate for this case. Figure 7.0 shows the output of the neural network prediction model.

Table 1.0: Correlation matrix. Correlation analysis helps to identify the key parameters impacting NH3 in coke oven gas.

Neural networks are very sophisticated modeling and prediction techniques, capable of modeling extremely complex functions and data relationships. The sweeping success of neural networks over almost every other statistical technique can be attributed to their power, versatility and ease of use. Neural networks have a remarkable ability to derive and extract meaning, rules and trends from complicated, noisy and imprecise data. They can be used to extract patterns and detect trends that are governed by complicated mathematical functions which are too difficult, if not impossible, to model using analytic or parametric techniques. One of the abilities of neural networks is to accurately predict data that were not part of the training dataset, a property known as generalization. Refer to Figure 6.0 for a better understanding of artificial neural networks (ANN).

Figure 6.0: Artificial neural network (ANN) mechanism. Panels: (a) input layer, hidden layers and output layer with output nodes; (b) a more typical NN; (c) inputs X1, X2, X3 with weights w1, w2, w3, where combination function + transfer function = activation function.

A neural network prediction is based on a combination function as well as a transfer function.

Figure 7.0: Results of neural network output (for NH3 < 40, GT2 should lie between 29.5 and 30)

After analyzing all the histograms, the results were tabulated (refer to Tables 2.0 a and 2.0 b).

Table 2.0 a: Result summary for NH3 < 40
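The combination and transfer functions of Figure 6.0 can be illustrated with a single artificial neuron. The following is only a minimal sketch, not the model used in this study: the input values, the weights, the bias and the choice of a logistic (sigmoid) transfer function are all hypothetical and chosen purely for illustration.

```python
import math


def combination(inputs, weights, bias=0.0):
    """Combination function: weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def transfer(z):
    """Transfer function: logistic sigmoid, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0):
    """Activation of one neuron: transfer(combination(inputs))."""
    return transfer(combination(inputs, weights, bias))


# Hypothetical, already-scaled inputs X1, X2, X3 and weights w1, w2, w3.
x = [0.2, 0.5, 0.1]
w = [0.4, -0.6, 0.9]

activation = neuron(x, w)
print(f"combination = {combination(x, w):.3f}, activation = {activation:.3f}")
```

In a full network, many such neurons are arranged in layers, and the weights are fitted to training data rather than set by hand.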
Table 2.0 b: Prediction of operating range at different conditions (NH3 < 40)

The results were very well received on the shop floor and led to a revision of the standard operating procedures (SOP). They were thus put to use in the daily management practices of the by-product plant, which has shown a steady decline in ammonia in clean coke oven gas against the internal MOU of 120 mg/Nm³/hr.

Figure 8.0: Ammonia in clean C.O. gas (MOU versus actual; yearly values for 2012-13, 2013-14 and 2014-15, and monthly values from Apr-14 to Mar-15)

Results interpretation and discussion

From the correlation matrix it is quite evident that ammonia (NH3) has a strong positive correlation with the gas scrubbing temperatures GT1, GT2 and GT3 and with T, where T is the average PCDC temperature. After several mining runs, the partition with the least average ammonia (NH3) was selected. Each parameter was then analyzed in depth to predict the best operating range. The results are shown in the form of a double histogram, where the red bars superimposed over the brown bars show the distribution of the parameter for the partition with the least average ammonia, whereas the brown bars show the distribution for the overall population of data. IBM's Intelligent Miner was used as the data mining tool for this analysis. The results drawn from the data mining techniques were used to revise the standard operating procedures (SOP), which in turn has helped to strengthen the daily management practices in the plant.

CONCLUSIONS

The sweeping success of data mining over other statistical techniques can be attributed to its power, versatility and ease of use. Data mining can thus be used not only in areas such as marketing and sales or fraud detection, but also to gain meaningful insights into complex manufacturing processes such as iron and steel making.

REFERENCES

Training manual for Coke by product plant, Tata Steel Ltd.
Data Mining Techniques, Michael J. A. Berry, Gordon Linoff, Wiley Computer Publishing.
Intelligent Miner for Data Application Guide, Peter Cabena, Hyun Hee Choi, Il Soo Kim, Shuichi Otsuka, Joerg Reinschmidt, Gary Saarenvirta, IBM Redbooks.