P009-Data Mining

Use of data mining techniques for better insights of iron making
processes
Authors
Arunabh Bhattacharjee, Sr Manager IT, Tata Steel Ltd., Jamshedpur 831001, India,
arunabh.b@tatasteel.com
Shambhu Tiwary, Head IT, Tata Steel Ltd., Jamshedpur 831001, India, sambhu@tatasteel.com
Abhijit Roy, Head By Products plant, Tata Steel Ltd., Jamshedpur 831001, India,
abhijit.roy@tatasteel.com
ABSTRACT
Data mining is a powerful technique for extracting useful, previously unknown and actionable
information from large volumes of data. Whether it is used to drive new business, reduce
costs or gain a competitive edge, data mining is a valuable asset for every organization.
By using data mining techniques to analyse the data that accumulates in vast
data warehouses, organizations can draw more insight from their large data stores to
drive proactive decision making. With this technique, highly accurate predictive and
descriptive models can be created that help the organization understand not only what has
happened, but also what is likely to happen next.
Data mining has traditionally been used for market basket analysis and cross selling, to
reduce customer attrition in the banking industry, to reduce fraud, and to detect criminal
activities and patterns related to terror networks. In addition to these well-known
applications, data mining is now increasingly used to gain meaningful insights into
very complex processes such as coke making, sinter making and iron making. The
strength of this technique is that it uses a methodology that is tool independent and
industry neutral. Data mining techniques can be used to analyse and predict CSR (coke
strength after reaction) for stamp charge coke, reduce NH3 in clean coke oven gas at the coke
by-product plant, study the effect of the sinter granulation index, reduce Si in hot metal, and so on.
This paper presents in detail how data mining has been used to analyse NH3 in clean
coke oven gas based on actual plant data.
Keywords: Data mining, Coke oven gas, Ammonia, ANN
INTRODUCTION TO DATA MINING
Data mining is the process of selecting, exploring and modelling large amounts of data to
uncover previously unknown patterns with speed and scale. Because data mining
technologies and predictive analytics bring value to all industries, these techniques are
widely used around the world, and usage continues to grow.
Whether it is used to drive new business, reduce costs or gain a competitive edge, data
mining is a valuable asset for every organization. By using data mining techniques to
analyse the data that accumulates in vast data warehouses, organizations can
draw more insight from their large data stores to drive proactive decision making. Data
mining is not just another analysis technique; it is a technique with a difference. It can
analyse large volumes of data in just a few seconds. It can generate visualizations that are
not possible with conventional data analysis techniques. It presents the data in the way we
want to look at it.
The life cycle of a data mining project consists of six phases, as shown in figure 1.0. The
sequence of the phases is not rigid. Moving back and forth between different phases is
always required. The outcome of each phase determines which phase, or particular task of
a phase, has to be performed next. The arrows indicate the most important and frequent
dependencies between phases. The outer circle in the figure symbolizes the cyclical nature of
data mining itself. Data mining does not end once a solution is deployed. The lessons
learned during the process and from the deployed solution can trigger new, often
more-focused business questions. Subsequent data mining processes will benefit from the
experiences of previous ones. In the following, we briefly outline each phase:
1.1 Business understanding
This initial phase focuses on understanding the project objectives and requirements from a
business perspective, then converting this knowledge into a data mining problem definition
and a preliminary plan designed to achieve the objectives.
Figure 1.0: Life cycle of a data mining project, consisting of six phases.
1.2 Data understanding
The data understanding phase starts with initial data collection and proceeds with activities
that enable you to become familiar with the data, identify data quality problems, discover
first insights into the data, and/or detect interesting subsets to form hypotheses regarding
hidden information.
1.3 Data preparation
The data preparation phase covers all activities needed to construct the final dataset (the data
that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks
are likely to be performed multiple times and not in any prescribed order. The largest share
of time and effort is invested in this stage. This step consists of data cleaning,
removal of outliers, filling of missing values (for example with moving averages) and so on.
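The cleaning steps named above can be illustrated with a minimal Python sketch. It is not the procedure used in the plant study: the z-score threshold, the window length, the function names and the temperature readings are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def remove_outliers(values, z_max=2.5):
    """Replace readings more than z_max sample standard deviations from the mean with None.

    The z-score rule and the 2.5 threshold are illustrative assumptions.
    """
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), stdev(present)
    if sigma == 0:
        return list(values)
    return [None if v is None or abs(v - mu) > z_max * sigma else v for v in values]


def fill_with_moving_average(values, window=3):
    """Fill each gap with the mean of up to `window` preceding (already filled) readings."""
    filled = []
    for v in values:
        if v is None:
            history = [h for h in filled[-window:] if h is not None]
            v = mean(history) if history else None  # a leading gap stays empty
        filled.append(v)
    return filled


# Synthetic scrubber temperatures: one missing reading and one obvious spike.
readings = [30.1, 29.8, None, 30.4, 30.0, 29.9, 95.0, 30.2, 29.7, 30.3, 30.1, 29.9]
cleaned = remove_outliers(readings)          # the spike becomes a gap
filled = fill_with_moving_average(cleaned)   # both gaps are filled from earlier readings
print(filled)
```

A trailing moving average is used here so that each gap is filled only from earlier readings; a centred window would be an equally reasonable choice for offline analysis.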
1.4 Modeling
In this phase, various modeling techniques are selected and applied, and their parameters
are calibrated to optimal values. Typically, there are several techniques for the same data
mining problem type. Some techniques have specific requirements on the form of data.
Therefore, going back to the data preparation phase is often necessary.
1.5 Evaluation
At this stage in the project, a model (or models) has been built that appears to have high
quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to
thoroughly evaluate it and review the steps executed to create it, to be certain the model
properly achieves the business objectives. A key objective is to determine if there is some
important business issue that has not been sufficiently considered. At the end of this phase,
a decision on the use of the data mining results should be reached.
1.6 Deployment
Creation of the model is generally not the end of the project. Even if the purpose of the
model is to increase knowledge of the data, the knowledge gained will need to be
organized and presented in a way that the users can use. Depending on the requirements,
the deployment phase can be as simple as generating a report or as complex as
implementing a repeatable data mining process across the enterprise. In many cases, it is
the end user or the customer who carries out the deployment steps, so it is important that they
understand up front what actions need to be carried out in order to actually make use of the
created models. The following application case presents in detail how data mining has been
used to control ammonia in coke oven gas at the coke by-product plant, based on actual plant
data.
COKE BY PRODUCT PLANT PROCESS
The coke oven by-product plant is an integral part of the by-product coke making process.
In the process of converting coal into coke using the by-product coke oven, the volatile
matter in the coal is vaporized and driven off. This volatile matter leaves the coke oven
chambers as hot, raw coke oven gas. After leaving the coke oven chambers, the raw coke
oven gas is cooled which results in a liquid condensate stream and a gas stream. The
functions of the by-product plant are to take these two streams from the coke ovens, to
process them to recover by-product coal chemicals and to condition the gas so that it can
be used as a fuel gas.
Figure 2.0: Gas generated from coke oven battery
Historically, the by-product chemicals were of high value in agriculture and in the
chemical industry, and the profits made from their sale were often of greater importance
than the coke produced. Nowadays, however, most of these same products can be more
economically manufactured using other technologies such as those of the oil industry.
Therefore, with some exceptions depending on local economics, the main emphasis of a
modern coke by-product plant is to treat the coke oven gas sufficiently so that it can be
used as a clean, environmentally friendly fuel.
Because of the corrosive nature of ammonia, its removal is a priority in coke oven
by-product plants. The presence of moisture further aggravates the corrosive action of the gas. The
basic layout of a coke by-product plant is shown in figure 3.0.
Figure 3.0: Schematic layout of by product plant
The hot coke oven gas generated from the batteries is passed through a unit called primary
cum deep cooling (PCDC). The purpose of passing it through the PCDC is to bring down the
temperature of the gas from 80°C to 22°C in order to facilitate separation of impurities, such as
tar, naphthalene and ammoniacal vapours, by means of condensation. The function of the
exhauster is to suck the C.O. gas from the batteries and discharge it at a higher
pressure, so that it can flow smoothly through the downstream units. In pre-scrubbing, the
gas is scrubbed with flushing liquor to remove the tar and coal fines carried in the gas as well
as to reduce its temperature. In the ammonia scrubber, the gas is scrubbed with circulating lean liquor,
which absorbs ammonia from the gas. The resulting rich liquor is sent to the ammonia still to remove
the ammonia. The ammonia vapour released from the ammonia still is then incinerated in the ammonia
incinerator, as shown in the ammonia removal circuit in figure 4.0.
Figure 4.0: Ammonia removal circuit
Therefore the key challenge in any coke by-product plant is to control the level of
impurities such as ammonia (NH3) and to minimize their effect as far as possible through better
process control. As a next step, we examine the scenarios where we might have to operate
with only three temperatures and still aim to minimize ammonia in clean coke oven gas.
For example, suppose we cannot control T (average PCDC temperature) and GT1 (gas scrubber 1
temperature), i.e. T + GT1 off (Case 1). Similarly, consider the following cases:
Case 2: T + GT2
Case 3: T + GT3
Figure 5.0: Ammonia scrubbing
Data preparation
This step involves identifying and selecting all relevant data that can be used for data
mining. For a more comprehensive analysis, the following key parameters were considered:
• Gas scrubber temperatures (GT1, GT2, GT3)
• Gas temperature after PCDC (T)
• Stripped liquor flow (m³/hr)
• Stripped liquor concentration (mg/100cc)
• Stripped liquor temperature (ºC)
• Ammonia in clean C.O. gas (mg/Nm³/hr)
More than 2 years of data (FY13, FY15) have been used. The final subset of data was then
treated for missing values, outliers etc. Data preparation tasks were performed multiple
times and not in any prescribed order. The largest share of time and effort was spent at
this stage. For this analysis we considered 10,000 actual data points from the by-product
plant.
Modeling
Data mining involves the execution of various data mining algorithms against the
prepared data sets. Several (tens to hundreds of) mining runs were completed for this data
mining project.
As mentioned at the beginning, our objective was to find the operating conditions
under which the ammonia content in clean coke oven gas was lowest, so that those conditions can be
replicated. A customary first step is to find the principal components. In
principal component analysis, a few important factors are selected from the large number of
factors impacting the process. Once the key parameters have been shortlisted, we can
concentrate on these vital few to get the desired effect. One way to find the
principal components is to compute the correlation matrix. Since the number of variables
considered is small, all the parameters were retained for further analysis instead of being
shortlisted on the basis of the correlation matrix. From the correlation matrix
it is evident that ammonia is highly dependent on the gas scrubber temperatures, i.e.
GT1, GT2 and GT3. Next, multiple prediction algorithms were used to predict the best
operating range for ammonia, but given the variation in the data, a neural network was
considered the most appropriate for this case. Figure 7.0 shows the output of the neural
network prediction model.
Table 1.0: Correlation Matrix
Correlation analysis helps to identify the key parameters impacting NH3 in coke oven gas
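As an illustration of how such a correlation matrix can be computed, the following Python sketch calculates pairwise Pearson correlation coefficients. The column names GT2 and NH3 follow the paper; the helper names and every numeric value (including the hypothetical `flow` column) are synthetic assumptions, not plant data, and the tool used in the study may compute correlations differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Synthetic illustration only: NH3 rises with the scrubber temperature GT2.
data = {
    "GT2": [29.0, 29.5, 30.0, 30.5, 31.0, 31.5],
    "flow": [50.0, 52.0, 49.0, 51.0, 50.0, 52.0],
    "NH3": [35.0, 38.0, 41.0, 47.0, 52.0, 60.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

A coefficient close to +1 between a temperature column and NH3 is the kind of pattern the text describes as a strong positive correlation.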
Neural networks are sophisticated modeling and prediction techniques
capable of modeling extremely complex functions and data relationships. The wide
success of neural networks relative to many other statistical techniques can be attributed
to their power, versatility and ease of use. Neural networks have a remarkable ability to derive
and extract meaning, rules and trends from complicated, noisy and imprecise data. They can
be used to extract patterns and detect trends that are governed by complicated
mathematical functions which are too difficult, if not impossible, to model using analytic
or parametric techniques. One of the abilities of neural networks is to accurately predict
data that were not part of the training dataset, a process known as generalization. Refer to
figure 6.0 for a better understanding of artificial neural networks (ANN).
Figure 6.0: Artificial neural network (ANN) mechanism. Panels: (a); (b) a more typical NN; (c) combination function. Labels: inputs X1, X2, X3; input, hidden and output layers; output nodes; weights w1, w2, w3; combination function + transfer function = activation function.
A neural network prediction is based on a combination function as well as a transfer function.
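The computation inside a single node can be sketched in Python as follows. The inputs X1, X2, X3 and weights w1, w2, w3 follow figure 6.0; the weighted-sum combination function is the usual choice, while the logistic transfer function, the bias term, the function names and the numeric values are assumptions, since the paper does not state which functions the model used.

```python
from math import exp


def combination(inputs, weights, bias=0.0):
    """Combination function: weighted sum of the inputs (w1*X1 + w2*X2 + w3*X3 + bias)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def logistic(z):
    """Transfer function (assumed here): squashes the combined value into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def node_output(inputs, weights, bias=0.0):
    """Activation of one node: transfer function applied to the combination function."""
    return logistic(combination(inputs, weights, bias))


# Hypothetical inputs and weights, chosen only to show the calculation.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.2, 0.1]
z = combination(x, w)   # 0.5*1.0 - 0.2*2.0 + 0.1*3.0
y = node_output(x, w)
print(z, y)
```

In a full network, the outputs of the hidden-layer nodes become the inputs of the next layer, and the weights are adjusted during training.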
Figure 7.0: Results of neural network output (annotation: for NH3 < 40, GT2 should lie between 29.5 and 30)
After analyzing all the histograms, the results were tabulated (refer to tables 2.0a and 2.0b).
Table 2.0 a: Result summary for NH3 <40
Table 2.0 b: Prediction of operating range at different conditions (NH3<40)
The results received an overwhelmingly positive response from the shop floor, leading to a
revision of the standard operating procedures (SOP). They were thus put to use in the daily
management practices of the by-product plant, which showed a steady decline in ammonia in
clean coke oven gas against the internal MOU of 120 mg/Nm³/hr.
Figure 8.0: Ammonia in clean C.O. gas, MOU versus actual (yearly for 2012-13 to 2014-15 and monthly for Apr-14 to Mar-15; vertical axis 0 to 200)
Results interpretation and discussion
From the correlation matrix it is evident that ammonia (NH3) has a strong positive
correlation with the gas scrubbing temperatures GT1, GT2 and GT3 and with T, where T is the
average PCDC temperature.
After several mining runs, the partition with the least average ammonia (NH3) was selected.
Each parameter was then analyzed in depth to predict the best operating range. The results
are shown in the form of a double histogram, where the red bars superimposed over the
brown bars show the distribution of the parameter for the partition with the least average
ammonia, whereas the brown bars show the distribution for the overall population of
data. IBM's Intelligent Miner was used as the data mining tool for this analysis.
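The double-histogram comparison can be approximated with a short Python sketch. It does not reproduce Intelligent Miner: the partition is obtained here with a simple NH3 < 40 filter rather than by a mining run, the function names are hypothetical, and the (GT2, NH3) records are synthetic values deliberately chosen so that the low-ammonia readings cluster near the range annotated in figure 7.0.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(int(v // bin_width) for v in values)


def modal_range(values, bin_width):
    """Return the (low, high) edges of the most populated bin."""
    index, _count = histogram(values, bin_width).most_common(1)[0]
    return (index * bin_width, (index + 1) * bin_width)


# Synthetic (GT2, NH3) records, for illustration only.
records = [
    (29.6, 34), (29.7, 36), (29.8, 33), (29.9, 38), (30.2, 39),
    (30.6, 45), (30.8, 52), (31.1, 58), (31.4, 61), (29.2, 44),
]
NH3_LIMIT = 40      # threshold taken from the NH3 < 40 condition in the text
BIN_WIDTH = 0.5     # assumed bin width

all_gt2 = [gt2 for gt2, _nh3 in records]
low_gt2 = [gt2 for gt2, nh3 in records if nh3 < NH3_LIMIT]

overall_hist = histogram(all_gt2, BIN_WIDTH)   # "brown bars": whole population
low_hist = histogram(low_gt2, BIN_WIDTH)       # "red bars": low-ammonia partition
best_range = modal_range(low_gt2, BIN_WIDTH)
print(best_range)
```

Comparing `low_hist` with `overall_hist` bin by bin shows where the low-ammonia partition is over-represented, which is the reading the double histogram is meant to support.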
The results drawn from the data mining techniques were used to revise the standard
operating procedures (SOP) which in turn has helped to strengthen the daily management
practices in the plant.
CONCLUSIONS
The success of data mining relative to other statistical techniques can be attributed to its
power, versatility and ease of use. Data mining can thus be used not only in areas such as
marketing, sales and fraud detection, but also to gain meaningful insights into
complex manufacturing processes such as iron and steel making.
REFERENCES

• Training manual for Coke by-product plant, Tata Steel Ltd.
• Michael J. A. Berry, Gordon Linoff, Data Mining Techniques, Wiley Computer Publishing.
• Peter Cabena, Hyun Hee Choi, Il Soo Kim, Shuichi Otsuka, Joerg Reinschmidt, Gary Saarenvirta, Intelligent Miner for Data Applications Guide, IBM Redbooks.