The Application of Panel Data Mining Based on Gene Schema in Predicting
Finance Distress
Feng-ning Ma1, Ji-ting Yang2, Shi-qiang Jiang3, Qin-yu Ren3
1College of Management and Economics, Tianjin University, Tianjin, China
2College of Management and Economics, Tianjin University, Tianjin, China
3College of Management and Economics, Tianjin University, Tianjin, China
(sophia-yangji@163.com)
Abstract - Various methods have been applied to predicting
financial distress, including statistical analysis, neural
networks, genetic algorithms and logistic analysis. Although
these classical methods perform well, they share a
disadvantage: financial data are naturally panel data, yet
most investigations fit the underlying statistical model to a
single year's financial data and may therefore fail to
characterize the business failure tendency of ST companies. In
comparison, panel data combine cross-section data with time
series data, providing researchers with a large amount of data
and a multi-dimensional perspective. By expressing panel data
as binary genes, this article constructs a dynamic prediction
model that exploits multiple years of financial data. Using a
dynamic thresholding technique, the marginal value used during
discretization is derived by floating the corresponding
industry average up or down. From the resulting discrete
expression, a period gene is identified in each binary time
sequence and then used to recognize ST companies. Numerical
experiments show that the new method can significantly improve
prediction accuracy on real financial data, which is of
significance to both theoretical analysis and practical
application.
Keywords - Panel data, period gene schema, financial
distress
I. INTRODUCTION
During the last four decades, financial distress prediction
has been extensively investigated, evolving from early
statistical methods to more appealing intelligent techniques.
Predicting corporate failure has long remained an important
issue, since it affects the decisions of interested parties,
including stockholders, creditors, senior management and
auditors.
The pioneering work on predicting financial failure was done
by William Beaver [1] in 1966, who adopted a simple univariate
analysis. In 1968, Edward Altman [2] employed linear
discriminant analysis based on a multivariable model to
analyze corporate failure. Later, in 1980, Ohlson [3] used the
logit model to identify 9 significant variables that have a
heavy impact on firm failure. In 1992, Tam and Kiang [4]
applied back-propagation neural networks (BPNN) to financial
failure prediction and concluded that BPNN performed better
than the other methods. Since then, an important trend has
emerged of building sophisticated soft computing architectures
or hybrid intelligent strategies for the problem [5]. Jie Sun
[6] proposed a decision tree model combining
attribute-oriented induction, information gain and decision
trees for financial distress prediction. Myoung-Jong Kim [7]
put forward an ensemble of neural networks for bankruptcy
prediction, which proved more accurate. Philippe du Jardin [8]
improved the prediction accuracy of neural network based
models using a set of variables selected with a criterion.
Lili Sun [9] offered operational guidance for building naive
Bayes Bayesian network models for bankruptcy prediction.
Plenty of techniques have been introduced for financial
prediction in recent years. Unfortunately, most of these
investigations rely on a static model rather than a more
efficient dynamic one. They focus on short-range financial
data, e.g. using the data of year t-1 to predict the state of
year t, and essentially ignore the long-term historical data.
In this work we treat financial prediction as a dynamic
statistical model based on panel data, which can characterize
the business failure tendency of financially distressed
companies. Panel data combine time-series data and
cross-sectional data into two-dimensional data over time and
space. Since panel data contain many more records, they offer
more degrees of freedom than cross-sectional data. Generally
speaking, there are two approaches to data mining: one
establishes a continuous regression function, as in the logit
model and neural networks; the other divides samples into
different classes, as in clustering algorithms. Both theory
and experiments have demonstrated that the latter performs
excellently in certain cases, especially on high-dimensional
data. Our research is devoted to developing a more efficient,
completely discrete model, in which each real instance is
treated as a discrete module and financial prediction is then
conducted in a more efficient way.
II. OUR METHOD
In 1993, Tichy and Sherman [10] first proposed the concept of
“corporate DNA”. This theory holds that each corporation, like
a human being, has its own unique genes, and that this innate
property determines its fundamental stable pattern,
development tendency and variation. In this work, the
financial information of a company is regarded as a special
corporate gene. From this perspective, we may identify the
common gene corresponding to the critical schema of financial
distress, which is in sharp contrast to that of financially
well-run companies, and hence establish an efficient financial
prediction model. In our paper, a single year's financial
information is converted into binary variables according to a
well-designed conversion criterion, and these variables are
regarded as the individual gene. This investigation puts
forward the novel concept of the “period gene schema”, which
contains the individual genes of several consecutive years and
expresses a company's gene schema as a time series.
A. The principle of the schema
As the most popular coding scheme in genetic algorithms,
binary coding employs the binary set {0, 1} as its alphabet.
That is, the gene expression of each individual in the
population can be viewed as a binary string. Furthermore, we
may add a redundant element "*", referred to as a wildcard,
which can stand for either “0” or “1”. The binary set is thus
generalized to the ternary set {0, 1, *}, over which strings
such as 0110, **0110 and 1110*01** can be generated. A string
over the ternary set {0, 1, *}, which can describe a set of
similar structures, is referred to as a pattern. For example,
the pattern *1* describes the 4-element subset {010, 110, 111,
011}. Accordingly, for binary coding with the wildcard
notation there are in total $3^L$ patterns when the string
length is L.
Based on this binary coding with wildcard, we construct the
binary expression from the real continuous data according to a
specific criterion, and then combine the expressions along the
time dimension.
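The wildcard notation above can be illustrated with a minimal Python sketch. The function names `matches`, `instances` and `all_schemas` are hypothetical helpers introduced only for this illustration; they are not part of the method described in the paper.

```python
from itertools import product


def matches(schema: str, bits: str) -> bool:
    """Return True if the binary string `bits` is an instance of `schema`.

    A schema is a string over {0, 1, *}; '*' stands for either bit.
    """
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits)
    )


def instances(schema: str) -> list[str]:
    """List every binary string described by `schema`, in ascending order."""
    candidates = ("".join(p) for p in product("01", repeat=len(schema)))
    return [bits for bits in candidates if matches(schema, bits)]


def all_schemas(length: int) -> list[str]:
    """Enumerate all 3**length patterns over {0, 1, *} of the given length."""
    return ["".join(p) for p in product("01*", repeat=length)]


print(instances("*1*"))      # the four strings with a 1 in the middle
print(len(all_schemas(3)))   # number of patterns of length 3
```

For a string length of 3, which is the case used later in the paper, the enumeration yields 27 patterns, 8 of which contain no wildcard.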
B. Mapping rule
The first question is how to obtain the discrete gene
expression from the continuous financial data. Among
unsupervised discretization methods, the most widely adopted
are equal-width and equal-frequency discretization.
Specifically, the equal-width method divides the range into a
prescribed number N of boxes of the same fixed width. If the
original continuous region is denoted by [a, b], the resulting
equal-width sub-intervals are [a, a+(b-a)/N],
[a+(b-a)/N, a+2(b-a)/N], …, [b-(b-a)/N, b]. In this research,
an improved equal-width discretization is presented. Instead
of dividing the range into boxes of fixed width, a marginal
value is adopted to split the range into two subfields. This
marginal value, denoted A, is derived from the average
financial data of each industry by increasing or decreasing
it. The extent of the rise or fall is decided by the repeated
experiments described below. The two states are then
determined by comparison with A: the resulting state R is set
to “0” when $K_{ij}$ is smaller than A, and to “1” otherwise.
Thus, we have:
$R = 0$ while $K_{ij} \in (-\infty, A)$, (1)
$R = 1$ while $K_{ij} \in [A, +\infty)$. (2)
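The mapping rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `marginal_value` and `to_gene` are hypothetical, the rise or fall is assumed to be a relative change of the industry average, and a value exactly equal to A is mapped to 1, following the "otherwise" wording used for the Kelon example in Section II.C.

```python
def marginal_value(industry_mean: float, shift: float) -> float:
    """Float the industry average up (shift > 0) or down (shift < 0).

    Assumption: the shift is relative, so shift=0.15 means
    "15% above the industry mean".
    """
    return industry_mean * (1.0 + shift)


def to_gene(k: float, a: float) -> int:
    """Mapping rules (1) and (2): 0 below the marginal value A, else 1."""
    return 0 if k < a else 1


# Hypothetical numbers: an industry mean of 2.0 raised by 15%.
A = marginal_value(2.0, 0.15)
print(to_gene(2.1, A))  # indicator below the marginal value
print(to_gene(2.5, A))  # indicator above the marginal value
```

Keeping the shift as a separate parameter makes it easy to repeat the experiment with different marginal values, as the paper does for 0%, ±15% and ±20%.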
C. Definition of binary period gene schema
$t_s$: the s-th time period, s = 1, …, P
$K_j$: the j-th finance index, j = 1, …, n
$K_{ij}$: the j-th finance index of the i-th company
$R_{ij}$: the gene schema of the j-th finance index of the
i-th company, i.e. the mapping of $K_{ij}$,
$f: K_{ij}^{t_s} \to R_{ij}^{t_s}$, with
$R_{ij}^{t_s} \in \{0, 1\}$; the mapping rules are (1) and (2).
$X_i = \{ R_{i1}^{t_1} \dots R_{i1}^{t_P},\ R_{i2}^{t_1} \dots R_{i2}^{t_P},\ \dots,\ R_{in}^{t_1} \dots R_{in}^{t_P} \}$,
which is composed of the patterns $R_{ij}$ of all 26 indexes
over three successive years (i.e. 2004-2006).
In this investigation, P is empirically set to 3, which means
that the financial data of the past three years are used.
Taking Kelon Electric Appliance Company Limited as an example,
its 26 financial indicators for 2002-2004 are compared with
the marginal values derived from the mean values of its
industry. When a financial indicator is smaller than the
corresponding marginal value, we have R=0; otherwise, we set
R=1. As a consequence, the binary expression of the company
gene of Kelon between 2004 and 2006 can be derived, which can
also be regarded as the “period gene schema” of Kelon Electric
Appliance.
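A minimal Python sketch of how a period gene schema could be assembled from yearly indicators is given below. The helper names `period_gene` and `company_schema`, the dictionary layout and all numbers are hypothetical and serve only to illustrate the definitions above with P = 3.

```python
def period_gene(values: list[float], margins: list[float]) -> str:
    """Concatenate the yearly 0/1 states of one index into a bit string."""
    if len(values) != len(margins):
        raise ValueError("need one marginal value per year")
    return "".join("0" if k < a else "1" for k, a in zip(values, margins))


def company_schema(
    indicators: dict[str, list[float]],
    margins: dict[str, list[float]],
) -> dict[str, str]:
    """Period gene schema of one company: one P-bit string per index."""
    return {
        name: period_gene(vals, margins[name])
        for name, vals in indicators.items()
    }


# Hypothetical three-year data for two of the indexes.
indicators = {
    "current ratio": [0.8, 0.9, 0.7],
    "financial ratio": [0.12, 0.15, 0.20],
}
margins = {
    "current ratio": [1.5, 1.5, 1.6],
    "financial ratio": [0.05, 0.05, 0.06],
}
print(company_schema(indicators, margins))
```

Storing one bit string per index keeps each index's three-year pattern directly comparable with the eight schemas 000 to 111.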
D. Building of prediction model
Based on a statistical technique, a prediction pattern is
extracted that can efficiently distinguish potentially
distressed companies from healthy ones. Because three years of
financial data are analyzed in this paper, each index has
eight possible gene schemas (000, 001, 010, 011, 100, 101,
110, 111). We calculate the percentage of each gene schema of
each index in the ST sample and the Non-ST sample
respectively, which can be regarded as the response ratio of
each index gene schema. We establish the new prediction model
from the schemas whose response ratio is generally high in
distressed companies and low in healthy ones. Different
marginal values yield different prediction models; the model
with the highest accuracy is taken as the best prediction
model.
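The response ratio described above can be sketched as follows. The function names `response_ratios` and `ratio_gap` are hypothetical, and the tiny samples are invented purely to show the arithmetic; they are not data from the paper.

```python
from collections import Counter
from itertools import product

# The eight wildcard-free schemas for three years of data.
SCHEMAS = ["".join(p) for p in product("01", repeat=3)]


def response_ratios(genes: list[str]) -> dict[str, float]:
    """Share of companies in a sample that show each schema for one index."""
    if not genes:
        raise ValueError("sample must not be empty")
    counts = Counter(genes)
    return {s: counts[s] / len(genes) for s in SCHEMAS}


def ratio_gap(st_genes: list[str], non_st_genes: list[str]) -> dict[str, float]:
    """Response ratio in the ST sample minus that in the Non-ST sample."""
    st = response_ratios(st_genes)
    non_st = response_ratios(non_st_genes)
    return {s: st[s] - non_st[s] for s in SCHEMAS}


# Hypothetical period genes of one index for four ST and four Non-ST firms.
st_sample = ["000", "000", "111", "010"]
non_st_sample = ["111", "111", "111", "000"]
gap = ratio_gap(st_sample, non_st_sample)
print(max(gap, key=gap.get))  # schema that favours ST most strongly
```

A large positive gap marks a schema that is frequent among distressed companies and rare among healthy ones, which is the property the prediction model relies on.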
III. RESULTS
A. The selection of sample
Since the promulgation of the company bankruptcy law in 1986,
very few listed companies have actually gone bankrupt, so it
is relatively difficult to construct analysis samples from
bankrupt companies. Instead, special treatment (ST) companies,
which have been warned by the China Securities Regulatory
Commission, are widely adopted in existing domestic
investigations. A similar strategy is used in our analysis: ST
companies are regarded as distressed, while those without any
special treatment (Non-ST) are regarded as healthy. In order
to eliminate the effects of industry differences, the industry
mean value serves as the critical value. Ratio indexes are
adopted to minimize the impact of company size. After removing
the companies with missing or singular data, a total of 460
companies are selected as ST and Non-ST instances, with a main
focus on 2006-2008. The whole sample is divided into two
subsamples, the test set and the prediction set. The former
contains 230 companies (115 ST and 115 Non-ST), and the latter
also contains 230 companies (115 ST and 115 Non-ST).
Considering that the data of an ST company may fluctuate
noticeably, the financial data of the years before the first
special treatment are used. For example, we construct the
financial schema of a company that was first given ST status
in 2006 from the earlier data of 2002 to 2004.
B. The choice of financial index
In the classical literature, Beaver highlighted 3 financial
indexes [1]. In 1968, Altman employed 5 indexes in the
so-called Z-score model, and later, in 1977, he extended the
number of financial indexes to 7 in his improved model. Ohlson
employed 9 significant variables in the logit model. By
combining these well-known indexes, which have been adopted by
most other investigations, in this work we use 26 financial
indexes that reflect the main properties of a company: the
ability of short repaying, the ability of long repaying, cash
flowing, earning, development and operating. Table I lists the
indexes.
TABLE I. TABLE OF INDEXES
property                    index
ability of short repaying   current ratio
                            quick ratio
                            the ratio of net working capital and total assets
ability of long repaying    debt ratio
                            the ratio of long-term liabilities and net working capital
                            the debt-equity ratio
                            the stockholders' equity ratio
                            long-term debt to total asset ratio
cash flowing                the ratio of cash flow and liabilities
                            cash ratio
earning                     net profit on sales
                            net profit on total assets
                            the ratio of net assets and net profit
                            the ratio of operating profits and costs and expenses
development                 the growth rate of fixed assets
                            the growth rate of total assets
                            the growth rate of net profit
operating                   operating ratio
                            the ratio of management expenses and main business income
                            financial ratio
                            fixed asset ratio
                            inventory turnover
                            fixed asset turnover
                            current assets turnover
                            the assets turnover
                            the stockholders' equity turnover
C. Prediction of model
TABLE II. TABLE OF SCHEMA RATIO
index                                              schema  Ratio of ST  Ratio of Non-ST  Difference between ST and Non-ST
current ratio                                      000     0.1724       0.0259            0.1465
                                                   001     0.0172       0.0172            0
                                                   010     0.0086       0                 0.0086
                                                   011     0.0517       0.0345            0.0172
                                                   100     0.0431       0.0345            0.0086
                                                   101     0.0172       0.0259           -0.0087
                                                   110     0.0172       0.069            -0.0518
                                                   111     0.8103       0.6379            0.1724
financial ratio                                    000     0.7586       0.5259            0.2327
                                                   001     0.0431       0.0431            0
                                                   010     0.0086       0.0259           -0.0173
                                                   011     0            0.0345           -0.0345
                                                   100     0.0603       0.0948           -0.0345
                                                   101     0.0086       0.0172           -0.0086
                                                   110     0.0603       0.0431            0.0172
                                                   111     0.0517       0.2069           -0.1552
the ratio of net working capital and total assets  000     0.431        0.2414            0.1896
                                                   001     0.0431       0.0172            0.0259
                                                   010     0.0086       0.0172           -0.0086
                                                   011     0.0086       0.0259           -0.0173
                                                   100     0.0603       0.0345            0.0258
                                                   101     0.0259       0.0172            0.0087
                                                   110     0.1034       0.0517            0.0517
                                                   111     0.3103       0.5862           -0.2759
Table II illustrates the response ratios of some indexes for
ST and Non-ST companies when the marginal value is 15% above
the industry average. “Ratio of ST” is the percentage of each
schema under each index in the ST sample; likewise, “Ratio of
Non-ST” is the percentage of each schema under each index in
the Non-ST sample. “Difference between ST and Non-ST” is the
Ratio of ST minus the Ratio of Non-ST for the same schema
under the same index. Take “current ratio” for example: the
response ratio of schema “000” is 17.24% for ST companies,
compared with 2.59% for Non-ST companies. Among the eight
schemas of the index “current ratio”, the schema “111” has the
greatest difference between the ST and Non-ST samples, so we
conclude that “111” is the best schema of “current ratio”
under the 15% up marginal value. Based on repeated empirical
experiments, we choose the marginal value 15% above the mean
value, and choose the schemas whose ST ratio is 15% larger
than their Non-ST ratio. Table III shows the prediction
accuracy under different marginal values, i.e. different
extents of change from the mean value of each industry. The
resulting significant schemas include the 111 pattern of
financial expenses ratio, the 000 pattern of current ratio and
the 111 pattern of financial ratio; in the experiments the
last two reach prediction accuracies of 86.09% and 80%
respectively. The combination of the 000 pattern of current
ratio and the 111 pattern of financial ratio is also
significantly predictive, achieving 75.65%.
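How the accuracy of a schema-based rule could be measured is sketched below in Python. This is an illustration under assumptions, not the authors' procedure: the names `predict_st` and `accuracy` are hypothetical, the companies are invented, and a combined rule is assumed to flag a company as ST only when every listed index shows its schema, since the paper does not state how combinations are evaluated.

```python
def predict_st(company: dict[str, str], rule: dict[str, str]) -> bool:
    """Predict ST when every index in `rule` shows the required schema.

    Assumption: combined rules are joined with a logical AND.
    """
    return all(company.get(index) == schema for index, schema in rule.items())


def accuracy(
    companies: list[dict[str, str]],
    is_st: list[bool],
    rule: dict[str, str],
) -> float:
    """Fraction of companies whose predicted state equals the true state."""
    hits = sum(predict_st(c, rule) == y for c, y in zip(companies, is_st))
    return hits / len(is_st)


# Hypothetical prediction set: two ST and two Non-ST companies.
companies = [
    {"current ratio": "000", "financial ratio": "111"},
    {"current ratio": "000", "financial ratio": "011"},
    {"current ratio": "111", "financial ratio": "111"},
    {"current ratio": "000", "financial ratio": "001"},
]
is_st = [True, True, False, False]

print(accuracy(companies, is_st, {"current ratio": "000"}))
print(accuracy(companies, is_st, {"current ratio": "000",
                                  "financial ratio": "111"}))
```

Under the AND assumption a combined rule flags fewer companies than either single rule, so its accuracy need not exceed that of its parts.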
TABLE III. TABLE OF ACCURACY
Extent of change from the mean value  index                              schema   accuracy
0%                                    current ratio                      000      0.7826
                                      quick ratio                        000      0.7130
                                      financial ratio                    111      0.8067
                                      current ratio and quick ratio      000,000  0.6700
up 15%                                current ratio                      000      0.8609
                                      financial ratio                    111      0.8000
                                      current ratio and financial ratio  000,111  0.7565
down 15%                              current ratio                      000      0.7217
                                      financial ratio                    111      0.8435
                                      current ratio and financial ratio  000,111  0.6700
up 20%                                current ratio                      000      0.8522
                                      financial ratio                    111      0.8087
                                      current ratio and financial ratio  000,111  0.7304
down 20%                              current ratio                      000      0.6348
                                      financial ratio                    111      0.8170
IV. CONCLUSION
Traditional methods for constructing a stable prediction model
often rely on single-year data, which cannot capture the trend
before ST. Following the schema theory of genetic algorithms,
this paper presents a panel data mining method based on binary
variables. The prediction accuracy of the new method is no
lower than that of the classic ones; meanwhile, its principle
is simple, and it can help to provide qualified information
for interested parties. Some aspects could still be improved.
This research uses a statistical method to establish the
prediction model; however, other advanced techniques, such as
genetic algorithms, could also be applied. Furthermore,
schemas that contain “*” could also be studied in the
prediction model.
REFERENCES
[1] W. H. Beaver, “Financial ratios as predictors of failure,”
Journal of Accounting Research, vol. 4 (supplement),
pp. 71-111, 1966.
[2] E. I. Altman, “Financial ratios, discriminant analysis and
the prediction of corporate bankruptcy,” Journal of Finance,
vol. 23, pp. 589-609, 1968.
[3] J. A. Ohlson, “Financial ratios and the probabilistic
prediction of bankruptcy,” Journal of Accounting Research,
vol. 18, pp. 109-131, 1980.
[4] K. Y. Tam, M. Y. Kiang, “Managerial applications of neural
networks: The case of bank failure predictions,” Management
Science, vol. 38, pp. 926-947, 1992.
[5] P. Ravi Kumar, V. Ravi, “Bankruptcy prediction in banks
and firms via statistical and intelligent techniques – A
review,” European Journal of Operational Research, vol. 180,
pp. 1-28, 2007.
[6] Jie Sun, Hui Li, “Data mining method for listed companies'
financial distress prediction,” Knowledge-Based Systems,
vol. 21, pp. 1-5, 2008.
[7] Myoung-Jong Kim, Dae-Ki Kang, “Ensemble with neural
networks for bankruptcy prediction,” Expert Systems with
Applications, vol. 37, pp. 3373-3379, 2010.
[8] Philippe du Jardin, “Predicting bankruptcy using neural
networks and other classification methods: The influence of
variable selection techniques on model accuracy,”
Neurocomputing, vol. 73, pp. 2047-2060, 2010.
[9] Lili Sun, Prakash P. Shenoy, “Using Bayesian networks for
bankruptcy prediction: Some methodological issues,” European
Journal of Operational Research, vol. 180, pp. 738-753, 2007.
[10] N. M. Tichy, S. Sherman, “Control Your Destiny or Someone
Else Will,” Doubleday/Currency, New York, NY, 1993.