UEH – COLLEGE OF BUSINESS
FINANCE DEPARTMENT
DATA SCIENCE
FINAL EXAM PROJECT
TOPIC: APPLICATION OF THE NEURAL NETWORK MODEL IN
FORECASTING FINANCIAL DISTRESS OF COMPANIES IN THE
MANUFACTURING AND WHOLESALE INDUSTRIES IN 2022 AND 2023
USING THE ORANGE PROGRAM
CLASS: 22C1INF50905912
INSTRUCTOR: Thái Kim Phụng
GROUP MEMBERS:
1. Đoàn Thị Kim Anh
MSSV: 31201022021
2. Phạm Công Hoàng
MSSV: 31201020324
3. Trần Minh Tuyết Mai
MSSV: 31201022425
4. Nguyễn Bích Ngọc
MSSV: 31201020646
5. Nguyễn Thu Trang
MSSV: 31201022820
HO CHI MINH CITY - 2022
GROUP REPORTS
SUBJECT: DATA SCIENCE
• Plagiarism check: 10%
• Group leader: Trần Minh Tuyết Mai
• Group members: 5

Student's Name | MSSV | Participation (/100)
1. Đoàn Thị Kim Anh | 31201022021 | 80
2. Phạm Công Hoàng | 31201020324 | 100
3. Trần Minh Tuyết Mai | 31201022425 | 100
4. Nguyễn Bích Ngọc | 31201020646 | 60
5. Nguyễn Thu Trang | 31201022820 | 70
ABSTRACT

According to the General Statistics Office, Vietnam's GDP grew by 2.58% in 2021 compared to 2020, with the manufacturing and wholesale industries posting a large increase of 6.37% and contributing 1.61 percentage points to GDP growth. However, due to the severity of the COVID-19 pandemic, a large wave of firms experienced financial distress, in many cases resulting in bankruptcy. Among them, manufacturing and wholesale accounted for the largest proportion, around 36.9%. The main objective of this study is to predict the financial distress of manufacturing and wholesale firms in 2022 and 2023 by interpreting Altman's Z-score, using the most suitable classification model selected with the Orange program.

Data were collected for 627 observations of manufacturing and wholesale companies listed on three stock exchanges (HNX, HOSE, and UPCOM) in 2021. This sample was divided into two datasets: a training dataset (439 companies) and a forecast dataset (188 companies). The dependent variable is Results, with three values: Safe zone, Distress zone, and Gray zone. The five independent variables are: Net working capital/Total assets (NWCTA), Retained earnings/Total assets (RETA), Earnings before interest and taxes/Total assets (EBITTA), Market value of equity/Book value of total liabilities (MVETD), and Sales/Total assets (NRTA). After removing extreme outliers, the training dataset contains 415 observations. Among the four classification methods tested (Tree, SVM, Neural Network, Logistic Regression), the Neural Network model rated highest on all five indices (AUC, CA, F1, Precision, and Recall) and had the highest proportion of correct predictions in the confusion matrix. The team then predicted the bankruptcy probability of the remaining 188 listed companies in the two industries and found that 72 companies are not at risk of bankruptcy, 59 companies are at risk of bankruptcy, and 57 companies are at high risk of bankruptcy, with a predicted accuracy of 94% within one year (2022) and 74% within two years (2023).

This research can be used as a reference for a specific view of financial distress among listed manufacturing and wholesale companies in 2022 and 2023, helping not only managers but also governments to consider strategies and solutions. However, the model holds only for Vietnam's listed firms in the manufacturing and wholesale industries in 2021. Therefore, the group will continue with further research on each specific industry group and add other micro and macro factors to build a more general model for each particular industry group.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS
CHAPTER 1: INTRODUCTION
1.1. Reason for doing the topic
1.2. Objectives of the study
1.3. Research questions
1.4. Research subjects and scopes
1.5. Overall research methodology
1.6. Practical meanings of the topic
1.7. Research layout
CHAPTER 2: LITERATURE REVIEWS
2.1. Literature reviews about financial distress
2.1.1. Definition of financial distress
2.1.2. Some causes of financial distress
2.1.2.1. Financial factors
2.1.2.2. Nonfinancial factors
2.1.2.3. Macroeconomic factors
2.1.3. Financial distress costs
2.2. Literature review about data mining
2.2.1. Definition of data mining
2.2.2. The key properties of data mining
2.2.3. Data mining processing
2.2.4. Data mining methods
2.2.5. Data mining tool used in the study – Orange
2.3. Literature review about data classification
2.3.1. Definition of data classification
2.3.2. Data classification process
2.3.3. Data classification methods
2.3.3.1. Logistic Regression
2.3.3.2. Support Vector Machine
2.3.3.3. Decision Tree
2.3.3.4. Neural Network
2.3.4. Methods to evaluate classification models
2.3.4.1. Confusion Matrix, Accuracy, ROC, AUC, and Precision/Recall
2.3.4.2. Cross Validation: Holdout and K-fold cross validation
2.4. Previous empirical evidence applying data mining in forecasting financial distress
2.4.1. Empirical evidence with foreign research subjects
2.4.2. Empirical evidence with Vietnamese research subjects
CHAPTER 3: METHODOLOGY
3.1. Research process
3.2. Research model
3.3. Variable measurements
3.3.1. Dependent variables: Z-scores
3.3.2. Independent variables
3.3.2.1. Net Working Capital/Total assets (NWCTA)
3.3.2.2. Retained earnings/Total assets (RETA)
3.3.2.3. Net revenues/Total assets (NRTA)
3.3.2.4. EBIT/Total assets (EBITTA)
3.3.2.5. Equity-to-debt ratio (MVETD)
3.3.3. Summary of variable measurements
3.4. Data collection methods and descriptive statistics before preprocessing
CHAPTER 4: RESULTS
4.1. Results of preprocessing data
4.2. Descriptive statistics after processing the training dataset
4.3. Results of choosing and evaluating the most suitable classification method
4.4. Results of forecasting data by using the Neural Network model
CHAPTER 5: DISCUSSIONS, LIMITATIONS, AND RECOMMENDATIONS
5.1. Discussions
5.2. Recommendations
5.3. Limitations
5.4. Directions
REFERENCES
APPENDIX 1: TRAINING DATASET BEFORE PROCESSING
APPENDIX 2: TRAINING DATASET AFTER PROCESSING
APPENDIX 3: TEST DATA BEFORE FORECASTING
APPENDIX 4: TEST DATA AFTER FORECASTING BY USING THE NEURAL NETWORK MODEL
LIST OF TABLES

Table 2.1: Some common formulas for calculating short-term solvency
Table 2.2: Some common formulas for calculating long-term solvency
Table 2.3: Some common formulas for calculating asset management
Table 2.4: Some common formulas for calculating profitability
Table 2.5: Some formulas for calculating market value
Table 3.1: Interpretation of the Z-score
Table 3.2: Five independent variables selected in this study
Table 3.3: Descriptive statistics of quantitative variables before preprocessing
LIST OF FIGURES

Figure 2.1: Outcomes of Financial Distress
Figure 2.2: Determinants of financial distress
Figure 2.3: Five main groups of financial factors
Figure 2.4: Illustration of the data mining process
Figure 2.5: Illustration of building a classification model
Figure 2.6: Illustration of classifying new data and estimating the accuracy
Figure 2.7: Illustration of logistic regression
Figure 2.8: Maximum-margin hyperplane and margins for an SVM trained with samples from two classes
Figure 2.9: A simple decision tree with tests on attributes X and Y
Figure 2.10: Simple neural network
Figure 2.11: Confusion matrix for binary classification
Figure 2.12: Outcomes of a confusion matrix
Figure 2.13: Sensitivity and Specificity
Figure 2.14: Area under the ROC Curve
Figure 2.15: Hold-out method
Figure 2.16: K-fold cross-validation
Figure 3.1: Overall framework of using data mining techniques for prediction of financial distress
Figure 3.2: Statistical results in the training dataset before preprocessing
Figure 4.1: Process of preprocessing data in Orange
Figure 4.2: Training dataset of 20 listed companies before processing
Figure 4.3: Distributions of our sample in five variables before preprocessing
Figure 4.4: Illustration of removing outliers
Figure 4.5: Training data of 20 listed companies after preprocessing
Figure 4.6: Distributions of our sample in five variables after preprocessing
Figure 4.7: Statistical results in the training dataset after preprocessing
Figure 4.8: Procedure for selecting and evaluating data classification methods
Figure 4.9: Roles of the variables in the training dataset
Figure 4.10: Result of the layered evaluation model by Cross Validation
Figure 4.11: Neural Network's Confusion Matrix
Figure 4.12: ROC analysis
Figure 4.13: Forecasting dataset of 20 listed companies
Figure 4.14: Neural Network forecasting process
Figure 4.15: Properties of the variables in the forecast dataset
Figure 4.16: Forecast results using the Neural Network model
Figure 4.17: Statistical forecasting results by the Neural Network model
LIST OF ACRONYMS

No | Abbreviation | Explanation
1 | ANN | Artificial Neural Network
2 | AUC | Area Under the Curve
3 | CA | Classification Accuracy
4 | CART | Classification and Regression Tree
5 | CPI | Consumer Price Index
6 | EBIT | Earnings Before Interest and Tax
7 | EBITTA | Earnings before interest and taxes/Total assets
8 | FPR | False Positive Rate
9 | GDP | Gross Domestic Product
10 | HNX | Hanoi Stock Exchange
11 | HOSE | Ho Chi Minh City Stock Exchange
12 | KDD | Knowledge Discovery in Databases
13 | MLP | Multi-Layer Perceptron
14 | MVETD | Market value of equity/Book value of total liabilities
15 | NRTA | Sales/Total assets
16 | NWCTA | Net working capital/Total assets
17 | RETA | Retained earnings/Total assets
18 | ROC | Receiver Operating Characteristic
19 | SMO | Sequential Minimal Optimization
20 | SVM | Support Vector Machine
21 | TPR | True Positive Rate
22 | UPCoM | The Unlisted Public Company Market
CHAPTER 1: INTRODUCTION
1.1. Reason for doing the topic
The years 2020 and 2021 witnessed stagnation in worldwide economic development due to the impact of COVID-19. Vietnam's GDP reached a strong growth rate of 7.02% in 2019, but GDP growth decreased significantly in 2020 and 2021 (2.91% and 2.58%, respectively). In particular, GDP in the second quarter of 2020 increased by only 0.36% compared to the same period in 2019, and GDP in the third quarter of 2021 decreased by 6.02% compared to the same period in 2020. By 2022, the economy had entered a recovery phase: GDP in the first six months of 2022 increased by 6.42% over the same period in 2021. Notably, the average CPI in the first six months of 2022 increased by 2.44% compared to the previous year, showing an encouraging recovery in consumer demand.
However, inflation has become a significant problem due to four main factors: CPI rises significantly with the recovery of domestic demand; energy prices are high as travel recovers; sharply rising input prices are reflected in consumer goods prices; and fiscal support causes the money supply to increase sharply in 2022. Associate Professor Nguyễn Bá Minh, Director of the Institute of Economics and Finance, along with other reputable organizations and financial experts, forecasts that inflation in 2022 will increase strongly compared to 2021 (about 4%), more than expected. Rising inflation pushes the State Bank of Vietnam (SBV) to tighten monetary policy, leading to higher interest rates and a greater risk that businesses fall into financial distress because they can no longer afford to produce.
In the financial sector, financial distress is the top concern of companies to maintain
their business operations. Financial distress is used to indicate a condition when promises to
creditors of a company are broken or honored with difficulty. If financial distress cannot be
relieved, it can lead to bankruptcy. Financial distress is usually associated with some costs
to the company; these are known as the costs of financial distress. Many researchers describe the "financial distress" of enterprises as a difficult period arising before the enterprise declares bankruptcy (Altman and Hotchkiss, 2006; Li Jiming and Du Weiwei, 2011; Tinoco and Wilson, 2013). An enterprise in financial distress falls into one of the following situations: its securities are placed under control or warning status, its securities are delisted, or the enterprise goes bankrupt (Vietnam Bankruptcy Law, 2014; Decree No. 58/2012/ND-CP of the Government). Financial distress can cause lasting damage to a firm's creditworthiness and is often a harbinger of bankruptcy. For investors, creditors, and managers, the risks and losses when a business goes bankrupt are considerable. Therefore, financial analysts always strive to detect financial distress and signs of bankruptcy as early as possible.
The financial situation of businesses in the manufacturing and wholesale industries was severely affected by the COVID-19 pandemic in 2021, bringing a series of serious losses: production and commerce were disrupted, thousands of businesses struggled under the burden of large expenses, business activities became extremely volatile, liquidity was low, and bad debt was high. According to the General Statistics Office's announcement on the socio-economic situation in 2021, 43,200 enterprises temporarily suspended business (an increase of 25.9%): 20,267 enterprises in the wholesale and retail sectors, accounting for 13.8%; and 6,558 enterprises in the processing and manufacturing sectors, accounting for 11.9%. A further 48,127 enterprises were awaiting dissolution procedures, an increase of 27.8% compared to 2020, with the majority again in wholesale and retail (17,178 enterprises, accounting for 35.7%) and processing and manufacturing (5,794 enterprises, accounting for 12.0%). The financial difficulty of these enterprises will continue or even worsen, especially in the current unstable domestic and foreign economy.
Despite being heavily affected by COVID-19 in 2020 and 2021, manufacturing and wholesale are two industries that account for a large proportion of, and contribute greatly to, the positive growth of Vietnam's GDP (General Statistics Office annual reports, 2020 and 2021). In 2020, the processing and manufacturing sectors played an important role in the economy's growth with an increase of 5.82%, contributing 1.25 percentage points to GDP growth; wholesale and retail increased by 5.53% compared to 2019. GDP in 2021 increased by 2.58%, and the processing and manufacturing sector continued to be the main driving force with an increase of 6.37%, contributing 1.61 percentage points to GDP growth.

It can be seen that manufacturing and wholesale are two important industries that contribute significantly to GDP growth. However, listed companies in these industries are facing financial distress, especially given the current unstable domestic and foreign economy. The team therefore chose companies in these two industries as the research subjects.
Finding a way to detect the warning signs of bankruptcy is always one of the primary concerns of market regulators, analysts, and shareholders. Researchers have built many models to assess and forecast firms' financial distress based on published financial information. Among them, Altman's (1968) Z-score model is considered the original and most widely recognized model, used by both academics and practitioners worldwide. This model shows that the Z-index predicts the bankruptcy risk of enterprises within the next two years with a high level of confidence. Although developed more than 40 years ago, Altman's model has remained highly accurate to this day and is a popular tool among analysts for assessing corporate health. In addition, more than 20 countries around the world use this index with high reliability (Altman & Hotchkiss, 2006).
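For reference, the classification into the three zones rests on Altman's (1968) original Z-score, which combines the five ratios used in this study:

\[
Z = 1.2X_1 + 1.4X_2 + 3.3X_3 + 0.6X_4 + 1.0X_5
\]

where X1 = NWCTA, X2 = RETA, X3 = EBITTA, X4 = MVETD, and X5 = NRTA. Under Altman's original cutoffs, Z > 2.99 places a firm in the Safe zone, 1.81 <= Z <= 2.99 in the Gray zone, and Z < 1.81 in the Distress zone.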
A considerable body of empirical evidence on the Altman model in Vietnam analyzes the financial distress of companies through the Z-score index. According to Hoàng Thị Hồng Vân (2020), accuracy in predicting financial distress is 76.67% one year before bankruptcy and 70% two years before bankruptcy, which are fairly good predictive results. Võ Văn Nhị and Hoàng Cẩm Trang (2013) show a positive relationship between bankruptcy risk and earnings management of listed companies in Vietnam through the Altman model. Lê Cao Hoàng Anh and Nguyễn Thu Hằng (2012) retested Altman's Z-score in predicting the failure of 293 companies listed on HOSE; the results show that the Z-score correctly predicted 91% of cases one year before the company went into financial distress, a rate that fell to 72% within two years. Trần Việt Hải (2017) applied Altman's model to identify fraudulent financial statements of companies listed on HOSE, classifying companies with fraud at an accuracy rate of 68.7%. In short, the forecast rates are quite high, showing that the Z-score is a reliable indicator suitable for the Vietnamese market. Therefore, our group used this model to predict bankruptcy for Vietnamese enterprises listed on three stock exchanges in both the manufacturing and wholesale industries in 2022 and 2023.
In recent years, society has witnessed an explosion of information technology, which has caused the data warehouses of management information systems to grow rapidly. That is the premise for the birth of data mining techniques, making the collection, storage, and analysis of data smarter and more efficient and improving work productivity. Numerous organizations, such as Johnson & Johnson, GE Capital, Fingerhut, Procter & Gamble, and Harrah's Casino, have acknowledged the value of data mining in accounting and finance (Calderon et al., 2003). Data mining has been named one of the top ten technologies for the future by the American Institute of Certified Public Accountants and one of the four research priorities by the Institute of Internal Auditors (Koh, 2004). The use of data mining techniques on financial data can aid the decision-making process and help solve categorization and prediction issues. Corporate bankruptcy, credit risk assessment, going concern reporting, financial distress, and corporate performance forecasts are common instances of financial categorization issues (Naveen, 2018).
In Vietnam, there have been many articles predicting the bankruptcy risk of enterprises. During the research period, however, we did not find any specific articles on this topic for listed companies in the manufacturing and wholesale industries, especially given the instability of the current domestic and foreign economies. Therefore, building on the advantages of previous studies, this study uses data mining to fill the gap in predicting the bankruptcy risk of listed companies in the manufacturing and wholesale industries in Vietnam in 2022 and 2023.
1.2. Objectives of the study

Motivated by the instability of the economy, the importance of forecasting financial distress, the popularity of Altman's model, and the lack of papers applying data mining techniques in Vietnam's corporate finance sector, especially for predicting the financial distress of manufacturing and wholesale firms, the study was conducted with the following goals:

• Increasing the applicability of data mining by selecting a suitable model to forecast the possibility of financial distress of listed companies in the manufacturing and wholesale industries in Vietnam.
• Describing and identifying the probability of financial distress of enterprises through the interpretation of Altman's Z-score, using the suitable model that the Orange program selects.
• Providing useful information not only for investors but also for managers and policymakers.
1.3. Research questions

To achieve the overall goal of the study, the following research questions are posed:

• How do internal factors affect the criteria for assessing the possibility of bankruptcy of a company (Z-score), as presented through descriptive statistics?
• Given the training dataset, which model provided by the Orange software should be used to predict financial distress with a high level of confidence?
• Using the selected model, what is the likelihood of financial distress for the companies in 2022 and 2023?
• What policies and strategies can be recommended to investors, policymakers, and business managers to identify and minimize the possibility of financial distress now and in the next business period?
1.4. Research subjects and scopes

a. Research subjects

The research focuses mainly on predicting the financial distress of listed companies in the manufacturing and wholesale industries in Vietnam in 2022 and 2023.

b. Research scopes

The total dataset is collected from the financial statements of 627 companies in the manufacturing and wholesale industries in Vietnam listed on three stock exchanges: HOSE, HNX, and UPCOM. The data were drawn from the enterprises' audited consolidated financial statements and annual reports. Specifically:

• The training dataset includes 439 companies.
• The forecast dataset includes 188 companies.

The data were taken in 2021, when the COVID-19 epidemic was still unfolding but gradually showing signs of coming under control. However, the current domestic and foreign economies remain complicated and tense.
1.5. Overall research methodology
First, the team collected 627 observations of manufacturing and wholesale companies listed on three stock exchanges in Vietnam (HNX, HOSE, and UPCOM) in 2021. These 627 companies were divided into two datasets: 439 companies in the training dataset and 188 companies in the forecast dataset. The data are secondary data collected from CafeF and Vietstock. The dependent variable in this research is Results, which has three values: Safe zone, Distress zone, and Gray zone. The five independent variables, based on Altman (1968), are: Net working capital/Total assets (NWCTA), Retained earnings/Total assets (RETA), Earnings before interest and taxes/Total assets (EBITTA), Market value of equity/Book value of total liabilities (MVETD), and Sales/Total assets (NRTA).

The team then preprocessed the data by removing extreme outliers from each independent variable, leaving 415 observations in the training dataset. Next, the team assigned the roles of the variables and used four data classification methods to learn the training dataset: logistic regression, decision tree induction, support vector machines, and neural networks. The team then evaluated the four methods to choose the most suitable model using five indicators (F1-score, CA, Precision, Recall, and AUC) and the confusion matrix.

After finding the most effective classification method (the Neural Network model), the team predicted the bankruptcy probability of the remaining 188 listed companies in the two industries of manufacturing and wholesale.
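The workflow above was carried out in the Orange GUI. The following Python sketch reproduces the same model comparison with scikit-learn, only to make the steps concrete; the file name train.csv, the column names, and the default hyperparameters are illustrative assumptions, not the exact Orange settings used in the study.

```python
# Minimal sketch of the study's training/evaluation workflow (scikit-learn).
# Assumption: "train.csv" holds the five Altman ratios plus a "Results" label
# (Safe zone / Gray zone / Distress zone). File and column names are placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["NWCTA", "RETA", "EBITTA", "MVETD", "NRTA"]  # five Altman ratios

df = pd.read_csv("train.csv")          # hypothetical file: 415 training rows
X, y = df[FEATURES], df["Results"]     # target: Safe / Gray / Distress zone

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "Neural Network": MLPClassifier(max_iter=2000, random_state=0),
}

# The five indices reported by Orange's Test & Score widget.
scoring = {
    "AUC": "roc_auc_ovr",        # multi-class AUC, one-vs-rest averaging
    "CA": "accuracy",            # classification accuracy
    "F1": "f1_macro",
    "Precision": "precision_macro",
    "Recall": "recall_macro",
}

for name, clf in models.items():
    # Standardizing features is a design choice that helps SVM and the MLP.
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_validate(pipe, X, y, cv=10, scoring=scoring)
    summary = {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring}
    print(name, summary)
```

In the study itself, the corresponding comparison was produced with Orange's Test & Score widget under cross-validation, and the best model by these five indices was then applied to the 188-company forecast dataset.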
1.6. Practical meanings of the topic

Given the uncertainty of the economic situation in Vietnam and the world at the end of 2022, with many complicated changes stemming from the conflict between Russia and Ukraine and a high chance of further interest-rate increases by the Fed to control inflation after the COVID-19 pandemic, our study has high practical meaning for investors, policymakers, and administrators because it:

• helps investors avoid investing in public companies that are in poor financial condition or even at high risk of bankruptcy.
• gives policymakers more useful information for decisions on reasonably adjusting regulations and monetary policy to reduce the chance of financial distress in the manufacturing and wholesale industries.
• provides managers useful information about the current health status of their company so they can promptly plan quick-response strategies and consider rebuilding their capital structure.
1.7. Research layout
The research is divided into 5 chapters:
Chapter 1: Introduction. The authors indicate the reason for choosing the topic,
objectives of the study, research questions, research subjects, scopes and overall research
methodology. In addition, the authors give practical meanings and research layout.
Chapter 2: Literature review. The authors give the current situation of enterprises in
manufacturing and wholesale industries in Vietnam in recent years. After classifying the
data according to the Z-score index to forecast the company's bankruptcy, the authors select
the appropriate model, provide the prediction results, and determine the financial distress of
the company.
Chapter 3: Research methodology. The authors present the research process, and
build research models. Besides, we explain our data collecting and processing methods.
Chapter 4: Results. The authors apply data mining to predict the bankruptcy risk of listed businesses in the manufacturing and wholesale industries in Vietnam in 2022 and 2023.

Chapter 5: Discussions and conclusions. The authors give conclusions, recommendations, limitations, and directions for further research. In addition, recommendations are drawn from the research results for macro policymakers, business managers, and investors in Vietnam.
CHAPTER 2: LITERATURE REVIEWS
2.1. Literature reviews about financial distress
2.1.1. Definition of financial distress
Firms that are having financial difficulties are said to be in financial distress. These
situations are most frequently described using the words "failure," "insolvency", "default",
and "bankruptcy".
Financial distress can be partly explained by relating it to insolvency, which Black's Law Dictionary defines as: "Inability to pay one's debts; lack of means of paying one's debts. Such a condition of assets and liabilities that the former made immediately available would be insufficient to discharge the latter." According to "Corporate Finance, 10th edition" by Ross, Westerfield, and Jaffe, this definition has two general themes: balance-sheet insolvency and cash-flow insolvency. Balance-sheet insolvency occurs when a firm has a negative net worth, that is, the value of its assets is less than the value of its debts, which means the company does not have enough assets to meet its obligations to lenders. Cash-flow insolvency occurs when the firm's current and long-term assets are enough to fulfill its debt obligations to creditors, but payment cannot be made in liquid form such as cash. The term also covers a situation where a firm lacks liquid assets on hand to meet the financial requirements of creditors.
However, financial distress has a broader definition than bankruptcy, which helps researchers grow their sample sizes. In contrast, bankruptcy is a specific type of financial difficulty, and studies on it tend to have smaller samples (Altas, 1993). There are four stages in a business bankruptcy. In Stage 1, the firm's financial problems are incubating. In Stage 2, the firm's financial trouble, often known as financial embarrassment, becomes known to management. In Stage 3, financial insolvency, the company lacks the resources to meet its debt obligations. In Stage 4, insolvency is finally proven: the firm's bankruptcy is made official by a court determination, and its assets must be sold to pay creditors (Poston et al., 1994). Therefore, financial distress is distinct from bankruptcy. It occurs when a company's business operations cannot meet its financial obligations and its assets are becoming less liquid. Financial distress may be identified before the business enters bankruptcy, as early as Stage 2. An enterprise in financial distress falls into one of the following situations: its securities are placed under control or warning status, its securities are delisted, or the enterprise goes bankrupt (Vietnam Bankruptcy Law, 2014; Decree No. 58/2012/ND-CP of the Government).
However, financial distress does not always progress to bankruptcy. Figure 2.1 illustrates how public firms may undergo different paths of financial distress; their final destination may be a private workout rather than declared bankruptcy. Interestingly, approximately half of financial restructurings have been done via private workouts (Wruck, 1990).

Figure 2.1: Outcomes of Financial Distress
Source: Wruck (1990).
Some firms may actually benefit from financial distress by restructuring their assets.
For example, a levered recapitalization can change a firm’s behavior and force a firm to
dispose of unrelated businesses. A firm going through a levered recapitalization will add a
great deal of debt and, as a consequence, its cash flow may not be sufficient to cover
required payments, and it may be forced to sell its noncore businesses. For some firms,
financial distress may bring about new organizational forms and new operating strategies.
Financial distress can serve as a firm’s “early warning” system for trouble. Firms
with more debt will experience financial distress earlier than firms with less debt. However,
firms that experience financial distress earlier will have more time for private workouts and
reorganization. Firms with low leverage will experience financial distress later and, in many
instances, be forced to liquidate.
2.1.2. Some causes of financial distress
Financial management literature has divided the causes of a firm's financial distress
into two categories: internal and external factors. While the external elements are the
macroeconomic factors, the internal components are further separated into financial and
nonfinancial factors. Each of these elements has an impact on how the firm operates. As a
result, if they are not adequately handled, they may endanger the organization's ability to
continue existing.
Figure 2.2: Determinants of financial distress
Source: Ikpesu et al. (2019).
2.1.2.1. Financial factors
There is broad agreement that financial factors are among the major predictors of financial distress (Turetsky & McEwen, 2001; Nahar, 2006; Chancharat, 2008; Honjo, 2010; Thim et al., 2011; Parker et al., 2011; Kristanti et al., 2016; Devji & Suprabha, 2016; Wesa & Otinga, 2018; Idrees & Qayyum, 2018). Failure to manage the financial side typically results in businesses failing to fulfill their debt obligations by the deadline and is a sign of financial distress.

Based on "Corporate Finance, 10th edition" by Ross, Westerfield, and Jaffe, there are five groups of financial ratios, as shown in the figure below.
Figure 2.3: Five main groups of financial factors
Source: Corporate Finance - Ross, Westerfield, Jaffe
a. Short-term solvency

The purpose of short-term solvency ratios, also known as liquidity measures, is to reveal information about a firm's liquidity. The ability of the company to make short-term bill payments without undue stress is the main concern. These ratios therefore concentrate on current assets and current liabilities.
Table 2.1: Some common formulas for calculating short-term solvency

1. Current ratio = Current assets / Current liabilities. This ratio measures the firm's ability to pay its short-term obligations by converting all its current assets into cash.
2. Quick (or acid-test) ratio = (Current assets - Inventory) / Current liabilities. This ratio looks more deeply into the short-term liquidity of the firm by subtracting inventories from current assets, because inventories are often the least liquid current assets and some may later turn out to be damaged, obsolete, or lost.
3. Cash ratio = Cash / Current liabilities. This ratio shows a company's ability to cover its short-term obligations using only cash and cash equivalents.

Source: Corporate Finance - Ross, Westerfield, Jaffe
According to Chow et al. (2011), a company is in financial distress when its operating cash flows are insufficient to cover its present obligations to creditors, forcing restructuring, mergers and acquisitions, the issuance of new capital, and renegotiation of loan arrangements. Several other well-known studies (Elloumi & Gueyee, 2001; Nahar, 2006; Thim et al., 2011; Wesa & Otinga, 2018) have confirmed that firms with low levels of liquidity are more likely to experience financial distress because they are unable to pay their recurring debts when due. These studies show that liquidity is one of the financial factors affecting a firm's financial distress.
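As a quick numerical illustration of Table 2.1, the three liquidity ratios can be computed directly from balance-sheet items. The figures below are hypothetical, not drawn from the study's dataset; the ratios in Tables 2.2 to 2.5 are computed from their formulas in exactly the same way.

```python
# Hypothetical balance-sheet items (illustrative only, e.g., in billion VND).
current_assets = 500.0
inventory = 180.0
cash = 90.0
current_liabilities = 320.0

# Table 2.1 formulas.
current_ratio = current_assets / current_liabilities              # 500/320 = 1.56
quick_ratio = (current_assets - inventory) / current_liabilities  # 320/320 = 1.00
cash_ratio = cash / current_liabilities                           # 90/320  = 0.28

print(current_ratio, quick_ratio, cash_ratio)
```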
b. Long-term solvency

Long-term solvency ratios address the firm's long-run ability to meet its obligations, or measure its financial leverage; they are regularly called financial leverage ratios. When a company frequently uses debt to finance its operations, it may be more susceptible to financial trouble, especially if it becomes challenging for the company to satisfy ongoing obligations (Wesa & Otinga, 2018).
Table 2.2: Some common formulas for calculating long-term solvency

1. Total debt ratio = (Total assets - Total equity) / Total assets. The total debt ratio takes into account all debts of all maturities to all creditors.
2. Times interest earned = EBIT / Interest. This ratio measures how well a company has its interest obligations covered; it is often called the interest coverage ratio.
3. Cash coverage = (EBIT + Depreciation and amortization) / Interest. A basic measure of the firm's ability to generate cash from operations, frequently used as a measure of cash flow available to meet financial obligations.

Source: Corporate Finance - Ross, Westerfield, Jaffe
c. Asset management

Asset management ratios measure the efficiency with which a firm uses its assets and are also used as measures of turnover. A firm needs to use its assets efficiently, or intensively, to generate sales.
Table 2.3: Some common formulas for calculating asset management

1. Inventory turnover = Cost of goods sold / Inventory. As long as the firm is not running out of stock and thereby forgoing sales, the higher this ratio is, the more efficiently inventory is being managed.
2. Days' sales in inventory = 365 days / Inventory turnover. It shows how long, on average, it took the firm to turn its inventory over.
3. Receivables turnover = Sales / Accounts receivable. It calculates how fast the firm collects on its sales.
4. Days' sales in receivables = 365 days / Receivables turnover. It can give insight into how a business generates cash flow.
5. Total assets turnover = Sales / Total assets. It measures the efficiency of a company's assets in generating revenue or sales.

Source: Corporate Finance - Ross, Westerfield, Jaffe
d. Profitability

Another financial factor that causes financial distress is profitability (Thim et al., 2011; Baimwera & Murinki, 2014; Campbell et al., 2015). A company with low profitability is usually weak at generating sufficient cash flows, which can leave the firm with a low level of liquidity. This can make it more difficult for the company to satisfy its obligations and expose it to a distressing situation.
Table 2.4: Some common formulas for calculating profitability

1. Profit margin = Net income / Sales. A measure expressing the percentage of revenue that the company keeps as profit.
2. EBITDA margin = EBITDA / Sales. EBITDA margin looks more directly at operating cash flows than net income does and excludes the effect of capital structure and taxes.
3. Return on assets = Net income / Total assets. A ratio showing how much profit a company generates relative to the value of everything it owns.
4. Return on equity = Net income / Total equity. An indicator of how the stockholders did over the course of the year. In an accounting sense, ROE is the genuine bottom-line metric of success, because managers' intention is to benefit shareholders.

Source: Corporate Finance - Ross, Westerfield, Jaffe
e. Market value
Another financial factor that determines financial distress is the share price (Devji &
Suprabha, 2016; Idrees & Qayyum, 2018). Share price and financial distress have an
inverse relationship, according to numerous studies. A fall in a company's share price may
raise the likelihood that it would experience financial trouble. A consistent decline in an
organization's share price is a symptom of impending financial trouble.
Table 2.5: Some formulas for calculating market value

1. Price-earnings ratio = Price per share / Earnings per share. Higher P/Es typically indicate that the company has substantial chances for future growth, because they assess how much investors are prepared to pay per dollar of current earnings.
2. Market-to-book ratio = Market value per share / Book value per share. It compares market value with historical cost. If the value is less than 1, it may indicate that the company has not done a good job overall of generating value for its owners.
3. Market capitalization = Price per share x Shares outstanding. The total dollar market value of a company's outstanding shares of stock.
4. Enterprise value = Market capitalization + Market value of interest-bearing debt - Cash. It calculates the market value of the outstanding stock plus the market value of the interest-bearing debt, less the cash on hand.
5. Enterprise value multiple = EV / EBITDA. It takes into account a company's debt and cash levels in addition to its stock price and relates that value to the firm's cash profitability.

Source: Corporate Finance - Ross, Westerfield, Jaffe
2.1.2.2. Nonfinancial factors

Financial distress among businesses may also be caused by non-financial issues (Dun & Bradstreet, 1986). According to that study, the non-financial factors include the customer cause, the sales cause, the experience cause, and the disaster cause. The customer cause develops when a company has cash flow issues and few regular clients. The sales cause results from a company's location, low sales, inventory problems, and tough competition, which may lead to low demand for the company's goods. An ineffective management team, the board of directors' lack of participation, and subpar leadership account for the experience cause. The disaster cause arises from sudden and unpredictable events such as burglary, strikes, fire, and the sudden death of the owner.
2.1.2.3. Macroeconomic factors

According to Ikpesu, Vincent, and Dakare (2020), the operations and performance of businesses are also affected by macroeconomic factors. If firms fail to strategically identify and manage these factors, the result can be financial distress. The macroeconomic factors are inflation, interest rates, exchange rates, instability in government policy, and political unrest.

A company's operations can be impacted by a nation's rate of inflation. When the country's inflation rate is steady and low, the majority of enterprises are likely to perform better. High inflation increases a company's cost of production over time and reduces its ability to compete in the global market when exporting goods, thus reducing the firm's net income and hurting its profitability.
Based on Corporate Governance Models and Applications in Developing Economies, an upturn in interest rates frequently works as a deterrent to investment because firms are put off by high borrowing costs. They may reject potential projects that would not generate positive cash flows in the short term at the prevailing interest rate, and they tend to invest less in working capital and fixed assets, which hurts the profitability of the company in the long run. Additionally, an increase in interest rates severely hampers companies' capacity to meet their borrowing obligations in terms of principal and interest repayments.

Firms that depend on imported raw materials or technology may be negatively impacted by exchange-rate unpredictability. The cost of production rises as import prices rise due to currency devaluation. When the high cost of production keeps the company from breaking even, it suffers from limited liquidity and losses and is unable to meet its contractual obligations on time.
Political unrest and instability of government policies are other macroeconomic factors. Political unrest may impede business activities and harm the organization by endangering its long-term survival. Frequently changed government laws may affect an organization's sales, distribution, supply chain, reputation in the worldwide market, expansion plans, and decision-making process. As a result, a company's inability to weather political turbulence and unpredictable government policy may put it in financial peril.
2.1.3. Financial distress costs
Financial distress is very costly when conflicts of interest hinder sound decisions about operations, investments, and financing. This gives rise to the costs of financial distress, which include the specific categories below:

• Direct costs:
According to "Corporate Finance, 10th edition" by Ross, Westerfield, and Jaffe, the direct costs of financial distress are the legal and administrative costs of liquidation or reorganization. During bankruptcy, with fees from hiring lawyers often in the hundreds of dollars an hour, these costs can add up quickly. In addition, administrative and accounting fees can substantially add to the total bill. And if a trial takes place, each side may hire a number of witnesses to testify about the fairness of a proposed settlement; their fees can easily rival those of the lawyers or accountants.
A number of academic studies have measured the direct costs of financial distress (J. B. Warner, 1977; M. J. White, 1983; E. I. Altman, 1984; Lawrence A. Weiss, 1990; Stephen J. Lubben, 2000; Arturo Bris et al., 2006). Although large in absolute amount, these costs are actually small as a percentage of firm value.
• Indirect costs:

Bankruptcy hampers conduct with customers and suppliers. Sales are frequently lost because of both fear of impaired service and loss of trust. Indirect costs of financial distress may be the culprit. Unfortunately, although indirect costs seem to play a significant role here, there is no easy quantitative method for estimating their effects.
• Agency costs:

When a business is in trouble and financial distress, both creditors and shareholders want the business to recover, but in other respects their interests may conflict. They tend to play their own "games" to protect their interests. Agency costs arise from this conflict of interest between bondholders and shareholders when the business encounters difficulties.
Stockholders employ three different types of self-serving tactics to harm bondholders and benefit themselves:

• Under selfish investment strategy 1 (incentive to take large risks), the company is in such bad shape that, should a recession strike, it will come dangerously close to bankruptcy with one project and actually go into bankruptcy with the other. The important point is that, in comparison to the low-risk project, the high-risk project boosts company value during a boom and depresses it during a downturn. Thus, financial economists argue that stockholders expropriate value from the bondholders by selecting high-risk projects.
• Selfish investment strategy 2 (incentive toward underinvestment) shows that stockholders of a firm with a significant probability of bankruptcy often find that new investment helps the bondholders at the stockholders' expense. The simplest case might be a real estate owner facing imminent bankruptcy.
• Under selfish investment strategy 3 (milking the property), an alternative tactic is to pay out additional dividends or other distributions during difficult financial times, leaving less money in the company for the bondholders.
These "games" will make the financial distress of the business more and more serious
and may lead to bankruptcy. It is worth noting that the costs associated with financial
distress are more severe for firms with many intangible assets. This is understood because
intangible assets associated with corporate health will lose value if the company falls into
bankruptcy. Bankruptcy/bankruptcy forecasting is important for these companies.
In the end, the expense of selfish investiment methods would be paid by the
stockholders. Bondholders cannot reasonably anticipate assistance from stockholders when
they are about to face financial hardship. They are likely to make investment decisions that
lower the bond's value. As a result, bondholders fortify themselves by increasing the interest
rate they demand on the bonds. The ultimate losers from self-serving tactics are the
stockholders because they must pay such high rates. Leverage ratios will be low for
businesses that deal with these distortions and debt.
• Economy costs:

Economic issues cause performance decline, failure, insolvency, and default by affecting the economy as a whole. Although liquidity is the primary cause of insolvency and default, a reduction in performance and failure affect the firm's profitability. Because of economic bailout packages, the government may run a national budget deficit at a time of financial difficulty. At the same time, newly established policies must be optimized to prevent financial distress from worsening and sending the nation into further financial disaster.
2.2. Literature review about data mining
2.2.1. Definition of data mining
Data mining is the process of sorting through large data sets to find patterns and
relations that can be used to solve business problems. Data mining is typically an interactive
and iterative discovery process, according to Mohammed J. Zaki and Limsoon Wong (2003).
This procedure aims to extract from big data sets patterns, associations, changes, anomalies,
and statistically significant structures. The outputs of the mining process should also be
reliable, original, practical, and clear. Thus, data mining techniques and tools allow businesses to foresee future trends and make better-informed business decisions.
Data mining is a crucial component of data analytics and one of the fundamental
fields in data science, which makes use of modern and recently developed analytics methods
to discover valuable information in data sets. According to Koti Neha, and M Yogi Reddy
(2020), descriptive data mining tasks categorize features of data in a target data set based
on past or recent events. Data mining, at a more detailed level, is a step in the knowledge
discovery in databases (KDD) procedure, a data science approach for collecting, processing,
and evaluating data. Although they are often used interchangeably, data mining and KDD
are more frequently understood to be separate concepts.
2.2.2. The key properties of data mining
There are many important parameters in data mining, such as classification and
clustering rules. Referring to the research of Mehmed Kantardzic (2011), the key properties
of data mining are:
• Measurable quality. The accuracy of approximations obtained from a reduced data set can be properly assessed.
• Recognizable quality. The quality of approximations can be assessed during the data-reduction algorithm's run time, before any data-mining techniques are applied.
• Monotonicity. The algorithms are often iterative, and the quality of results improves, or at least does not decline, with more computation time and better input data.
• Consistency. The quality of results is correlated with computation time and the quality of the incoming data.
• Diminishing returns. The improvement in the solution is large in the initial computation stages (iterations) and gets smaller as time goes on.
• Interruptibility. The algorithm can be halted at any time and still output some results.
• Preemptability. The algorithm can be stopped and restarted with little extra work.
Data mining can answer questions that cannot be addressed through simple query and
reporting techniques.
2.2.3. Data mining processing
According to "Discovering Knowledge in Data: An Introduction to Data Mining" by Daniel T. Larose (2005) and "Data Mining: Concepts, Models, Methods and Algorithms" by Mehmed Kantardzic (2011), the data mining process includes five steps, as follows:
The first step is recognizing the problem and constructing the hypothesis. Finding the
cause of the problem is one of the main steps in data mining. Then, the application experts
use their knowledge and experience to develop hypotheses relating to those roots. This
process helps the experts to easily come up with meaningful problem statements. After that,
they usually identify a set of independent variables and dependent variables based on the
hypotheses and work with the modelers - data-mining experts to build the appropriate
model.
The second step is collecting the data. There are generally two options. The first
approach is referred to as a designed experiment, which is managed by an expert (modeler).
The second is the observational approach, which is used when the expert has no power to
change how the data are generated. Most samples in data-mining applications are assumed
to come from random data generation. This assumption is necessary because it makes the
final results more accurate and keeps the data collection process objective, providing
additional evidence to support the ultimate outcomes. In addition, it is important to confirm
that the data used to estimate a model and the data used later to test and apply the model
come from the same sample distribution. If this is not true, the estimated model cannot be
applied correctly.
The third step is preprocessing or cleaning the data. In this paper, we used a method
called detection and removal of outliers. Outliers frequently originate from measurement
errors and coding and recording problems, and occasionally they are just naturally anomalous
results. Such unrepresentative samples have a significant impact on the final model. There
are two common ways to deal with outliers: either create robust modeling methods that are
insensitive to outliers, or identify and remove them.
The fourth step is estimating the model. The primary job in this phase is to choose and
put into practice the best data-mining model, which is not simple. Implementation is
typically based on several models, and choosing the most appropriate one is an extra
necessity.
The final step is interpreting the final results and concluding. Data mining models
usually play a crucial role in supporting decision-making. Thus, for such models to be
useful, they must be interpretable because it is unlikely that humans will make decisions
based on complex "black-box" models. There is some trade-off between the accuracy of the
model and the interpretability of its results. Simple models are typically easier to construct
and understand, but they are also less precise, while complicated models generate extremely
precise findings whose outputs are normally hard to read. Therefore, interpreting the final
results coming from these models is treated as a separate job with particular methods for
validating the outcomes.
Figure 2.4: Illustration of the data mining process
(Source: Mehmed Kantardzic, 2011)
2.2.4. Data mining methods
There are many data mining methods. Classification is a technique in data science
used by data scientists to categorize data into a given number of classes. Festim Halili and
Avni Rustemi (2016) claim that applications for credit risk management and fraud detection
are especially well suited to this kind of study. This method usually uses classification
algorithms based on decision trees or neural networks. This technique can be performed on
structured or unstructured data and its main goal is to identify the category or class under
which new data will fall.
Regression is a predictive model that maps a set of input values to a continuous
output value. This technique is discussed by Breiman et al. (1984), Steinberg and Colla
(1995), and Yohannes and Webb (1998). The main purpose of the regression method is to
explore and map data.
Third is clustering, which is the process of grouping unlabeled objects or data with
similar characteristics to make data description easier. According to Fan Cai (2016),
clustering also has its drawbacks: traditional clustering, such as K-means clustering, can only
handle numerical attributes, and it is weak at computing an accurate behavior-response
mapping relationship since training is unsupervised and drops the targets.
Summarization is the presentation of data in a comprehensible and informative
manner. A carefully performed summary conveys trends and patterns from the dataset in a
simplified form. According to Daniel T. Larose (2005), this method is used to compare
records after subsampling.
Another method is dependency modeling, which finds and uses a local model that
describes the significant dependencies between variables. Change and deviation detection,
in turn, builds a model that describes the most significant changes in the data from
previously measured or normative values.
2.2.5. Data mining tool used in the study – Orange
Orange is known as open-source data mining software, programmed in Python, with
an intuitive interface and easy interaction. With its many functions, it can analyze data from
simple to complex, create attractive and interesting graphics, and make data mining and
machine learning easier for both novice and expert users.
Orange is software aimed at automation. It is a handy data mining tool, easy to use
thanks to its compact interface, with toolboxes arranged in a coherent and reasonable way.
The tools (widgets) provide basic functions such as reading data, displaying tabular data,
selecting data properties, training predictive models, comparing machine learning
algorithms, and visualizing data elements. According to Janez Demšar and Blaž Zupan
(2012), Orange is one of the simplest data mining tools. It works on Windows, Linux, and
OS X, and numerous machine learning, preprocessing, and data visualization methods are
included in the basic installation. Therefore, Orange is the software that the research team
decided to use in this research paper.
2.3. Literature review about data classification
2.3.1. Definition of data classification
Data classification is one of the main research directions of data mining. Data mining
classification is a method that divides data points into various classes. Data mining
techniques such as classification allow discovering patterns, forecasting, and knowledge
discovery in different business sectors in order to decide upon future business trends, claim
Dimitrios Papakyriakou and Ioannis S. Barbounakis (2022). The class labels of new data
can be predicted by utilizing supervised learning techniques based on historical data.
2.3.2. Data classification process
Data Mining Classification includes 2 steps: Learning Phase and Classification
Phase.
According to Adelaja Oluwaseun Adebayo and Mani Shanker Chaubey (2019), one
of the main goals of the learning process is to develop models with high generalization
capability, that is, models that reliably predict the class labels of previously unseen data.
The main focus of this phase of data mining classification is building the classification
model using the various techniques available. For the model to learn in this step, a training
set is necessary. Using the target dataset as its basis, the trained model produces correct
results, and when test data is incorporated, the generated classification model becomes
more accurate.
Figure 2.5: Illustration of building a classification model
Source: Mohamed Osman Ali Hegazi, et al (2016)
The second stage is estimating the accuracy of the model and classifying new data
(Classification). The main focus of this phase is applying the model built in the learning
phase: its accuracy is estimated on test data, and it is then used to assign class labels to new
records. The fundamental goal of classification algorithms, according to Koti Neha and
M Yogi Reddy (2020), is to forecast the target class by examining the training dataset, or
to categorize the data into a predetermined number of classes.
Figure 2.6: Illustration of classifying new data and estimating the accurateness
Source: Mohamed Osman Ali Hegazi, et al (2016)
2.3.3. Data classification methods
Commonly used methods for data prediction include Logistic Regression, SVM
(Support Vector Machine), Decision Tree, Neural Network, etc.
2.3.3.1. Logistic Regression
To model continuous-value functions, linear regression is generally employed.
Theoretically, the linear regression technique for modeling categorical response variables
might be based on generalized regression models. Logistic regression is one popular kind of
generalized linear model. Logistic regression models the likelihood of an event occurring
using a set of independent variables. Rather than attempting to forecast the value of the
dependent variable, the logistic regression approach seeks to determine its likelihood. We
only use logistic regression when the model's output variable is a categorical binary.
However, there are no requirements for the type of data of the response variables, hence,
this kind of model supports a broader input data set.
A common method of statistical analysis that identifies the best linear logistic
regression model is called SimpleLogistic (Cox, 1958). It is similar to the LogitBoost
method with simple regression functions. This algorithm, which depends on the logistic
function, models the outcome's log odds rather than the actual result. SimpleLogistic
explains how one or more independent variables and the categorical dependent variable are
related.
The logistic regression model is used to predict a categorical variable from one or more
continuous independent variables. The dependent variable can be ordinal or discrete, and
the independent variables can be interval, scale, or discrete. We can represent the formula
as follows:
$$z = \sum_{i=0}^{d} w_i x_i, \qquad P(y) = \mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$$
In which:
• d is the number of features of the data,
• $w_i$ are the weights, which are initialized at first and then adjusted during training to fit the data,
• $x_i$ are the values of the features.
Since the outcome is a probability, the dependent variable is bounded between 0 and 1.
If a prediction for y could exceed 1 or fall below 0, the interpretation of the logistic
regression coefficients would be meaningless; the sigmoid function prevents this.
Figure 2.7: Illustration of logistic regression
(Source: Synthesis by the author)
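To make the formula concrete, here is a minimal Python sketch of the computation above; the weights and the observation are hypothetical values for illustration, not coefficients estimated from our data:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w):
    """Compute P(y) = sigmoid(sum_i w_i * x_i); x[0] = 1 is the intercept term."""
    return sigmoid(np.dot(w, x))

# Hypothetical weights (w[0] is the intercept) and one observation whose
# features are the five financial ratios used in this study.
w = np.array([-0.5, 1.2, 1.4, 3.3, 0.6, 1.0])
x = np.array([1.0, 0.15, 0.05, 0.12, 2.71, 1.34])
print(round(predict_proba(x, w), 3))  # a probability between 0 and 1
```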
2.3.3.2. Support Vector Machine
The foundations of SVMs were created by Vladimir and Alexey (1963) and are
becoming more and more popular because of their numerous appealing qualities and
promising empirical performance. The SRM principle is embodied in the statement.
27
Although SVMs were first created to address the classification challenge, they have lately
been expanded to address regression issues (for the prediction of continuous variables).
SVMs can be used to solve regression problems by introducing a different loss function that
incorporates a distance measure.
Vapnik et al. (1996) introduced an SVM variant that performs regression rather than
classification, known as Support Vector Regression (SVR). Using a collection of labeled
training data, an SVM is a supervised learning method that generates learning functions.
It has a strong theoretical underpinning and needs only a small number of samples to train;
studies revealed that it is largely unaffected by the dimensionality of the samples. The
method begins by addressing the overarching problem of learning to distinguish between
members of two classes represented by n-dimensional vectors. The learned function can be
a general regression function or a classification function (whose output is binary).
SVM has been found to be effective in many classification and regression problems.
Z. Erdem et al. (2005) discuss how the use of SVM ensembles has opened up new
possibilities for optical character recognition. When new training data sets become available
in batches, ensemble-based methods are additionally used with a dynamic weighting scheme
that trains additional models over the earlier ones, providing an incremental approach
(X. Yang et al., 2009). R. Elwel et al. (2011) concentrate on ensemble methods for learning
under concept drift.
The standard form of SVM takes input data, treats the items as vectors in space, and
classifies them into two different classes by constructing a hyperplane in multidimensional
space as the boundary between the data layers. The key idea is that the decision boundary
should be as far away from the data points of both classes as possible: the optimal
hyperplane maximizes the distance between itself and the closest data point of each class,
thereby maximizing the margin. The margin is intuitively understood as the space or gap
between the two classes determined by the hyperplane; geometrically, it is the shortest
distance from the hyperplane to the data points closest to it in each class.
SMO (Sequential Minimal Optimization) (John C. Platt, 1998) is an improved
training method for Support Vector Machines that has demonstrated good performance
across a variety of problems. The complexity of the training and implementation processes
had constrained the use of SVMs; SMO addresses this by being conceptually
straightforward, simple to implement, and generally faster than standard SVM training.
Figure 2.8: Maximum-margin hyperplane and margins for an SVM trained with samples
from two classes
(Source: Synthesis by the author)
In figure 2.8, the blue and green points lying on the two boundary lines (black
dashed) are called support vectors, because they determine the position of the hyperplane
(red line).
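As a toy illustration of the maximum-margin idea (made-up points, and scikit-learn's SVC rather than the SMO implementation discussed above), the following sketch fits a linear SVM and prints the support vectors that pin down the hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e3).fit(X, y)

print(clf.support_vectors_)           # the points lying on the margin
print(clf.predict([[3, 2], [6, 8]]))  # classify two new points
```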
2.3.3.3. Decision Tree
One appealing classification method is the decision tree: a group of decision nodes
connected by branches that extend downward from the root node and finish in leaf nodes.
Starting at the root node, which is customarily positioned at the top of the decision tree
diagram, attributes are assessed at the decision nodes, with each conceivable outcome
leading to a branch. Each branch ultimately leads either to another decision node or to a
terminating leaf node, although the resulting tree will not always be the simplest possible
one. A decision tree therefore comprises nodes where attributes are evaluated. In a
univariate tree, the test at each internal node uses only one of the attributes. All potential
results of the test at a node are represented by the branches leaving that node. Figure 2.9
provides a straightforward decision tree for the classification of samples using the two
input attributes X and Y.
Figure 2.9: A simple decision tree with the tests on attributes X and Y
(Source: Synthesis by the author)
A decision tree provides a powerful technique for classification and prediction
problems such as diabetes diagnosis. Various decision tree algorithms are available to
classify the data, including ID3, C4.5, C5, J48, CART, and CHAID (Aiswarya Iyer et al.,
2015). The popular decision tree model C4.5 builds the branch with the largest information
gain ratio (Quinlan JR, 1993). Small changes to the dataset, however, will probably result
in a significant difference in the decision tree that is produced.
In terms of advantages, decision trees are easy to understand, do not require
normalization of data, can handle many different data types, and process large amounts of
data quickly. However, decision trees struggle with certain kinds of data, and the time and
cost needed to build decision tree models can be quite high.
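A minimal sketch of training a univariate tree follows (toy samples; note that scikit-learn implements CART rather than the C4.5 algorithm named above):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy samples with two attributes X and Y, as in Figure 2.9
samples = [[1, 5], [2, 4], [7, 1], [8, 2], [6, 9], [9, 8]]
labels = ["A", "A", "B", "B", "C", "C"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(samples, labels)

# Each internal node tests a single attribute; leaves carry class labels
print(export_text(tree, feature_names=["X", "Y"]))
```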
2.3.3.4. Neural Network
The discovery that complicated learning systems in animal brains comprised
networks of closely interconnected neurons served as the inspiration for neural networks.
Although a given neuron may have a very straightforward structure, dense networks of
interconnected neurons are capable of carrying out challenging learning tasks like
classification and pattern recognition. At a very fundamental level, artificial neural networks
reflect an effort to mimic the nonlinear learning that takes place in natural neuronal
networks.
Another method that is frequently used to address data mining applications is the
Artificial Neural Network (ANN). A network of densely connected processing units, known
as a neural network, has a complex structure and exhibits some characteristics of a
biological brain network. Because of how neural networks are built, users have the option to
apply parallel concepts at various layer levels. Fault tolerance is another important ANN
feature. ANNs work best when there is a lot of noise and uncertainty in the information.
ANN is a method for processing information that significantly deviates from traditional
methods in that it solves problems by training on examples rather than following a set
procedure (K. Anil Jain et al., 1996; George Cybenko et al., 1996). Based on the training
methodology, ANNs may be split into two categories: supervised training and unsupervised
training.
Unsupervised networks don't need the desired output for every input, whereas supervised
networks need the desired output for every input.
The back-propagation algorithm is the most widely used neural network algorithm.
The emphasis here is on feedforward multilayer networks, or multilayer perceptrons, as the
most thoroughly studied and utilized neural network classifiers (R.P. Lippmann, 1989),
even though there are several other ways to employ neural networks for classification.
Let us examine the simple neural network shown in Figure 2.10. The neural network
is made up of two or more layers, and most networks have three levels: an input layer, a
hidden layer, and an output layer. There may be additional hidden layers, although most
networks have only one, which is adequate for the majority of applications. The neural
network is fully connected, which means that each node is linked to every node in the next
layer but not to nodes in the same layer. The quantity and kind of attributes in the data set
usually determine the number of input nodes. Both the total number of hidden layers and
the total number of nodes in each hidden layer are user-configurable. Depending on the
specific classification job at hand, more than one node may be present in the output layer.
Figure 2.10: Simple neural network
(Source: Synthesis by the author)
Utilizing neural networks has many benefits, including the fact that they are
extremely resilient to noisy inputs. Uninformative (or even erroneous) instances in the data
set can be ignored by the network, since it has multiple nodes (artificial neurons) with
weights assigned to each link. In contrast to decision trees, which generate rules that are
simple for non-experts to comprehend, neural networks are more difficult for humans to
interpret. Additionally, training periods for neural networks are often longer than those for
decision trees, frequently taking several hours.
Complex classification problems can be handled by multi-layer perceptrons (MLP)
(Shrivastava et al., 2011; Hagan MT et al., 1996). However, MLP's drawbacks are obvious:
there is no prior knowledge of the ideal hidden layer size. A setting that is too small will
result in a very weak network that could over-generalize, while a setting that is too large
will result in very slow training and many hyperplanes that may coincide after training;
otherwise, the problem is over-fitting.
There are several sectors where neural networks may be applied, including finance,
trading, business analysis, business planning, corporate risk management, etc. Additionally,
Neural Networks are used in several other industries, including business risk assessment,
and weather forecasting,... Neural Network is also widely used in technology and other
applications such as video games, speech recognition, social network filtering, automatic
translation, and medical diagnostics. Alternatively, there are several application cases for
neural networks that analyze transactions using historical data and identify better trading
opportunities.
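To make the fully connected, single-hidden-layer architecture concrete, here is a minimal scikit-learn sketch on made-up data; the hidden layer size of 10 is an arbitrary illustrative choice, not a recommendation:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Made-up observations with five ratio-like features
X = [[0.1, 0.0, 0.1, 2.0, 1.0],
     [0.9, 0.3, 0.3, 5.0, 2.0],
     [-0.5, -0.2, -0.1, 0.5, 0.4],
     [-0.8, -0.4, -0.3, 0.2, 0.3]]
y = ["Safe Zone", "Safe Zone", "Distress Zone", "Distress Zone"]

# Scaling the inputs usually helps gradient-based training converge
X_scaled = StandardScaler().fit_transform(X)

# One hidden layer with 10 nodes; the size is a user choice, not a given
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
mlp.fit(X_scaled, y)
print(mlp.predict(X_scaled))
```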
2.3.4. Methods to evaluate classification models
Accuracy, speed, scalability, interpretability, and robustness are typically used as
criteria when comparing different algorithms (Sossi Alaoui et al., 2017). The confusion
matrix is a method that shows the predicted and actual classifications, according to Provost
and Kohavi (1998). From it, several standard measures are derived, including Precision,
Recall, and the F-measure.
2.3.4.1. Confusion Matrix, Accuracy, ROC, AUC, and Precision/Recall
The confusion matrix is a matrix that shows, for the data points of each actual class,
which class they are predicted to fall into. The confusion matrix is k x k in size, where k is
the number of classes.
Assume that class A is positive and class B is negative. The following are the
important terms in the confusion matrix:
Figure 2.11: Confusion matrix for binary classification
(Source: Synthesis by the author)
• True positive (TP): an actually positive sample predicted as positive.
• False positive (FP): an actually negative sample predicted as positive.
• False negative (FN): an actually positive sample predicted as negative.
• True negative (TN): an actually negative sample predicted as negative.
Figure 2.12: Outcomes of a confusion matrix
(Source: Synthesis by the author)
The term "type I error" is widely used to describe false positives. Oftentimes, type II
errors are used to describe false negatives.
Precision and recall are calculated using the confusion matrix. These measures extend
classification accuracy and provide a more detailed insight into model evaluation; which
one we prefer depends on the task and our goals.
Precision measures how reliable our model is when its forecast is positive. It is
focused on making accurate positive predictions and demonstrates how many of them have
turned out to be correct:
$$Precision = \frac{TP}{TP + FP}$$
Recall measures how completely our model identifies the positive classes. It is
centered on the actual positive samples and reflects how many of them the model accurately
predicts:
$$Recall = \frac{TP}{TP + FN}$$
The F-score is an additional statistic that combines precision and recall into a single
number; it is calculated as their harmonic mean:
$$F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
Because it considers both false positives and false negatives, the F-score is a more
relevant metric than accuracy for situations with an unequal class distribution. The best
F-score value is 1 and the poorest is 0.
In addition, we also have the formula for accuracy:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy is the fraction of properly categorized samples in the overall data set. It only
shows the proportion of data that is correctly classified; it does not tell us how each class
is classified, which classes are most accurately classified, or into which classes data is
typically misclassified.
The true positive rate (TPR), also known as sensitivity, is the same as recall: it
calculates the fraction of the positive class that is accurately predicted to be positive.
Specificity is comparable to sensitivity, except that it is concerned only with the negative
class: it calculates the percentage of the negative class that is accurately predicted to be
negative.
Figure 2.13: Sensitivity and Specificity
(Source: Synthesis by the author)
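All of these measures follow directly from the four confusion-matrix counts; the sketch below computes them for made-up counts:

```python
# Made-up confusion-matrix counts for a binary classifier
TP, FP, FN, TN = 90, 10, 8, 92

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)            # also the true positive rate (sensitivity)
specificity = TN / (TN + FP)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.3f}  Precision={precision:.3f}  "
      f"Recall={recall:.3f}  Specificity={specificity:.3f}  F1={f1:.3f}")
```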
The ROC curve and AUC (area under the curve) measures are best explained using a
logistic regression example. The likelihood of a sample being positive is calculated using
logistic regression. Then, to discriminate between positive and negative classes, we define
a threshold value for this probability: the sample is categorized as positive if the probability
exceeds the threshold. As a result, different threshold values cause certain samples to be
categorized differently, affecting the precision and recall scores.
The ROC curve describes the model's performance at various threshold levels by
merging the confusion matrices at all threshold values. The ROC curve's Y-axis represents
the true positive rate (sensitivity), while the X-axis represents the false positive rate
(1 - specificity).
$$TPR\ (\text{Sensitivity}) = \frac{TP}{TP + FN}$$
$$FPR\ (1 - \text{Specificity}) = 1 - \frac{TN}{TN + FP} = \frac{FP}{TN + FP}$$
When the threshold is 0, the model predicts that all samples are positive; TPR
(sensitivity) is then 1, but since no negative prediction is made, FPR (1 - specificity) is also
1. When the threshold is set to 1, both TPR and FPR become 0. As a result, setting the
threshold to 0 or 1 is not a wise choice.
Our goal is to increase the true positive rate (TPR) while decreasing the false positive
rate (FPR). The ROC curve shows that when TPR increases, FPR also increases. How many
false positives can we tolerate, then?
Rather than aiming to find the optimal threshold value on the ROC curve, we may use
a different statistic known as AUC (area under the curve). The area under the ROC curve
between (0,0) and (1,1) is determined using integral calculus. AUC essentially aggregates
the model's performance over all threshold levels. AUC values can be as high as 1, which
denotes a flawless classifier; the larger the AUC, the better the classifier performs.
Classifier A outperforms classifier B in the following figure.
Figure 2.14: Area under the ROC Curve
(Source: Melo, F. (2013))
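The curve and its area can be obtained in a few lines; the sketch below uses scikit-learn on made-up labels and scores:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up true labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))      # area under the curve
```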
2.3.4.2. Cross Validation: Holdout and K-fold cross validation
The hold-out method divides the original data set into 2 independent sets according
to a certain ratio; for example, the training set accounts for 70% and the testing set for
30%.
This method is suitable for large data sets. However, the samples may not be
representative of the entire data (classes may be missing from the test set). It can be
improved by using a stratified sampling method so that each class is evenly distributed in
both the training and evaluation datasets, or by random subsampling: perform holdout k
times and take the accuracy acc(M) as the average of the k accuracy values.
Figure 2.15: Hold-out method
(Source: Synthesis by the author)
Regarding k-fold cross-validation, this method splits the data into k subsets of the
same size (called folds). One of the folds is used as the evaluation dataset and the rest are
used as the training set. The process is repeated until every fold has been used as the
evaluation dataset.
Figure 2.16: K-fold cross-validation
(Source: Synthesis by the author)
The k-fold method is used more often because the model is trained and evaluated on
many different pieces of data, thereby increasing the reliability of the evaluation measures
of the model.
The hold-out method usually gives good performance on large data sets. However, on
small or moderate data sets, the effectiveness of a model evaluated with this method depends
strongly on how the data are divided as well as on the division ratio.
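Both schemes can be sketched in a few lines of Python with scikit-learn; the toy data, the 70/30 split, and k = 5 mirror the numbers used in this study:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold-out: a single 70/30 split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("Hold-out accuracy:", model.score(X_te, y_te))

# K-fold: every fold serves once as the evaluation set; report the average
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold accuracies:", scores, "mean:", scores.mean())
```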
2.4. Previous empirical evidence applying data mining in forecasting financial distress
Numerous sectors of research use data mining, and the prediction of business
bankruptcy is a well-known topic in finance-related research. Investors aim to lower credit
risk and avoid unsuccessful investments (Wilson and Sharda, 1994). As a result, this subject
has been investigated by a variety of authors in the past; the financial markets are
significantly impacted by predictions of bankruptcy.
2.4.1. Empirical evidence with foreign research subjects
Numerous foreign papers on the use of data mining in the financial sector have been
published worldwide in order to improve the performance of enterprise financial distress
evaluation.
The research paper "A data mining approach to the prediction of corporate failure"
(2001) by Feng Yu Lin and Sally McClean, studied the financial data of 1,113 UK
companies from 1890 to 1999. The authors use four single classifiers - discriminant
analysis, logistic regression, neural networks, and decision trees - each based on two
feature selection methods for predicting corporate failure. The analysis's conclusion is
that the hybrid method performs better when predicting company collapse a year in
advance.
As shown by Mahmoud Mousavi Shiri, Mahnaz Ahangary, Seyed Hesam Vaghfi,
and Abolfazl Kholousi's research "Corporate Bankruptcy Prediction using Data Mining
Techniques: Evidence from Iran" (2012), the study sample consists of 144 companies
listed on the Tehran stock exchange from 2005 to 2009. Various data mining algorithms
such as neural networks, logistic regression, SVM, BayesNet, and decision trees were
implemented and compared for years t, t-1, and t-2, where year t is the bankruptcy year
for the distressed companies and a matched year for the non-bankrupt companies in the
sample. The CART algorithm proved the most effective in Iran at predicting bankrupt and
non-bankrupt firms, with an average accuracy of 94.93% over a three-year period.
The study "Comparison of Support Vector Machine and Back Propagation Neural
Network in Evaluating the Enterprise Financial Distress" (2010) by Ming-Chang Lee
and Chang, constructed an enterprise financial analysis methodology based on a support
vector machine and Back Propagation neural. They concluded that Support Vector
Machine provides higher precision and lower error rates than back propagation neural,
even though the difference between the performance measurements is slight.
"Comparison Of Wavelet Network And Logistic Regression In Predicting
Enterprise Financial Distress" (2015) by Ming-Chang Lee and Li-Er Su, reviewed the
Wavelet neural network structure, Wavelet network model training algorithm, Accuracy
rate and error rate (accuracy of classification, Type I error, and Type II error). The major
research opportunity is a potential model for predicting business failure (wavelet
network model and logistic regression model). The result reveals that this wavelet
network model is highly accurate, and that it improves the logistic regression model in
terms of Type I error and Type II error as well as overall prediction accuracy.
According to Efstathios Kirkos and Yannis Manolopoulos' paper "Data Mining In
Finance And Accounting: A Review Of Current Research Trends" (2015), the purpose of
that study is to classify the most popular methods used on financial documents with data
mining. The selected research papers come from reputable journals of four publishers:
Elsevier, Emerald, Kluwer, and Wiley. The conclusion is that most analyses seem to favor
the neural network model.
2.4.2. Empirical evidence with Vietnamese research subjects
Data mining also appears in some financial research publications in Vietnam. Their
findings imply that the likelihood of financial distress at Vietnamese enterprises decreases
with increasing financial liquidity, asset productivity, solvency, and profitability.
Example of "Khai Phá Dữ Liệu Trên Nền Oracle Và Ứng Dụng" (2014) by Nguyễn
Thị Minh Lý, who researched the issue of commercial bank classification, the thesis put up
a model to address this issue research on the Naive Bayes approach, Support Vector
40
Machine method, and decision tree method. The collected experimental findings
demonstrate that the suggested decision tree-based model has the highest accuracy.
According to the study "Ứng dụng Data Mining dự báo kiệt quệ tài chính ở các Công
ty Dược niêm yết tại Việt Nam" (2016) by Hồ Thị Thanh Thảo, using the data mining
method to identify early signs of financial distress decline in profits and show which
financial metrics are most effective in forecasting financial distress. The research subjects
are Vietnamese joint stock companies that were listed on Ho Chi Minh City Stock Exchange
and Hanoi Stock Exchange from 2011 - 2015. Algorithms used include a decision tree,
neural network, and Support Vector Machine to determine if these algorithms predict
financial distress well and which one is the best. Based on the research results, all three
methods have accurate forecasts, but the decision tree method gives the most exact
responses.
As shown by Binh Pham Vo Ninh, Trung Do Thanh, and Duc Vo Hong's research
"Financial distress and bankruptcy prediction: An appropriate model for listed firms in
Vietnam" (2018), this study used data from 800 listed companies from 10 different
industries traded on the Ho Chi Minh Stock Exchange (HOSE) and the Hanoi Stock
Exchange (HNX) between 2003 and 2016. Logistic regression is used in a complete model
that takes into account the crucial elements of business financial distress: accounting
factors, market factors, and two macroeconomic indicators. Additionally, alternative
models for default prediction are compared using the AUC. The empirical findings of this
study show how accounting, market, and macroeconomic factors affect the likelihood of
financial hardship in Vietnamese firms over the course of the study period; a thorough
model indicates that accounting concerns seem to have a greater influence than market
variables.
CHAPTER 3: METHODOLOGY
3.1. Research process
Figure 3.1 illustrates that there are two classes of factors that cause firms to
experience financial distress. The first class is internal factors, which consist of two
sub-factors, namely financial factors and non-financial factors; the second is the external
class, which includes macroeconomic variables.
In this paper, we use some of the financial factors with the application of data-mining
techniques, such as classification, outlier detection, prediction, and visualization, as well as
the algorithms Neural Network, Logistic Regression, Support Vector Machine, and
Decision Tree, to find the most appropriate model for the prediction of financial distress.
Figure 3.1: Overall framework of using data mining techniques for prediction of financial
distress
Source: Synthesis by the author
3.2. Research model
In this research, we build a predictive model of financial distress probability based
on the research of Dr. Edward I. Altman (1986). Edward Altman, a professor at New York
University, employed multiple discriminant analysis (MDA) to distinguish between
bankrupt and non-bankrupt firms based on a set of predesignated financial variables. Altman
demonstrates that a year prior to bankruptcy, the financial characteristics of bankrupt and
non-bankrupt companies are considerably different.
The original Altman Z-score quantitative model, proposed by Altman in 1968, is
frequently used to assess the probability of bankruptcy of a manufacturing or wholesale
company listed on the stock exchange within the next 2 years. The probability of
correctness of this model is 94% within 1 year and 74% within 2 years. Altman's estimated
discriminant function is:
$$Z = 0.012X_1 + 0.014X_2 + 0.033X_3 + 0.006X_4 + 0.999X_5$$
The Z-score methodology is based on the following financial measures:
X1 : Net working capital/Total assets.
X2: Retained earnings/Total assets.
X3: Earnings before interest and taxes/Total assets.
X4: Market value of equity/Book value of total liabilities.
X5: Sales/Total assets.
Z: Probability of financial distress of a manufacturing and wholesale company listed on
the stock exchange.
In which:
• Z-score > 2.99: Safe Zone - Business is not in danger of bankruptcy.
• 1.81 < Z-score < 2.99: Grey Zone - Business is in danger of bankruptcy.
• Z-score < 1.81: Distress Zone - Business is at high risk of bankruptcy.
3.3. Variable measurements
3.3.1. Dependent variable: Z-scores
We first calculate the Z-score for each firm using the Altman model (1968) with the
given data and then use his interpretation of the score to convert the set of those numbers
into qualitative data, which is the input of the variable "Results". This approach is suitable
because the results have been strictly reviewed by other academic papers such as
Nguyen Phuc Canh and Vu Xuan Hung (2014), Hoang Thi Hong Van (2020), and
Rahman et al. (2021). The Altman Z-score interpretation is shown in the table below:
Table 3.1: Interpretation of Z-score
| The interval | Value | Interpretation |
|--------------|-------|----------------|
| Z-score > 2.99 | Safe Zone | Business is not in danger of bankruptcy |
| 1.81 < Z-score < 2.99 | Grey Zone | Business is in danger of bankruptcy |
| Z-score < 1.81 | Distress Zone | Business is at high risk of bankruptcy |
Source: Altman model (1968)
3.3.2. Independent variables
3.3.2.1. Net Working Capital/Total assets (NWCTA)
Net working capital is equal to the difference between the firm’s current assets and
its current liabilities. The numerator represents the company's financial capacity to pay its
short-term obligations or the short-term liquidity of the firm. By dividing the net working
capital by the total assets, we can determine the firm's net liquid assets in relation to total
capitalization as well as eliminate the difference in size among the firms. According to
Altman (1968), this liquidity ratio was the most useful and showed more statistical
significance both on a univariate and multivariate basis. This ratio is the strongest indicator
of eventual discontinuance, which is consistent with the inclusion of this statistic.
Calculation formula:
$$X_1 = \frac{\text{Net Working Capital}}{\text{Total assets}}$$
X1 has a significant and positive relationship in the Z-score model. The articles of
Nguyễn Xuân Hùng (2014); Trung Do Thanh, Binh Pham Vo Ninh, and Duc Vo Hong
(2018); and Hoàng Thị Hồng Vân (2020) all prove that X1 is a stable variable, qualified
for inclusion in the model.
3.3.2.2. Retained earnings/Total assets (RETA)
The ability of the corporation to accumulate earnings utilizing its total assets is
gauged by the retained earnings to total assets ratio. This indicator shows long-term
cumulative profitability, reflects the extent of the company's leverage, and eliminates the
difference in the amount of total assets among the firms. In addition, as mentioned in the
pecking order theory, retained earnings are also the first back-up source of funds for the
firm to fulfill its obligations to debtors when the company cannot generate any profits or
even has a negative operating cash flow from its business activities. As claimed by Altman
(1986), this ratio implicitly takes into account the age of a company, because a young firm
has not had time to build up its cumulative profits. Calculation formula:
$$X_2 = \frac{\text{Retained Earnings}}{\text{Total assets}}$$
In the Z-score model, X2 exhibits a strong and positive relationship. All of the
authors' articles — Nguyen Xuan Hung (2014), Trung Do Thanh, Binh Pham Vo Ninh, and
Duc Vo Hong (2018), and Hoang Thi Hong Van (2020) — demonstrate that X2 is a stable
variable and suitable for inclusion in the model.
3.3.2.3. Net revenues/Total assets (NRTA)
A common financial ratio that shows how well a company's assets can generate
income is the capital-turnover ratio. It shows how well management can handle challenging
market situations. Taken on its own, it would not have appeared in the model at all based
on the statistical significance measure. However, according to Altman (1986), this ratio
ranks second in its contribution to the overall discriminating capacity of the model because
of its unique relationship to the other variables in the model. Calculation formula:
$$X_3 = \frac{\text{Net Revenues}}{\text{Total assets}}$$
A number of research papers have included X3 in the model and demonstrated
a positive relationship such as those of Nguyen Xuan Hung (2014), Arif Darmawan and
Joko Supriyanto (2018), Hoang Thi Hong Van (2020). Thus, this variable is significant and
stable enough to be included in the model.
3.3.2.4. EBIT/Total assets (EBITTA)
This ratio, computed by dividing a company's earnings before interest and taxes
(EBIT) by its total assets, is regarded as a sign of the company's ability to make operating
profits by utilizing its assets effectively, eliminating the tax and interest factors. The higher
the ratio, the better the firm can generate sufficient cash flow to pay its debtors and the
government. According to Altman (1986), this ratio is especially suitable for research on
corporate failure, because a firm's ability to produce money is what ultimately determines
whether it will remain in business. Calculation formula:
$$X_4 = \frac{EBIT}{\text{Total assets}}$$
Altman's results show that $X_4$ is significant at the 0.001 level. The research of
Shashikanta Baisag and Dr. PramodKumar Patjoshi (2020) shows that EBIT to total assets
has a positive effect on financial distress. However, according to Binh Pham Vo Ninh,
Trung Do Thanh, and Duc Vo Hong (2017), $X_4$ is reported to be statistically significant
at the 1-10% level with a negative correlation with default probability, and it has the largest
impact on financial distress in a logistic regression. Because of its consistency and
significance, this measure can be used to predict financial distress.
3.3.2.5. Equity-to-debt ratio (MVETD)
The equity-to-debt ratio evaluates the company's overall debt in relation to the capital
the owners initially put up and the profits retained over time. A very low debt-to-equity
ratio can be a sign that the company is very mature and has accumulated a lot of money
over the years. Altman (2005) used this metric to quantify default in emerging markets;
the greater the ratio, the lower the default likelihood. Calculation formula:
$$X_5 = \frac{\text{Market value of equity}}{\text{Book value of total debt}}$$
When using various modifications of the original model to analyze the performance
of the Z-score model for firms from 31 European and three non-European countries,
Edward I. Altman, Małgorzata Iwanicz-Drozdowska, Erkki K. Laitinen, and Arto Suvas
(2016) found that for each model the coefficient of the equity-to-debt ratio is very close to
zero, indicating a minor effect on the logit. However, equity-to-debt is significant at the
0.001 level in Altman (1986) and in the research of Binh Pham Vo Ninh, Trung Do Thanh,
and Duc Vo Hong (2017). Therefore, this variable has enough significance and reliability
to be used in financial distress forecasting.
3.3.3. Summary of variable measurements
In this study, table 3.2 below summarizes the variable measurements for the five
independent factors affecting the probability of financial distress of listed manufacturing
and wholesale companies in Vietnam in 2022 and 2023:
Table 3.2: Five independent variables selected in this study
| No | Category | Code | Formula | Paper |
|----|----------|------|---------|-------|
| | Dependent variable: Probability of financial distress | Z-score | | Altman (1968); Nguyen Phuc Canh and Vu Xuan Hung (2014); Hoang Thi Hong Van (2020); Rahman et al. (2021) |
| X1 | Liquidity | NWCTA | Net Working Capital/Total assets | Edward I. Altman (1968); James A. Ohlson (1980); Ming Xu and Chu Zhang (2008); Ben Chin-Fook Yap, David Gun-Fie Yong and Wai-Ching Poon (2010); Nguyen Xuan Hung (2014); Trung Do Thanh, Binh Pham Vo Ninh, and Duc Vo Hong (2018); Hoang Thi Hong Van (2020) |
| X2 | Leverage | RETA | Retained earnings/Total assets | Edward I. Altman (1968); Nguyen Xuan Hung (2014); Trung Do Thanh, Binh Pham Vo Ninh, and Duc Vo Hong (2018); Hoang Thi Hong Van (2020) |
| X3 | Turnover | NRTA | Net revenues/Total assets | Edward I. Altman (1968); Nguyen Xuan Hung (2014); Arif Darmawan and Joko Supriyanto (2018); Hoang Thi Hong Van (2020) |
| X4 | Profitability | EBITTA | EBIT/Total assets | Edward I. Altman (1968); Shashikanta Baisag and Dr. PramodKumar Patjoshi (2020); Binh Pham Vo Ninh, Trung Do Thanh and Duc Vo Hong (2017) |
| X5 | Market Valuation | MVETD | Market value of equity/Book value of total debt | Edward I. Altman (1968); Edward I. Altman, Małgorzata Iwanicz-Drozdowska, Erkki K. Laitinen, and Arto Suvas (2016); Binh Pham Vo Ninh, Trung Do Thanh and Duc Vo Hong (2017) |

Source: Synthesis by the author.
3.4. Data collection methods and descriptive statistics before preprocessing
We collected a sample of 627 listed companies, which was then divided into 2 parts:
a training dataset (439 companies) and a forecast dataset (188 companies). In our study,
the training dataset includes 5 factors used to assess a company's likelihood of bankruptcy
through the Z-scores of 439 manufacturing and wholesale companies listed on 3 Vietnamese
stock exchanges (HOSE, HNX, UPCoM), taken from audited consolidated financial reports
in 2021.
The training dataset of 439 listed companies in the manufacturing and wholesale
industries (Appendix 1) includes 5 independent variables: (Current Assets - Current
Liabilities)/Total Assets; Retained Earnings/Total Assets; Earnings Before Interest and
Taxes/Total Assets; Market Value of Equity/Book Value of Total Liabilities; and Net
Revenue/Total Assets.
With the proposed model, data comprising 439 observations are extracted from the
2021 financial statements of 439 manufacturing and wholesale companies on HOSE,
UPCoM, and HNX. Because the financial statements of listed companies have been
reviewed and publicly disclosed in the mass media according to the regulations of the State
Securities Commission and the Stock Exchange, the reliability of the data is high. From the
collected data, the authors conduct descriptive statistics for the variables, including mean,
median, maximum, minimum, and standard deviation, whose results are presented in Table
3.3.
Table 3.3: Descriptive statistics of quantitative variables before preprocessing
| Variable | Obs | Mean | Std. Dev. | Min | Max |
|----------|-----|-----------|-----------|---------|---------|
| NWCTA | 439 | 0.1467537 | 0.8232309 | -9.8370 | 4.1191 |
| RETA | 439 | 0.0458438 | 0.0871474 | -0.5752 | 0.3105 |
| EBITTA | 439 | 0.1196785 | 0.2200635 | -0.3732 | 3.7180 |
| MVETD | 439 | 2.714288 | 6.783536 | -0.9192 | 75.9577 |
| NRTA | 439 | 1.340145 | 1.198184 | 0.0011 | 10.7641 |
(Source: Authors summarize the results on STATA14 software.)
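Equivalent summary statistics can be obtained outside STATA with a short pandas sketch; the file name training.csv is a hypothetical export of the training dataset:

```python
import pandas as pd

# Hypothetical CSV export of the training dataset with the five ratio columns
df = pd.read_csv("training.csv")
cols = ["NWCTA", "RETA", "EBITTA", "MVETD", "NRTA"]

# count, mean, std, min, quartiles (including the median), and max per variable
print(df[cols].describe().T)
```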
To measure liquidity, the authors use the variable Net Working Capital/Total Assets
(NWCTA), which has an average value of 0.147. The highest value of NWCTA in the study
is 4.1191 and the lowest is -9.837. This shows that businesses tend to invest in current assets
at a low level, or that they have high current liabilities. For instance, in 2021 Mien Trung
Petroleum Construction Joint Stock Company (PXM) had current liabilities 10 times larger
than its current assets, making it difficult to pay short-term debts on time. This has led to
the effective discontinuance of this company for over 10 years while it awaits
implementation of the restructuring policy of its holding company.
Retained Earnings/Total Assets (RETA) represents average cumulative profitability
and has a mean value of 0.0459. The highest value of RETA in the study is 0.3105 and the
lowest is -0.5752. There are two interpretations for this variable. First, it implicitly shows
that young businesses are likely to have a lower ratio than older firms because they have
had less time to build up cumulative profits: Quốc Tế Holding JSC (around 10 years in
operation) experienced a sharp revenue drop and did business below cost, resulting in a
gross loss of nearly 4 billion dong in the third quarter of 2021 due to many disputes arising
from its main real estate business. Second, the lower the ratio, the less effective the
management of operating activities; for example, SaiGon Education JSC (founded in 1950)
is currently experiencing negative equity of nearly 31 billion dong after unilaterally
terminating labor contracts with more than 550 employees in 2017.
To measure the true earning power of the firm's assets, the authors use the variable
Earnings Before Interest and Taxes/Total Assets (EBITTA), which has an average value of
0.1197 with a standard deviation of only 0.22, indicating that every dollar of assets a
manufacturing or wholesale company invests commonly returns 11.97 cents in EBIT per
year. These companies do not fully utilize their economic resources, but try to keep them at
a moderate level to generate profits. For example, Safoco Foodstuff Joint Stock Company
(SAF) is one food-manufacturing company with truly productive and efficient management
in utilizing its total assets, resulting in a cumulative nine-month profit of nearly 40 billion
dong in 2022.
Market Value of Equity/Book Value of Total Debt (MVETD) represents capital
structure and has an average value of 2.714. The highest value is 75.957 and the lowest is
-0.919, indicating that companies commonly rely on preferred and common stock more
than on current and long-term debt to raise capital. In addition, the standard deviation of
this ratio is 6.784, indicating significant volatility relative to the mean. This can be
explained by some companies, such as Sara Vietnam JSC, having equity 76 times larger
than total debt. The other reason this ratio is so excessive is explained specifically in part
4.2 of this study.
To measure turnover, the authors use Sales/Total Assets (NRTA), which has an
average value of 1.34, indicating that every dollar the manufacturing or wholesale company
invests in total assets commonly returns 1.34 dollars in net revenues per year. In our sample,
companies in the safe zone have high and widely ranging values of this ratio due to effective
management in dealing with competitive conditions.
Figure 3.2: Statistical results in training dataset results before preprocessing
Source: Results from Orange program
Figure 3.2 above offers statistical results for the training dataset before preprocessing.
In our sample, manufacturing and wholesale enterprises that are not likely to face financial
distress in 2022 and 2023 account for the largest share, specifically:
• 207 companies in the safe zone. Of the three results, these companies have not only
the highest positive values, but also the most volatility in four variables: NWCTA,
EBITTA, MVETD, and NRTA. These values have significant positive skewness.
• 105 companies in the distress zone. The companies in this zone show the least
volatility in the five variables, and the values are not normally distributed.
• 127 companies in the grey zone. Contrary to the safe zone, these companies have the
most negative values and high volatility in two variables: NWCTA and RETA. These
values do not follow a normal distribution.
Overall, each variable in our training dataset contains considerable extreme outliers
and follows a non-Gaussian distribution, meaning that the subsequent prediction of
financial distress may not reach a high level of confidence. Therefore, our group
preprocessed the training dataset by imputing missing values and eliminating extreme
outliers before training and forecasting.
CHAPTER 4: RESULTS
4.1. Results of preprocessing data
In this part, our team gives some practical and specific explanations of why we
preprocessed the training dataset. We then preprocessed the training dataset by imputing
missing values and eliminating extreme outliers.
First, the team opened the Excel file of the training dataset in the Orange program,
then observed, visualized, and preprocessed the data, as below:
Figure 4.1: Process of preprocessing data on Orange
Source: Group’s results run from Orange software
There is no missing data in our sample. The following figure 4.2 shows the first 20
listed companies in the manufacturing and wholesale sectors in the training dataset:
Figure 4.2: Training dataset of 20 listed companies before processing
Source: Group’s results run from Orange software
However, given the distributions of our sample in the histograms, each variable
contains some outliers that lie at abnormal distances from the other values, as shown in
figure 4.3. This greatly affects the prediction of a company's financial distress. There are
two main practical reasons why the prediction would be wrong if outliers were not removed:
(1) These companies are subsidiaries, merely holding business activities and currently
implementing the restructuring policy of the parent company (dissolution, merger,
bankruptcy, etc.).
(2) The companies have switched to business lines other than the two main industries
that the group wants to study (manufacturing and wholesale).
For example, Mien Trung Petroleum Construction Joint Stock Company (PXM) has
accumulated losses for 10 consecutive years and negative equity for 9 consecutive years.
Although the company has a very high probability of bankruptcy, it is only allowed to trade
on UPCom; from 2012 to 2021 the business operated almost in maintenance mode, trying
to retain its personnel apparatus while awaiting implementation of the restructuring policy
(bankruptcy, dissolution, or merger) of the Vietnam Oil and Gas Construction Joint Stock
Corporation.
Figure 4.3: Distributions of our sample in five variables before preprocessing
Source: Group’s results run from Orange software
Therefore, the team detected and removed outliers using a one-class SVM method
with a non-linear kernel (RBF) in the Orange program, for two reasons: (1) the training
dataset is not high-dimensional (5 features versus 439 observations), and (2) all 5 variables
have a non-Gaussian distribution (Figure 4.3).
Figure 4.4: Illustration of removing outliers
Source: Group’s results run from Orange software
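Outside Orange, the same preprocessing step can be sketched with scikit-learn's one-class SVM; the file name training.csv and the contamination bound nu are illustrative assumptions (24 of 439 observations is roughly 5%, hence nu = 0.05):

```python
import pandas as pd
from sklearn.svm import OneClassSVM

# Hypothetical CSV export of the training dataset (file name is an assumption)
df = pd.read_csv("training.csv")
X = df[["NWCTA", "RETA", "EBITTA", "MVETD", "NRTA"]]

# RBF-kernel one-class SVM; nu bounds the fraction of points flagged as outliers
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
labels = detector.fit_predict(X)  # +1 = inlier, -1 = outlier

clean = df[labels == 1]
print(len(df) - len(clean), "observations removed;", len(clean), "kept")
```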
After processing missing values and removing outliers, the team obtained a dataset
of 415 observations (24 observations removed) that can be used to train the model and give
better prediction results, as shown in figure 4.5. Finally, the team saved the processed
training dataset.
Figure 4.5: Training data of 20 listed companies after preprocessing
Source: Results from Orange program
4.2. Descriptive statistics after preprocessing the training dataset
After eliminating 24 extreme outliers, our training dataset includes 415 observations
from manufacturing and wholesale companies listed in 2021 on HOSE, UPCoM, and HNX.
From the preprocessed data, the authors conduct descriptive statistics for the variables,
whose results are presented in Table 4.6:
Table 4.6: Descriptive statistics of quantitative variables after preprocessing
| Variable | Obs | Mean | Std. Dev. | Min | Max |
|----------|-----|-----------|-----------|---------|---------|
| NWCTA | 439 | 0.1467537 | 0.8232309 | -9.8370 | 4.1191 |
| RETA | 439 | 0.0458438 | 0.0871474 | -0.5752 | 0.3105 |
| EBITTA | 439 | 0.1196785 | 0.2200635 | -0.3732 | 3.7180 |
| MVETD | 439 | 2.714288 | 6.783536 | -0.9192 | 75.9577 |
| NRTA | 439 | 1.340145 | 1.198184 | 0.0011 | 10.7641 |
(Source: Authors summarize the results on STATA14 software.)
It is clear from the table above and table 3.3 that the standard deviations of all five
variables decreased, meaning that the extreme outliers were eliminated. Additionally, the
mean values of EBITTA, MVETD, and NRTA decreased slightly, while there was an
insignificant climb in the mean values of NWCTA and RETA.
In table 4.6, the mean of MVETD dropped dramatically by 0.600, and its standard
deviation fell to 3.129. This variable experienced the most significant decrease in both
mean and standard deviation compared to the other variables. There was a small decrease
of around 0.010 in the mean values of EBITTA and NRTA, while their standard deviations
decreased slightly to 0.098 and 0.904, respectively. The mean values of NWCTA and RETA
increased slightly to 0.216 and 0.050, respectively, whereas both variables saw a slight drop
in their standard deviations.
To illustrate the distributions of the processed data more specifically in the
histograms (figure 4.6), each variable now contains fewer outliers lying at abnormal
distances from the other values. Although the five variables still do not follow a normal
distribution, this dataset can partly eliminate statistical errors and assumption violations
and make the models consistent, hence improving the accuracy of predicting a company's
financial distress.
Figure 4.6: Distributions of our sample in five variables after preprocessing
Source: Group’s results run from Orange software
Another technique that we used to evaluate and choose the final variable profile is to
determine the interaction between the variables in the function. Simple observation of the
descriptive statistics and discriminant coefficients from past empirical studies is not
enough, and can be misleading, since the actual variable measurement units are not all
equivalent.
From table 4.8 below, the authors found that the correlation coefficients are all less
than 0.7000 and all differ from 0 (the highest correlation coefficient is 0.6506, and the
lowest is -0.2384). Therefore, multicollinearity is not a problem for any pair of independent
variables, and the study can use the model with all five independent variables to learn and
choose the most suitable classification method, and then to forecast.
Table 4.8: Correlations of quantitative variables after preprocessing
(Source: Authors summarize the results on STATA14 software.)
According to Cochran (1977), the majority of correlations between variables in
previous studies were positive, and negative correlations are more beneficial than positive
ones in adding new information to the function. Interestingly, table 4.8 shows that NRTA
has its most negative correlation with MVETD. This means that when MVETD is lower
(due to cumulative operating losses), companies may have a higher NRTA; conversely, if
MVETD is too high, the firm may face poor credit conditions and have to issue more shares
to gain sufficient capital to generate sales. Therefore, including NRTA in this model is
appropriate because of its correlations with the other variables, although this variable has
proven insignificant in many past studies.
Figure 4.7 below offers statistical results for the training dataset after preprocessing.
In our sample, manufacturing and wholesale enterprises that are not likely to face financial
distress in 2022 and 2023 still account for the largest share, specifically: 194 companies in
the safe zone (13 observations eliminated), 95 companies in the distress zone (10
observations eliminated), and 126 companies in the grey zone (1 observation eliminated).
The more positive the values in all five variables, the lower the likelihood of financial
distress that manufacturing and wholesale companies may experience.
Figure 4.7: Statistical results in training dataset results after preprocessing
Source: Group’s results run from Orange program
4.3. Results of choosing and evaluating the most suitable classification method
After data preprocessing, with the training dataset of 415 listed companies of the manufacturing and wholesale industries on the three exchanges HOSE, HNX, and UPCoM, our group selected the most appropriate data classification method through the evaluation results and the confusion matrix, following the procedure in figure 4.8:
Figure 4.8: Procedure for selecting and evaluating data classification methods
Source: Results from Orange program
First, we used Orange to input the training dataset. After loading it, we declared the role of each variable in the training dataset as follows (an equivalent scripting sketch follows this list):
• The independent variables NWCTA, RETA, EBITTA, MVETD, and NRTA are declared as "feature".
• The dependent variable Results is declared as "target". Results takes 3 values: Safe Zone, Distress Zone, and Gray Zone.
• The variable Code does not participate in the training process and is categorical, so it is declared as "meta".
• The variable No does not participate in the training process but is numeric, so it is set to "skip".
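The same role assignment can also be expressed through Orange's Python scripting API rather than the GUI. The sketch below is illustrative only; the file name training.csv is an assumption.

# Hypothetical sketch: declaring variable roles via Orange's scripting API.
from Orange.data import Domain, Table

data = Table("training.csv")  # assumed export of the training dataset
features = [data.domain[n] for n in ("NWCTA", "RETA", "EBITTA", "MVETD", "NRTA")]
target = data.domain["Results"]   # discrete: Safe Zone / Distress Zone / Gray Zone
metas = [data.domain["Code"]]     # kept for identification only, like the "meta" role
# "No" is simply left out of the new domain, mirroring the "skip" role

training = data.transform(Domain(features, target, metas=metas))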
Figure 4.9: Roles of the variables in the training dataset
Source: Results from Orange program
After declaring the properties of the variables as in the figure above, the team continued to the Test and Score section to see an overview of the indicators and choose the most suitable model for the study. The team used the Cross Validation evaluation method with the number of folds set to 5 (k = 5) to avoid overlap between the test sets: because the model is trained and evaluated on many different pieces of data, the reliability of the model's evaluation measures increases.
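In scripting form, the Test and Score step could look roughly like the sketch below. Learner defaults, the file name, and the CrossValidation call style (which varies across Orange versions) are assumptions; the report itself used the GUI widget.

# Hypothetical sketch of the Test and Score comparison via Orange's scripting API.
from Orange.data import Table
from Orange.classification import (TreeLearner, SVMLearner,
                                   NNClassificationLearner,
                                   LogisticRegressionLearner)
from Orange.evaluation import CrossValidation, CA, AUC

data = Table("training.csv")  # assumed file name
learners = [TreeLearner(), SVMLearner(),
            NNClassificationLearner(), LogisticRegressionLearner()]

results = CrossValidation(k=5)(data, learners)  # 5-fold, as in the widget
for learner, ca, auc in zip(learners, CA(results), AUC(results)):
    print(f"{learner.name}: CA = {ca:.3f}, AUC = {auc:.3f}")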
Figure 4.10: Results of evaluating the models by Cross Validation
Source: Results from Orange program
Of the 4 classification methods that the team chose to test (Tree, SVM, Neural Network, Logistic Regression), the Neural Network model is rated the highest on all 5 indexes: AUC, CA, F1, Precision, and Recall. In particular, this model correctly classifies 93.2% of instances, and its AUC value is 98.9%, showing that it is more effective than the rest. The team continued to evaluate this model further through the confusion matrix, as the figure below shows.
Figure 4.11: Neural Network's Confusion Matrix
Source: Results from Orange program
The figure above shows that the neural network model predicts 97 companies to be in the distress zone (at high risk of bankruptcy), but only 91.8% of them match the actual values. In addition, 122 companies are predicted to be in the grey zone (in danger of bankruptcy), of which 4 companies are misclassified. Finally, the model predicts 196 companies to be in the safe zone (not in danger of bankruptcy), where the accuracy is up to 95.9% and only 2 companies are misclassified.
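The confusion matrix in figure 4.11 can be recomputed from the cross-validation results. The sketch below continues the earlier snippet's results object; the learner index assumes the order Tree, SVM, Neural Network, Logistic Regression.

# Hypothetical sketch: confusion matrix for the Neural Network learner,
# continuing `results` from the cross-validation sketch above.
from sklearn.metrics import confusion_matrix

NN_INDEX = 2  # assumed position of the Neural Network in the learners list
cm = confusion_matrix(results.actual, results.predicted[NN_INDEX])
print(cm)     # rows = actual zones, columns = predicted zones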
Figure 4.12: ROC analysis
Source: Results from Orange program
Figure 4.12 shows that the AUC for the Neural Network's ROC curves is higher than that for the other classification methods' curves. Therefore, the Neural Network model did a better job of correctly classifying the positive classes in the dataset. From that, the team can conclude that the Neural Network model is effective and suitable for the training dataset, and therefore suitable for predicting the bankruptcy of the remaining companies through the forecasting dataset.
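Per-class ROC curves like those in figure 4.12 can be derived one class at a time (one-vs-rest). The sketch below again continues the cross-validation snippet; the class order is assumed to match the Results variable.

# Hypothetical sketch: one-vs-rest AUC per zone for the Neural Network,
# continuing `results` and NN_INDEX from the sketches above.
from sklearn.metrics import roc_curve, auc

probs = results.probabilities[NN_INDEX]              # shape: (n_rows, 3)
zones = ("Safe Zone", "Distress Zone", "Gray Zone")  # assumed class order
for k, zone in enumerate(zones):
    fpr, tpr, _ = roc_curve((results.actual == k).astype(int), probs[:, k])
    print(zone, "AUC:", round(auc(fpr, tpr), 3))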
4.4. Results of forecasting data using the Neural Network model
In this part, we continued by forecasting and evaluating the other 188 listed companies in the same industries using the Neural Network model. Figure 4.13 shows the forecast data of 20 listed companies:
Figure 4.13: Forecasting dataset of 20 listed companies
Source: Results from Orange program
Based on the above procedure of training and evaluating the dataset of 415 listed companies in the manufacturing and wholesale industries on the three stock exchanges HOSE, HNX, and UPCoM, we found that the Neural Network is the most appropriate classification method. Therefore, the team followed the same steps as for the training dataset to forecast, as shown in figure 4.14 below:
Figure 4.14: Neural Network forecasting process
Source: Results from Orange program
Just like with the training dataset, the team input the forecast dataset into the Orange program and set the properties of its variables, as figure 4.15 below shows:
• The independent variables NWCTA, RETA, EBITTA, MVETD, and NRTA, together with Results, are declared as "feature".
• The Code variable does not participate in the prediction process, but is categorical data, so it is declared with the "meta" attribute.
• The variable No does not participate in the process but is numeric data, so it is set to "skip".
Figure 4.15: Roles of the variables in the forecast dataset
Source: Results from Orange program
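In scripting form, the forecasting step could be sketched as below. The file names and the use of the Neural Network learner's default settings are assumptions.

# Hypothetical sketch: fitting the Neural Network on the 415-company training
# set and predicting the 188-company forecast set (file names assumed).
from collections import Counter
from Orange.data import Table
from Orange.classification import NNClassificationLearner

train = Table("training.csv")
forecast = Table("forecast.csv")

model = NNClassificationLearner()(train)   # train the MLP classifier
pred = model(forecast)                     # predicted class index per company
labels = [train.domain.class_var.values[int(i)] for i in pred]
print(labels[:20])        # first 20 companies, cf. figure 4.16
print(Counter(labels))    # tally per zone, cf. figure 4.17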
Then we used the Predictions widget to view the forecast produced by the Neural Network model. Figure 4.16 shows the forecast results for the first 20 companies of the forecast dataset:
Figure 4.16: Forecast results using Neural Network model
Source: Results from Orange program
The forecast results of the remaining 188 listed companies in the manufacturing and
wholesale sectors show that:
• There are 72 listed companies in the safe zone. This means that these 72 companies do not go bankrupt, with the correct probability being 94% within 1 year (2022) and 74% within 2 years (2023).
• There are 59 companies in the distress zone. This means that these 59 companies are at risk of bankruptcy, with the same probabilities.
• There are 57 companies located in the gray zone. This means that these 57 companies are at high risk of bankruptcy, again with a true probability of 94% within 1 year and 74% within 2 years.
Figure 4.17: Statistical forecasting results by Neural Network model
Source: Results from Orange program
CHAPTER 5: DISCUSSIONS, LIMITATIONS, AND RECOMMENDATIONS
5.1. Discussions
The topic "Application of the neural network model in forecasting financial distress of listed manufacturing and wholesale companies in 2022 and 2023 by using the Orange program" has basically achieved the research objectives set out, in two respects:
• Theoretically, the study presented the general theoretical basis of data mining techniques and data classification methods (specifically the Neural Network model).
• Experimentally, the study combined the use of technology (Orange, STATA, and Excel) in the financial sector, applying the Altman (1968) model to predict the probability of bankruptcy in 2022 and 2023 of manufacturing and wholesale enterprises listed on the three stock exchanges in Vietnam, through 5 independent variables (NWCTA, RETA, EBITTA, MVETD, NRTA) and 1 dependent variable, Results (with 3 values: Safe zone, Distress zone, and Gray zone). The team used 439 companies in the training dataset and 188 companies in the forecast dataset. After the group preprocessed the data (removing outliers), the training dataset was left with 415 observations. The authors draw some conclusions below:
Firstly, regarding the question “How do internal factors affect the criteria for assessing the possibility of bankruptcy of the company (Z-score), as presented through descriptive statistics?”, our group found that the more positive the values of all five variables are, the lower the likelihood of financial distress that manufacturing and wholesale companies may experience.
Secondly, regarding the question “Given the training dataset, which suitable model provided by Orange software should be used to predict financial distress with a high level of confidence?”, our group chose the Neural Network as the effective and suitable classification method for this study, based on the training dataset of 415 observations (after removing 24 obs during data preprocessing). Of the 4 classification methods that the team chose to test and evaluate (Tree, SVM, Neural Network, Logistic Regression), the Neural Network model is rated the highest on 5 indexes: AUC, CA, F1, Precision, and Recall. Moreover, this model also has the highest correct proportion of predicted values in the confusion matrix.
Thirdly, regarding the question “Using the selected model, what is the likelihood of financial distress of the companies in 2022 and 2023?”, we used the Neural Network model to predict the forecast data (the 188 remaining observations in the two industries: manufacturing and wholesale). We found that 72 listed companies do not go bankrupt, 59 companies are at risk of bankruptcy, and 57 companies are at high risk of bankruptcy, with a true probability of 94% within 1 year (2022) and 74% within 2 years (2023).
5.2. Recommendations
• For domestic companies:
Domestic companies can increase their Z-score by reducing debt in their capital structure. A firm should monitor financial ratios such as cash flow to total debt, net income to total assets, and total debt to total assets. To ensure proper asset utilization and lower the danger of a financial crisis, the company should keep its debt at an optimal level. When an industry slumps, highly indebted companies are more likely to experience financial difficulties. However, these businesses can continue to thrive economically even when their industry is in trouble by using effective management techniques. Achieving efficiency while downsizing is one of the primary tactics for recovering from financial hardship.
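To make the debt-reduction point concrete, the sketch below evaluates the Altman (1968) Z-score with its standard coefficients and zone cut-offs for public manufacturers; the example firm's ratios are invented purely for illustration.

# Altman (1968) Z-score with the standard coefficients and zone cut-offs;
# the example firm's ratios below are hypothetical.
def altman_z(nwcta, reta, ebitta, mvetd, nrta):
    return 1.2 * nwcta + 1.4 * reta + 3.3 * ebitta + 0.6 * mvetd + 1.0 * nrta

def zone(z):
    if z > 2.99:
        return "Safe zone"
    if z >= 1.81:
        return "Gray zone"
    return "Distress zone"

# Halving book debt doubles MVETD (market equity / book debt) and lifts Z
print(zone(altman_z(0.2, 0.1, 0.12, 1.5, 1.3)))  # -> Gray zone (Z ~ 2.98)
print(zone(altman_z(0.2, 0.1, 0.12, 3.0, 1.3)))  # -> Safe zone (Z ~ 3.88)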
To identify early warning signs of crises and take preventive action against impending danger, a firm should pay close attention to both internal (financial and non-financial) and external (macroeconomic) causes of financial difficulty, especially macroeconomic policies. The government's macroeconomic policies directly affect how the firm does business. Since businesses must adhere to the law, the board of directors and finance supervisors need to understand how new legislation and government policies may impact their performance. They must then quickly establish internal control and risk management systems.
• For the government:
Financial distress of companies is significantly influenced by the overall financial health of the economy. It is found that structural reforms, improvements to the business climate, reduced uncertainty, and measures to address the deterioration of bank asset quality through an enhanced legal and institutional insolvency framework are helpful in improving the overall financial health of the economy.
When developing its policies, the government must take the business communities
into consideration. In other words, government policies must be pro-business in order to
support and strengthen firm growth rather than slow it down. The government should also
offer infrastructure improvements and a favorable environment for businesses to flourish.
5.3. Limitations
Even though the study was conducted with scientific rigor in mind, there are still some limitations. In 2021, social distancing measures affected the circulation of goods, leading to the disruption of supply chains and affecting the production and business activities of the manufacturing and wholesale industries domestically and internationally. This significantly affects the financial indicators in the study. The data were collected while the COVID-19 epidemic was ongoing and beginning to show signs of coming under control, so this also influences the predictions when estimating the future of businesses after the pandemic.
Another limitation of the study is that the companies under investigation were all publicly held manufacturing organizations for which extensive financial data, including market price quotations, were accessible. Therefore, extending the investigation to comparatively smaller asset-sized businesses and unincorporated companies, where the frequency of company failure is higher than with larger corporations, would be a subject for future research.
In this paper, the team only used the Altman Z-score ratios as a measure of the financial distress status of companies, without referencing other models with similar functions. The degree of accuracy and reliability of this Z-score, acquired in tests carried out in Vietnam on industrial listed businesses (banks, insurance, and financial enterprises were excluded), cannot be applied to other countries, even to nations with similar environmental characteristics. Therefore, it would be wise to proceed to a preliminary validation of the model for companies that are quoted on different marketplaces.
5.4. Directions
Based on the limitations that prevent the research from being truly complete, the team would like to offer the following directions for further research on the topic:
● To guarantee that the collected data are not diluted and the results are clearer, the research subject should be a specialized industry.
● To increase the accuracy of the results, possible predictions must be implemented, the model must be tested against reality, and the results must be consistently evaluated. We recommend that the model's variables be added to and appropriately adjusted for each macro period, to address the limitation that the authors recognize in this report.
● To support effective decisions by investors, the researchers propose investigating models other than the Altman model. Future studies will explore more beneficial models and focused research techniques for forecasting enterprise development.
REFERENCES
GSO (2020). Annual Report of the General Statistics Office of Vietnam, 2020.
GSO (2021). Annual Report of the General Statistics Office of Vietnam, 2021.
Ross, S. A., Westerfield, R. W., & Jaffe, J. (2012). Corporate Finance (10th ed.). McGraw-Hill.
Vietnam Law on Bankruptcy (2014); Decree No. 58/2012/ND-CP of the Government.
Adebayo, A. O., & Chaubey, M. S. (2019). Data mining classification techniques on
the analysis of student’s performance. GSJ, 7(4), 45-52.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589-609.
Altman, E., & Hotchkiss, E. (2006). Corporate financial distress and bankruptcy. NJ:
John Wiley & Sons.
Baimwera, B., & Muriuki, A. M. (2014). Analysis of corporate financial distress
determinants: A survey of non-financial firms listed in the NSE. International Journal of
Current Business and Social Sciences, 1(2), 58-80.
Baisag, S., & Patjoshi, P. (2020). Corporate Financial Distress Prediction–A Review
Paper. PalArch's Journal of Archaeology of Egypt/Egyptology, 17(9), 2109-2118.
Breiman, L., & Ihaka, R. (1984). Nonlinear discriminant analysis via scaling and ACE.
Davis One Shields Avenue Davis, CA, USA: Department of Statistics, University of
California.
Calderon, T. G., Cheh, J. J., & Kim, I. W. (2003). How large corporations use data
mining to create value. Management Accounting Quarterly, 4(2), 1-1.
Chancharat, N. (2008). An empirical analysis of financially distressed Australian
companies: the application of survival analysis.
Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal
Statistical Society: Series B (Methodological), 20(2), 215-232.
Devji, S., & Suprabha, K. R. (2016). Corporate financial distress and stock return:
Evidence from Indian stock market. Nitte management review, 10(1), 34-44.
Dhar, S., Mukherjee, T., & Ghoshal, A. K. (2010, December). Performance evaluation
of Neural Network approach in financial prediction: Evidence from Indian Market. In 2010
International Conference on Communication and Computational Intelligence (INCOCCI)
(pp. 597-602). IEEE.
Elloumi, F., & Gueyié, J. P. (2001). Financial distress and corporate governance: an
empirical analysis. Corporate Governance: The international journal of business in society.
Erdem, Z., Polikar, R., Gurgen, F., & Yumusak, N. (2005, June). Ensemble of SVMs
for incremental learning. In International Workshop on Multiple Classifier Systems (pp.
246-256). Springer, Berlin, Heidelberg.
Fan, A., & Palaniswami, M. (2000, July). Selecting bankruptcy predictors using a
support vector machine approach. In Proceedings of the IEEE-INNS-ENNS International
Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges
and Perspectives for the New Millennium (Vol. 6, pp. 354-359). IEEE.
Frydman, H., Altman, E. I., & Kao, D. L. (1985). Introducing recursive partitioning for
financial classification: the case of financial distress. The journal of finance, 40(1), 269-291.
Fun, M. H., & Hagan, M. T. (1996, September). Modular neural networks for friction modeling and compensation. In Proceedings of the 1996 IEEE International Conference on Control Applications held together with the IEEE International Symposium on Intelligent Control (pp. 814-819). IEEE.
Hải, T. V. (2017). Nhận diện gian lận báo cáo tài chính của các công ty niêm yết trên thị trường chứng khoán Việt Nam–bằng chứng thực nghiệm tại sàn giao dịch chứng khoán HOSE [Detecting financial statement fraud in companies listed on the Vietnamese stock market: empirical evidence from the HOSE exchange].
Halili, F., & Rustemi, A. (2016). Predictive modeling: data mining regression
technique applied in a prototype. International Journal of Computer Science and Mobile
Computing, 5(8), 207-215.
Idrees, S., & Qayyum, A. (2018). The impact of financial distress risk on equity
returns: A case study of non-financial firms of Pakistan Stock Exchange. Journal of
Economics Bibliography, 5(2), 49-59.
Ikpesu, F., Vincent, O., & Dakare, O. (2020). Financial Distress Overview, Determinants, and Sustainable Remedial Measures: Financial Distress. In Corporate Governance Models and Applications in Developing Economies (pp. 102-113). IGI Global.
Iyer, A., Jeyalatha, S., & Sumbaly, R. (2015). Diagnosis of diabetes using
classification mining techniques. arXiv preprint arXiv:1502.03774.
Jain, A. K., Mao, J., & Mohiuddin, K. M. (1996). Artificial neural networks: A
tutorial. Computer, 29(3), 31-44.
Jiming, L., & Weiwei, D. (2011). An empirical study on the corporate financial
distress prediction based on logistic model: Evidence from China’s manufacturing industry.
International Journal of Digital Content Technology and its Applications, 5(6), 368-379.
Kantardzic, M. (2011). Data mining: concepts, models, methods, and algorithms. John
Wiley & Sons.
Kirkos, E., & Manolopoulos, Y. (2004). Data mining in finance and accounting: a
review of current research trends. In Proceedings of the 1st international conference on
enterprise systems and accounting (ICESAcc) (pp. 63-78).
Kristanti, F. T., Rahayu, S., & Huda, A. N. (2016). The determinant of financial
distress on Indonesian family firm. Procedia-Social and Behavioral Sciences, 219, 440-447.
Kumar, P. R., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via
statistical and intelligent techniques–A review. European journal of operational research,
180(1), 1-28.
Larose, D. T. (2005). An introduction to data mining. Translated and adapted by Thierry Vallaud.
Lee, M. C., & Su, L. E. (2015). Comparison of wavelet network and logistic regression
in predicting enterprise financial distress. International Journal of Computer Science &
Information Technology, 7(3), 83-96.
Lee, M. C., & To, C. (2010). Comparison of support vector machine and back
propagation neural network in evaluating the enterprise financial distress. arXiv preprint
arXiv:1007.5133.
Lê, C. H. A., & Nguyễn, T. H. (2012). Kiểm định mô hình chỉ số Z của Altman trong dự báo thất bại doanh nghiệp tại Việt Nam [Testing Altman's Z-score model in predicting corporate failure in Vietnam].
Lin, F. Y., & McClean, S. (2001). A data mining approach to the prediction of
corporate failure. Knowledge-based systems, 14(3-4), 189-195.
Lippmann, R. P. (1989). Pattern classification using neural networks. IEEE
communications magazine, 27(11), 47-50.
McLachlan, G. J. (2004). Discriminant analysis and statistical pattern recognition.
John Wiley & Sons.
Neha, K., & Reddy, M. Y. (2020). A Study on Applications of Data Mining. International Journal of Scientific & Technology Research, 9(02).
Nguyễn, T. M. L. (2014). Khai phá dữ liệu trên nền ORACLE và ứng dụng [Data mining on the ORACLE platform and its applications] (Doctoral dissertation, Đại học Quốc gia Hà Nội).
Ninh, B. P. V., Do Thanh, T., & Hong, D. V. (2018). Financial distress and bankruptcy
prediction: An appropriate model for listed firms in Vietnam. Economic Systems, 42(4),
616-624.
Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 109-131.
Papakyriakou, D., & Barbounakis, I. S. Data Mining Methods: A Review. International Journal of Computer Applications (ISSN 0975-8887).
Parker, J. A. (2011). On measuring the effects of fiscal policy in recessions. Journal of Economic Literature, 49(3), 703-718.
Provost, F. J., Fawcett, T., & Kohavi, R. (1998, July). The case against accuracy
estimation for comparing induction algorithms. In ICML (Vol. 98, pp. 445-453).
Quinlan, J. R. (1993, June). Combining instance-based and model-based learning. In
Proceedings of the tenth international conference on machine learning (pp. 236-243).
Rani, K. U. (2011). Analysis of heart diseases dataset using neural network approach.
arXiv preprint arXiv:1110.2626.
Shiri, M. M., & Ahangary, M. (2012). Corporate Bankruptcy Prediction Using Data
Mining Techniques: Evidence from Iran. African J. Sci. Res. Vol, 8(1).
Sossi Alaoui, S., Farhaoui, Y., & Aksasse, B. (2017, April). A comparative study of
the four well-known classification algorithms in data mining. In International Conference on
Advanced Information Technology, Services and Systems (pp. 362-373). Springer, Cham.
Steinberg, D., & Colla, P. (1995). CART: tree-structured non-parametric data analysis.
San Diego, CA: Salford Systems.
Supriyanto, J., & Darmawan, A. (2018). The effect of financial ratio on financial
distress in predicting bankruptcy. Journal of Applied Managerial Accounting, 2(1), 110-120.
Thảo, H. T. T. (2016). Ứng dụng Data Mining dự báo kiệt quệ tài chính ở các Công ty Dược niêm yết tại Việt Nam [Applying data mining to forecast financial distress in listed pharmaceutical companies in Vietnam].
Thim, C. K., Choong, Y. V., & Nee, C. S. (2011). Factors affecting financial distress:
The case of Malaysian public listed firms. Corporate Ownership and Control, 8(4), 345-351.
Tinoco, M. H., & Wilson, N. (2013). Financial distress and bankruptcy prediction
among listed companies using accounting, market and macroeconomic variables.
International review of financial analysis, 30, 394-419.
Trang, H. C., & Nhị, V. V. (2020). Ảnh hưởng của thành viên nữ trong hội đồng quản trị đến hiệu quả hoạt động của các công ty niêm yết [The effect of female board members on the performance of listed companies]. Tạp chí Phát triển Kinh tế, 61-75.
Turetsky, H. F., & McEwen, R. A. (2001). An empirical investigation of firm
longevity: A model of the ex ante predictors of financial distress. Review of Quantitative
Finance and Accounting, 16(4), 323-343.
Vân, H. T. H. Vận dụng mô hình Z-score trong dự báo khả năng phá sản doanh nghiệp tại Việt Nam [Applying the Z-score model in forecasting corporate bankruptcy risk in Vietnam].
Wesa, E. W., & Otinga, H. N. (2018). Determinants of financial distress among listed
firms at the Nairobi securities exchange, Kenya. Strategic Journal of Business and Change
Management, 9492, 1056-1073.
Wilson, R. L., & Sharda, R. (1994). Bankruptcy prediction using neural networks.
Decision support systems, 11(5), 545-557.
Wruck, K. H. (1990). Financial distress, reorganization, and organizational efficiency.
Journal of financial economics, 27(2), 419-444.
Yap, B. C. F., Yong, D. G. F., & Poon, W. C. (2010). How well do financial ratios and
multiple discriminant analysis predict company failures in Malaysia. International Research
Journal of Finance and Economics, 54(13), 166-175.
Yohannes, Y., & Webb, P. (1998). Classification and regression trees, CART: a user
manual for identifying indicators of vulnerability to famine and chronic food insecurity
(Vol. 3). Intl Food Policy Res Inst.
Zaki, M. J., & Wong, L. (2003). Data Mining Techniques, WSPC. Lecture Notes
Series: 9in x 6in, 2.
APPENDIX 1: TRAINING DATASET BEFORE DATA PREPROCESSING
APPENDIX 2: TRAINING DATASET AFTER DATA PREPROCESSING
APPENDIX 3: FORECAST DATASET BEFORE FORECASTING
APPENDIX 4: FORECAST DATASET AFTER FORECASTING USING THE NEURAL NETWORK MODEL