Tangible Assets

Data Analysis Final Report
TEAM 04
Yoko Arita
Leslie Anne James
Shigenori Kobayashi
Takeshi Kusunoki

Step 1: Determine a concerning Variable
Team objective of the project
- Find the conditions for a strong company
Investopedia explains “Cash Flow”
1. In business as in personal finance, cash flows are essential to solvency.
They can be presented as a record of something that has happened in the
past, such as the sale of a particular product, or forecasted into the future,
representing what a business or a person expects to take in and to spend.
Cash flow is crucial to an entity's survival. Having ample cash on hand will
ensure that creditors, employees and others can be paid on time. If a
business or person does not have enough cash to support its operations, it is
said to be insolvent, and a likely candidate for bankruptcy should the
insolvency continue.
2. The statement of a business's cash flows is often used by analysts to
gauge financial performance. Companies with ample cash on hand are able to
invest the cash back into the business in order to generate more cash and
profit.
Step 1: Determine a concerning Variable
Does “Cash Flow” fit our goal?
As Investopedia says, cash flow is essential for a
company. A company needs enough cash on hand to
pay employees and creditors on time, and it can invest
surplus cash in the business to improve performance and profit.
Also, Cash Flow is often used by analysts to gauge financial
performance.
For both practical and analytical reasons,
Cash Flow is a strong factor in a company’s success!
Step 2 : Consider the Explanatory variables
Step 2-1 : Dispersion : Get the beautiful Histograms
> summary(Cashflow)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-189900     558    1515    8101    4215 1711000
> summary(log(Cashflow)-0.1)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-0.100 6.489 7.338 7.454 8.356 14.250 159
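The NA's column in the second summary is what R reports when log() is applied to negative values. A minimal sketch with made-up numbers (the vector `cf` is hypothetical, not the team's data) shows how negative and zero values behave:

```r
# Hypothetical cash-flow values, including one negative and one zero.
cf <- c(-1899, 0, 558, 1515, 4215)

# log() of a negative number is NaN (with a warning); log(0) is -Inf.
logged <- suppressWarnings(log(cf))

# is.na() is TRUE for NaN but not for -Inf; summary() counts these as NA's.
n_missing <- sum(is.na(logged))

# Keeping only strictly positive values avoids both NaN and -Inf.
cf_pos <- cf[cf > 0]
logged_pos <- log(cf_pos)
```

Filtering to strictly positive values before taking logs keeps every later step on the same set of companies.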
Step 2 : Consider the Explanatory variables
Step 2-2 : Association : decide an explanatory variable
As we are looking for factors that make
a strong company, we decided that the
profit is a natural indicator of a
successful company. So, our first
explanatory variable is Ordinary Profit.
Step 2 : Consider the Explanatory variables
Step 2-2 : Association : draw scatter plots
> cor(Cashflow,OrdinaryProfit)
[1] 0.7604991
> cor(log(Cashflow),log(OrdinaryProfit))
[1] 0.814815
Step 2 : Consider the Explanatory variables
Step 2-2 : Association : using the data
By looking at our previous charts, it is clear that
the log of our variables shows the relationship more
clearly than the raw values.
In addition, the correlation between Cashflow and
OrdinaryProfit improves:
cor(Cashflow,OrdinaryProfit)             0.7604991
cor(log(Cashflow),log(OrdinaryProfit))   0.814815
an improvement of 0.0543159
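Because log() is undefined for negative values, the log correlation can only be computed on pairs where both values are positive. The sketch below uses hypothetical numbers (`cashflow`, `profit`, and `keep` are illustrative names, not the team's objects) to show one way to compare the two correlations on the same rows:

```r
# Hypothetical paired values; one profit is negative.
cashflow <- c(500, 1500, 4000, 12000, 30000)
profit   <- c(200,  -50, 1800,  5000, 16000)

# Keep only pairs where both values are strictly positive.
keep <- cashflow > 0 & profit > 0

# Compare raw and log correlations on the same subset of rows.
r_raw <- cor(cashflow[keep], profit[keep])
r_log <- cor(log(cashflow[keep]), log(profit[keep]))
```

Computing both correlations on the same subset makes the comparison fair, since neither number is then affected by rows the other one excludes.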
Step 3 : Construct Single Linear Regression Models
> Cashflow.lm<-lm((Cashflow)~(OrdinaryProfit))
> summary(Cashflow.lm)
Call:
lm(formula = (Cashflow) ~ (OrdinaryProfit))
Residuals:
    Min      1Q  Median      3Q     Max
-360958    -578    2373    3660 1011555
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -3.443e+03  7.356e+02  -4.681 3.03e-06 ***
OrdinaryProfit  2.136e+00  4.021e-02  53.109  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 32130 on 2089 degrees of freedom
Multiple R-squared: 0.5745, Adjusted R-squared: 0.5743
F-statistic: 2821 on 1 and 2089 DF, p-value: < 2.2e-16
Cashflow = (-3.443e+03) + (2.136)OrdinaryProfit
Step 3 : Construct Single Linear Regression Models
> model2<-lm(log(dat[,29])~log(dat[,37]))
> summary(model2)
Call:
lm(formula = log(dat[, 29]) ~ log(dat[, 37]))
Residuals:
    Min      1Q  Median      3Q     Max
-5.4288 -0.5322 -0.0756  0.4844  4.5932
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     1.83025    0.09951   18.39   <2e-16 ***
log(dat[, 37])  0.78049    0.01304   59.85   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8912 on 1813 degrees of freedom
Multiple R-squared: 0.6639, Adjusted R-squared: 0.6637
F-statistic: 3582 on 1 and 1813 DF, p-value: < 2.2e-16
log(Cashflow) = (1.83025) + (0.78049)(log(OrdinaryProfit))
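To read the log-log equation in the original units, it can be back-transformed into Cashflow = exp(intercept) * OrdinaryProfit^slope. The sketch below uses only the two coefficients reported above; the profit value of 1000 is a hypothetical input chosen for illustration.

```r
# Coefficients copied from the summary(model2) output above.
b0 <- 1.83025
b1 <- 0.78049

# Back-transform: Cashflow = exp(b0) * OrdinaryProfit^b1
scale_factor <- exp(b0)

# Predicted multiplicative change in Cashflow when OrdinaryProfit doubles.
doubling_effect <- 2^b1

# Predicted Cashflow for a hypothetical OrdinaryProfit of 1000
# (same units as the data set).
predicted <- scale_factor * 1000^b1
```

Under this fit, doubling Ordinary Profit is associated with roughly a 1.7-fold increase in Cash Flow, not a doubling, because the slope is below 1.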
Step 3 : Construct Single Linear Regression Models
Initial results:
We found that using log() of both variables gave
us a better fit.
- Tighter scatter plot
- The Adjusted R-Squared improved (0.5743 to 0.6637)
- The residual standard error went from 32130 on 2089 degrees of
freedom to 0.8912 on 1813 degrees of freedom. The two errors are on
different scales (raw versus log units), so they cannot be compared directly.
Also, we found the p-value of OrdinaryProfit to be <2e-16,
which is very close to zero.
But the Adjusted R-Squared improved by only 0.0894, so
OrdinaryProfit alone is clearly not enough.
Step 3 : Model Checking
> var(log(dat[,29]))
[1] 2.361808
> var(residuals(model2))
[1] 0.7937481
> var(predict(model2))
[1] 1.56806
Step 3 : Model Checking
Initial results and impressions:
The variance results were:
Original    2.361808
Residuals   0.7937481
Predicted   1.56806
The residual variance is fairly high. Even though it is
less than one, it is not close to zero.
Together with the R-Squared results, we can say that
using only OrdinaryProfit to explain Cashflow gives a usable model, but
not a good one.
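The three variances in the R output are linked: for a least-squares fit with an intercept, the variance of the response splits into the variance of the fitted values plus the variance of the residuals, and the fitted share equals the Multiple R-squared. A short check using only the numbers printed above:

```r
# Values copied from the R output above.
var_total  <- 2.361808   # var(log(dat[,29]))
var_resid  <- 0.7937481  # var(residuals(model2))
var_fitted <- 1.56806    # var(predict(model2))

# Total variance should equal fitted variance plus residual variance.
decomposition_gap <- var_total - (var_fitted + var_resid)

# The share of variance explained by the model.
r_squared <- var_fitted / var_total
round(r_squared, 4)
```

The ratio agrees with the Multiple R-squared of 0.6639 reported by summary(model2), so about a third of the variance in log(Cashflow) is still unexplained.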
Step 4 : Improvement of the Obtained Model
by changing explanatory variables
After concluding that OrdinaryProfit alone does not explain Cashflow
well enough, we have to improve our model!
So, we decided to add the following variables because we believe that they
are all signs of a strong company:
Role                           Variable             Data (column of dat)
Concerning variable            Cash Flow            29
Initial explanatory variable   Ordinary Profit      37
New explanatory variables      Personnel Expenses   41
                               Tangible Assets      18
                               Depreciation         28
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 1
First, we added Personnel Expenses to our previous model:
> model3<-lm(log(dat[,29])~log(dat[,37])+log(dat[,41]))
> summary(model3)
Call:
lm(formula = log(dat[, 29]) ~ log(dat[, 37]) + log(dat[, 41]))
Residuals:
    Min      1Q  Median      3Q     Max
-4.7120 -0.3625  0.0258  0.3959  3.6438
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.98178    0.12307  -7.977 2.63e-15 ***
log(dat[, 37])  0.44945    0.01522  29.527  < 2e-16 ***
log(dat[, 41])  0.59305    0.01953  30.359  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7257 on 1812 degrees of freedom
Multiple R-squared: 0.7772, Adjusted R-squared: 0.777
F-statistic: 3161 on 2 and 1812 DF, p-value: < 2.2e-16
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 2
Then, we included Tangible Assets:
> model4<-lm(log(dat[,29])~log(dat[,37])+log(dat[,41])+log(dat[,18]))
> summary(model4)
Call:
lm(formula = log(dat[, 29]) ~ log(dat[, 37]) + log(dat[, 41]) +
log(dat[, 18]))
Residuals:
    Min      1Q  Median      3Q     Max
-4.6334 -0.2612  0.0527  0.3206  2.6876
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -1.65317    0.09596 -17.228   <2e-16 ***
log(dat[, 37])  0.31375    0.01224  25.634   <2e-16 ***
log(dat[, 41])  0.17090    0.01902   8.987   <2e-16 ***
log(dat[, 18])  0.56570    0.01577  35.881   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.555 on 1811 degrees of freedom
Multiple R-squared: 0.8698, Adjusted R-squared: 0.8696
F-statistic: 4033 on 3 and 1811 DF, p-value: < 2.2e-16
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 3
Finally, we included Depreciation:
> model5<-lm(log(dat[,29])~log(dat[,37])+log(dat[,41])+log(dat[,18])+log(dat[,28]))
> summary(model5)
Call:
lm(formula = log(dat[, 29]) ~ log(dat[, 37]) + log(dat[, 41]) +
log(dat[, 18]) + log(dat[, 28]))
Residuals:
    Min      1Q  Median      3Q     Max
-4.6901 -0.1523 -0.0261  0.1396  2.7388
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.481898   0.084842   5.680 1.57e-08 ***
log(dat[, 37])  0.307379   0.008691  35.368  < 2e-16 ***
log(dat[, 41]) -0.040860   0.014402  -2.837   0.0046 **
log(dat[, 18]) -0.004568   0.017541  -0.260   0.7946
log(dat[, 28])  0.727838   0.017238  42.224  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.394 on 1810 degrees of freedom
Multiple R-squared: 0.9344, Adjusted R-squared: 0.9343
F-statistic: 6446 on 4 and 1810 DF, p-value: < 2.2e-16
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 4
Results of R-Squared:
There was a clear increase in Adjusted R-Squared as we
included each variable.
[Bar chart: Adjusted R-Squared (axis 0 to 1) by model: Cashflow and Ordinary Profits (0.6637); with Personnel Expenses (0.777); with Tangible Assets (0.8696); with Depreciation (0.9343)]
The Adjusted R-Squared increased from 66% to 93%!
NOTE: all values are log()
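Adjusted R-Squared penalizes each added variable, which is why it is the fairer number to track as the model grows. A small sketch of the standard formula, applied to the Multiple R-squared values from the four log models; the helper name `adj_r2` is ours, and n = 1815 is inferred from the reported residual degrees of freedom (for example 1813 plus 2 estimated coefficients in the one-variable model):

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n  <- 1815                               # inferred from the degrees of freedom
r2 <- c(0.6639, 0.7772, 0.8698, 0.9344)  # Multiple R-squared of the four models
p  <- 1:4                                # number of explanatory variables

adjusted <- adj_r2(r2, n, p)
round(adjusted, 4)
```

With about 1,800 observations and at most four variables, the penalty is tiny, so the adjusted and unadjusted values differ only in the fourth decimal place.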
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 4
Other Results:
*all variables are log()
                     T-value   P-value
Ordinary Profit       35.368   < 2e-16
Personnel Expenses    -2.837   0.0046
Tangible Assets       -0.260   0.7946
Depreciation          42.224   < 2e-16
The P-values are small and close to zero, except for Tangible Assets.
The T-values vary widely:
- Depreciation and Ordinary Profit have very large T-values, and
Personnel Expenses has a moderate one.
- However, Tangible Assets has a T-value close to zero.
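Tangible Assets was highly significant before Depreciation was added (t = 35.881) and insignificant afterwards (t = -0.260). One possible reading, which we have not verified on the data, is that the two variables carry much of the same information. The sketch below reproduces that pattern on synthetic data; `assets`, `deprec`, and `y` are made-up stand-ins, not columns of dat.

```r
set.seed(1)
n <- 500

# Synthetic stand-ins: 'assets' and 'deprec' carry almost the same information,
# and the response depends on 'deprec' only.
deprec <- rnorm(n, mean = 7, sd = 1.5)
assets <- deprec + rnorm(n, sd = 0.3)
y      <- 0.5 + 0.7 * deprec + rnorm(n, sd = 0.4)

fit_without <- lm(y ~ assets)
fit_with    <- lm(y ~ assets + deprec)

# t-value of 'assets' before and after adding the correlated predictor.
t_without <- summary(fit_without)$coefficients["assets", "t value"]
t_with    <- summary(fit_with)$coefficients["assets", "t value"]
```

If the real data behave the same way, a check such as cor(log(dat[,18]), log(dat[,28])) would show a high correlation between the two variables.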
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 4
Scatter plots of the Linear Model and the final Multiple Regression Model:
Step 4 : Improvement of the Obtained Model
by changing explanatory variables 4
Result and reflections:
Our goal is to find factors that indicate a strong company, and we
decided to focus on Cash Flow. We also decided that a strong
company would have a good Ordinary Profit. So, we looked for the
correlation between the two. We found that the log() of each
value is the best way to present the relationship. Then, we looked
for variables that may contribute to Cash Flow, and how they
improve our model.
By using log() and several explanatory variables, the
Multiple R-Squared has been drastically improved!
Original: 0.5745 → Improved: 0.9344
Step 5 : Residual Analysis
Box-and-whisker plots of the residuals of our models:
> boxplot(residuals(Cashflow.lm),residuals(Cashflow.lm4))
[Box plots: Residuals of Simple; Residuals of Multiple]
The standard deviation of the residuals is smaller for the multiple model.
Step 5 : Residual Analysis
How to improve the model:
Check whether the outlying companies share any notable characteristics,
starting with companies whose residuals are above 0.6 (about 1.5 times the residual standard error of 0.394):
> dat[residuals(model5)>0.6,1]
[1] HEIWA CONSTRUCTION
Sekisui House Hokuriku
SDK ENGINEERING
DAITO TRUST CONSTRUCTION
[5] SHINNIHON
Sumitomo Forestry
CHUDENKO
ATAKA CONSTRUCTION & ENGINEERING
[9] Taikisha
DAI-DAN
Hibiya Engineering
Snow Brand Seed
[13] CHUYU
BOSO OIL & FAT
NIHON SHOKUHIN KAKO
SHINOBU FOODS PRODUCTS
[17] Fuji Spinning
Teikoku Sen-i
CAROLINA
KANBO PRAS
[21] ITARIYARD
Kureha Chemical Industry
Nippon Chemical Industrial
Plas-Tech
[25] JAPAN CARLIT
Nippon Steel Chemical
ONO PHARMACEUTICAL
SANTEN PHARMACEUTICAL
[29] NIPPON SEIRO
KINUGAWA RUBBER INDUSTRIAL
ISHIZUKA GLASS
Sumitomo Osaka Cement
[33] Harima Ceramic
Pacific Metals
Japan Metals & Chemicals
Optec Dai-Ichi Denko
[37] Japan Bridge
KOMAI TEKKO
TOSHIBA TUNGALOY
TOYO MACHINERY & METAL
[41] Hitachi Zosen Tomioka Machinery
Kurita Water Industries
YUKEN KOGYO
Shimpo Industrial
[45] Heiwa
SANKYO
Shinko Electric
Denyo
[49] MABUCHI MOTOR
TAMURA ELECTRIC WORKS
HIROSE ELECTRIC
OHKURA ELECTRIC
[53] KEYENCE
MELCO
KOMATSU ZENOAH
Mazda Motor
[57] IKEDA BUSSAN
ECHO TRADING
NAKAYAMAFUKU
Harima-Kyowa
[61] DOSHISHA
TOKIMEC
TOKYO SEIMITSU
MITSUMURA PRINTING
[65] TSUTSUMI JEWELRY
Nintendo
NAGASE & CO.
G-NET
[69] Tokyo Electron
OSAKA UOICHIBA
RYOYO ELECTRO
TOKAI BUSSAN
[73] KUWAZAWA Trading
TOKYO STYLE
UNI·CHARM
CENTRAL AUTOMOTIVE PRODUCTS
[77] Ryosan
DENKYOSHA
SHIMACHU
CHIYODA
[81] CHUO SUBARU
Kansai Sekiwa Real Estate
SEIBU RAILWAY
HOKKAIDO CHUO BUS
[85] MEIDEN ENGINEERING
KOEI
NIPPON KANZAI
INES
[89] BIKEN TECHNO
Juel Verite Ohkubo
SHINDEN
NIKKU SANGYO
[93] FAST RETAILING
There is no overall trend, but there are many companies that
deal with materials and electronics.
Step 5 : Residual Analysis
How to improve the model:
Check whether the outlying companies share any notable characteristics,
this time for companies whose residuals are below -0.6 (about 1.5 times the residual standard error of 0.394):
> dat[residuals(model5)< -0.6,1]
[1] Arabian Oil
FUJIKO
ZENITAKA
Sumitomo Construction
[5] Daiwa Construction
DAI NIPPON CONSTRUCTION
HAZAMA
KOKUNE
[9] TADA
Arai-Gumi
KUMAGAI GUMI
Asakawagumi
[13] KOMATSU CONSTRUCTION
TSUKEN
Chugai Ro
SANKO METAL INDUSTRIAL
[17] KYODO SHIRYO
EZAKI GLICO
SETTSU OIL MILL
KATOKICHI
[21] Shoei
MIYUKI KEORI
ATSUGI NYLON INDUSTRIAL
TOMOEGAWA PAPER
[25] Settsu
Nippon Denko
Shimura Kako
ASAKA INDUSTRIAL
[29] KATO SPRING WORKS
KOYO IRON WORKS & CONSTRUCTION
ANEST IWATA
FUJITSU GENERAL
[33] UNIDEN
OKAYA ELECTRIC INDUSTRIES
SHIZUKI ELECTRIC
ZOJIRUSHI
[37] TAKASHIMA & CO.
DAIKO DENSHI TSUSHIN
LECIEN
TOKYO SOIR
[41] DAITO GYORUI
Kinsho-Mataichi
YUASA TRADING
CANOX
[45] MOONBAT
ZETT
NAGAHORI
YAOHAN JAPAN
[49] Footwork International
OAK
Maruzen
SHOWA LINE
[53] DAIICHI CHUO KISEN
Ga-jo-en Kanko
Again, there is no overall trend. But, there are many electric,
construction, and oil companies.
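Judging a trend by reading company names is subjective. If an industry label were available for each company, the flagged companies could be counted per industry instead. The sketch below assumes a hypothetical `industry` column and made-up residuals; neither is taken from the team's data set.

```r
# Hypothetical mini data set: company name, industry label, and model residual.
firms <- data.frame(
  name     = c("A", "B", "C", "D", "E", "F"),
  industry = c("Construction", "Electric", "Construction",
               "Foods", "Electric", "Construction"),
  resid    = c(-0.9, -0.7, 0.1, 0.8, -0.65, -1.2),
  stringsAsFactors = FALSE
)

# Same cut-off as above: residuals below -0.6.
low <- firms[firms$resid < -0.6, ]

# Count flagged companies per industry, largest group first.
counts <- sort(table(low$industry), decreasing = TRUE)
```

Comparing these counts with each industry's share of the whole sample would show whether an industry is really over-represented among the outliers.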
Data Analysis Conclusion: R-Squared
With these variables, you can see the strong relationship with CashFlow:
Model                          R-Squared   Improvement
With “OrdinaryProfit”          0.6639
Adding “PersonnelExpence”      0.7772      0.1133
Adding “TangibleTotalAsset”    0.8698      0.0926
Adding “Depreciation”          0.9344      0.0646
From the first simple regression to our final multiple model, R-squared
improved from 66.4% to 93.4%.
All the coefficients except “TangibleFixedAsset” are highly significant, since
their p-values are very small. So to improve this model further, we focus
on this variable.
Data Analysis Conclusion: Other Values
Looking back and analyzing more data…
*all variables are log()
                     T-value   P-value
OrdinaryProfit        35.368   < 2e-16
Personnel Expenses    -2.837   0.0046
Tangible Assets       -0.260   0.7946
Depreciation          42.224   < 2e-16
Ordinary Profit and Depreciation have large T-values, so we can say that
they contribute strongly to explaining Cash Flow. Personnel Expenses
contributes as well, but much more weakly, and its coefficient is negative.
However, the T-value of Tangible Assets is close to zero and shows no
contribution once the other variables are in the model.
The P-values tell the same story. For Ordinary Profit and Depreciation they
are below 2e-16, which is strong evidence that their coefficients are not zero.
The evidence for Personnel Expenses is weaker (p = 0.0046), but still
significant at the 1% level. On the other hand, there is no evidence
(p = 0.7946) that Tangible Assets helps explain Cash Flow.
Data Analysis Conclusion
Issues and Improvements:
By looking at R-Squared, P-Value, and T-Value, we found that
all of our variables raised R-Squared when they were added
and so helped improve the model in some way. However, there
were three problems:
- Depreciation improved R-Squared, but by less
than the other variables did
- Tangible Assets had an insignificant effect on Cash Flow in the final model
- Tangible Assets had a high P-value, so there is no evidence that it
supports Cash Flow
Data Analysis Conclusion
Verify whether Tangible Assets is necessary
Run automatic variable selection
Tangible Assets has no impact on our model
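The slide does not say which selection procedure was run. One common choice in R is backward elimination by AIC with step(), sketched here on synthetic data as an assumption, not as the team's actual command. The variables `x1`, `x2`, and `noise_var` are made up for illustration.

```r
set.seed(42)
n <- 300

# Synthetic stand-ins for three explanatory variables.
x1 <- rnorm(n)
x2 <- rnorm(n)
noise_var <- rnorm(n)   # unrelated to the response by construction
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n, sd = 0.5)
d  <- data.frame(y, x1, x2, noise_var)

full <- lm(y ~ x1 + x2 + noise_var, data = d)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
selected <- step(full, direction = "backward", trace = 0)

# Names of the variables that remain in the selected model.
kept <- attr(terms(selected), "term.labels")
```

Applied to the final model, the same call would be step(model5, direction = "backward"), and a variable with no explanatory power would usually be dropped from the selected formula.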
Data Analysis Conclusion
Cash flow model is improved by…
- Ordinary Profit
- Personnel Expenses
- Depreciation
Team 04
Thank you for listening!
Any questions?
APPENDIX
Multiple regression model without Tangible Assets
QA from last presentation
Q1. Some of the ordinary profit records have negative values.
How did you deal with this problem?
Q2. Based on the answer to Q1, the chart in slide 4 appears to use the wrong data.
Q3. In the right chart of slide 4, is there any meaning to adding -0.1?
Q4. How did you find the explanatory variables on slide 14?
<MEMO>
This page was added after the final presentation.
The charts in slide 4 were also replaced.