Multiple Regression of MSFT vs. 10 IT companies


Summary

In our project, we investigated the relationships among stocks in the Information Technology (IT) industry, using five years of daily price data from October 22, 2004 to October 20, 2009.

Our focus company was Microsoft. We wanted to determine how the returns of ten other stocks relate to the daily return on Microsoft stock (MSFT). The ten stocks we studied were Apple Computer (AAPL), Adobe Systems (ADBE), Automatic Data Processing (ADP), Advanced Micro Devices (AMD), Dell Computer (DELL), Hewlett-Packard (HP), International Business Machines (IBM), Oracle (ORCL), Yahoo (YHOO), and Google (GOOG). We found that only six of these had a statistically significant relationship with Microsoft. The result of our analysis is an equation that we can use to predict Microsoft stock returns from the returns of those six companies' stocks.

First, let’s look into why some stock returns are more highly correlated than others.

Shouldn't companies in the same industry behave the same way? The trouble with that idea is that an industry like Information Technology is a very wide category. IBM makes computers and sells software. Advanced Micro Devices only makes the chips that computers run on. Microsoft and Oracle, meanwhile, compete with their database systems and other software. All four of these companies are in the information technology sector, but they operate in different sub-sectors. So when we compare the ten companies with Microsoft, we expect the company most similar to Microsoft to have the strongest relationship with it. That relationship is what we measure. The table below ranks the six stocks by their relationship with Microsoft, measured by the estimated regression coefficient (Table 5 in the appendix). It shows that Oracle does have the strongest relationship with Microsoft. This makes sense, since both companies are very large and produce only software.

Stock   Estimated coefficient
ORCL    0.235
ADP     0.211
IBM     0.188
ADBE    0.123
GOOG    0.095
DELL    0.072

When we chart each of these stocks against Microsoft (see the next section), we see that each is positively correlated with Microsoft.

In conclusion, our results show that the strongest relationship is between Microsoft and Oracle. Several factors likely contribute to this. Both are purely software companies, they compete directly as database providers, and they have similar relationships with the overall market.

The least comparable company of the six is Dell, which is primarily a hardware company that manufactures personal computers.

We have included in the appendix a description of each of the companies.

Technical Summary

Exploratory Analysis of Data

Our data was originally retrieved from the Yahoo! Finance web site. Stock prices were converted to log returns, r(t) = log[S(t)/S(t–1)], where S(t) is the closing stock price at time t and log is the natural logarithm. This is a typical transformation for financial data and yields approximately normal distributions. Our data had over 100 outliers. Some of these turned out to be stock splits, which we adjusted for. The other outliers were kept in the sample, because they are an important part of financial market data.
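The Python sketch below shows the log-return transformation. It also shows why an unadjusted stock split appears as an outlier. The prices and the 2-for-1 split are hypothetical. The adjustment simply divides pre-split closes by the split ratio. We assume this is equivalent to the adjustment made for this data, which the report does not spell out.

```python
import math

def log_returns(prices):
    """r(t) = ln(S(t) / S(t-1)) for consecutive closing prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def split_adjust(prices, split_index, ratio):
    """Divide every close before split_index by ratio (e.g. 2 for a 2-for-1 split)."""
    return [p / ratio if i < split_index else p for i, p in enumerate(prices)]

# Hypothetical closes with a 2-for-1 split taking effect on day index 2.
raw = [50.0, 51.0, 25.5, 26.0]
print(log_returns(raw)[1])       # about -0.693: a spurious "outlier" caused by the split
adjusted = split_adjust(raw, split_index=2, ratio=2)
print(log_returns(adjusted)[1])  # 0.0: the split no longer looks like a return
```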

To examine the pair-wise relationships between stocks, we computed the correlation table below. The correlations range from 0.28 to 0.60, and the largest is between MSFT and ORCL. The plots of each pair show an approximately linear relationship between the stocks. See Graphs 1 and 2 in the appendix for the stock with the highest correlation with MSFT (MSFT vs. ORCL) and the stock with the lowest (MSFT vs. YHOO). Note that the far-right outlier on the YHOO chart is the day Microsoft offered to buy Yahoo at a large premium to its then-current price.
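Each entry of the correlation table is a sample (Pearson) correlation. Below is a plain-Python sketch of that computation. The inputs are hypothetical, and the report's own figures came from SAS PROC CORR.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length return series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """All pairwise correlations for a dict {ticker: returns}; the result is symmetric."""
    names = list(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}

# Hypothetical daily log returns for three tickers (not the report's data).
toy = {"msft": [0.01, -0.02, 0.015, 0.0],
       "orcl": [0.012, -0.018, 0.02, -0.004],
       "yhoo": [-0.01, 0.005, 0.02, -0.015]}
corr = correlation_matrix(toy)
print(round(corr[("msft", "orcl")], 3))
```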

Correlation Table (pairwise correlations of daily log returns; the matrix is symmetric, so only the lower triangle is shown):

       MSFT  AAPL  ADBE  ADP   AMD   DELL  HP    IBM   ORCL  YHOO  GOOG
MSFT   1.00
AAPL   0.44  1.00
ADBE   0.56  0.45  1.00
ADP    0.55  0.38  0.54  1.00
AMD    0.36  0.38  0.43  0.38  1.00
DELL   0.46  0.42  0.46  0.41  0.40  1.00
HP     0.40  0.37  0.45  0.43  0.38  0.36  1.00
IBM    0.56  0.49  0.54  0.56  0.44  0.50  0.45  1.00
ORCL   0.60  0.48  0.55  0.51  0.37  0.48  0.42  0.58  1.00
YHOO   0.31  0.34  0.35  0.31  0.28  0.33  0.34  0.37  0.36  1.00
GOOG   0.46  0.49  0.47  0.43  0.31  0.38  0.39  0.40  0.43  0.37  1.00

The graph below shows that the distribution of the response variable (MSFT log returns) is approximately normal, with heavy tails, as is often the case for financial data. The mean return was very close to zero. The kurtosis, however, is 10.78, far above the normal-distribution benchmark (a kurtosis of 3, or an excess kurtosis of 0; SAS PROC UNIVARIATE reports the excess form). Our sample period includes some very extreme events. One was the bankruptcy of Lehman Brothers, which the market largely did not anticipate. It sent the overall market tumbling, with daily returns as low as –8%, more than six standard deviations from the mean.

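The kurtosis statistic can be sketched as follows. We assume the bias-adjusted excess-kurtosis estimator, for which a normal population gives 0. We believe this is the KURTOSIS that SAS PROC UNIVARIATE reports under its default settings, but that is an assumption. The two toy samples are hypothetical.

```python
def excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis (a normal population gives about 0)."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)   # sample variance (divisor n - 1)
    s4 = sum((v - m) ** 4 for v in x)             # sum of fourth-power deviations
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * s4 / s2 ** 2
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a - b

flat = [1, 2, 3, 4, 5]      # light tails -> negative excess kurtosis
spiky = [0] * 9 + [10]      # one extreme point -> large positive excess kurtosis
print(round(excess_kurtosis(flat), 3), round(excess_kurtosis(spiky), 3))
```

A heavy-tailed series like MSFT's daily returns behaves like the spiky sample: a few extreme days dominate the fourth-power sum.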

Model Selection

The first regression procedure was run with all 10 independent variables and examined with five selection methods: Mallows' Cp, adjusted R², forward selection (with details), backward elimination, and stepwise selection. Under the Cp selection rule, a good-fitting model should have a Cp value approximately equal to its number of parameters p, although the minimum Cp is sometimes used instead. The best model from the Cp algorithm is:

Number in Model    C(p)      R-Square    Variables in Model
6                  5.5647    0.5017      adbe adp dell ibm orcl goog
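Mallows' Cp can be written as Cp = SSE_p / MSE_full − (n − 2p), where p counts parameters including the intercept. The sketch below expresses it in terms of R², so it can be checked against the tables. We read a full-model R² of about 0.5027 from Table 2, where AMD's partial R² rounds to 0.0000. That reading is an assumption, so the result only approximately reproduces SAS's 5.5647.

```python
def mallows_cp(r2_sub, p_sub, r2_full, p_full, n):
    """Mallows' Cp = SSE_p / MSE_full - (n - 2p), rewritten with R^2.

    SSE_p = (1 - R2_sub) * SST and MSE_full = (1 - R2_full) * SST / (n - p_full),
    so SST cancels. p_sub and p_full count parameters including the intercept.
    """
    return (1 - r2_sub) / (1 - r2_full) * (n - p_full) - (n - 2 * p_sub)

n = 1257
# Six-stock model: 6 slopes + intercept = 7 parameters; full model: 10 slopes + intercept = 11.
cp_six = mallows_cp(0.5017, 7, 0.5027, 11, n)   # 0.5027 read from Table 2 (assumption)
print(round(cp_six, 2), "vs. p =", 7)
```

By construction, the full model always has Cp equal to its own parameter count. A subset model with Cp at or below p, like the six-stock model, shows little evidence of bias from the omitted variables.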

 

 

The adjusted R² for the above model is 0.4993. This is slightly lower than that of the model chosen by the adjusted-R² algorithm, which has 7 variables and an adjusted R² of 0.4994.

Number in Model    Adjusted R-Square    R-Square    Variables in Model
7                  0.4994               0.5022      aapl adbe adp dell ibm orcl goog
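Adjusted R² penalizes each added predictor: adj-R² = 1 − (1 − R²)(n − 1)/(n − k − 1), for k predictors plus an intercept. Plugging in the rounded R² values from the two tables reproduces the 0.4993 and 0.4994 quoted above.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.5017, 1257, 6), 4))  # 0.4993: six-stock model
print(round(adjusted_r2(0.5022, 1257, 7), 4))  # 0.4994: same model plus AAPL
```

The 0.0005 gain in raw R² from adding AAPL barely outweighs the penalty for the extra parameter. This is why the two adjusted values differ only in the fourth decimal.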

 

However, we exclude AAPL from our model because of its high p-value of 0.2810. The remaining three methods (forward, backward, and stepwise) involve multiple steps to review. Detailed SAS output (as well as code) is in the appendix. We specified a significance level of 0.05 for a variable to enter or stay in the model. The forward selection method includes the same six stocks as the Cp method. See Table 1 in the appendix.

The backward elimination method begins with the full model and successively removes weak variables based on the partial F-test. For example, the first step removes AMD, because it has the lowest F value (0.06). Its p-value is also very high (0.8039), so AMD is not significantly related to MSFT. The next steps successively remove YHOO, HP, and AAPL (see Table 2 in the appendix). Six stocks then remain to form the best-fitting model: ORCL, ADP, ADBE, IBM, GOOG, and DELL.
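Forward, backward, and stepwise selection all rank candidate variables by a partial F-test with 1 numerator degree of freedom. The sketch below computes this F from the R² of the models with and without the variable. The inputs are the four-decimal R² values in Tables 1 and 2, so the results only approximately match SAS's F values. The Python standard library has no F distribution. The p-value helper therefore uses an assumed large-sample shortcut: with 1 numerator df and about 1,250 residual df, F is a squared t statistic and t is nearly standard normal.

```python
from statistics import NormalDist

def partial_f(r2_with, r2_without, n, p_with):
    """Partial F (1 numerator df) for one variable.

    r2_with / r2_without: R^2 of the model with / without the variable.
    p_with: parameters (incl. intercept) in the model that contains it.
    """
    return (r2_with - r2_without) / ((1 - r2_with) / (n - p_with))

def approx_p_value(f_stat):
    """Large-sample approximation: p = 2 * (1 - Phi(sqrt(F))) for 1 numerator df."""
    return 2 * (1 - NormalDist().cdf(f_stat ** 0.5))

n = 1257
print(round(partial_f(0.3560, 0.0, n, 2), 1))     # forward step 1, ORCL enters (SAS: 693.77)
print(round(partial_f(0.5022, 0.5017, n, 8), 2))  # backward step 4, AAPL leaves (SAS: 1.24)
print(round(approx_p_value(0.06), 3))             # AMD, backward step 1 (SAS: 0.8039)
```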

 

The stepwise method is similar to the forward method, except that at each step it can both add and remove x-variables. More specifically, a variable entered at an earlier step may no longer be significant at the current step, meaning its partial F-test p-value exceeds the stay level. In that case it is removed again. The summary table (Table 3 in the appendix) shows that stepwise selects the same six variables (ORCL, ADP, ADBE, IBM, GOOG, DELL), with ORCL having the highest F value. Comparing the candidate models, the Cp, forward, backward, and stepwise methods all point to the same six-variable model. In addition, all six variables have p-values below the 5% significance level. They are therefore the best candidates to explain our response variable, MSFT.

While there is some correlation among our independent variables, the Variance Inflation Factor (VIF) values indicate that collinearity is not a problem for this model. The highest value was IBM's, at 1.95, well below the threshold of 4. See Table 5.
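The VIF column in Table 5 is the reciprocal of the Tolerance column, since tolerance = 1 − R_j². Here R_j² comes from regressing predictor j on the other predictors. A short check using the Table 5 tolerances:

```python
# Tolerance column from Table 5; tolerance_j = 1 - R_j^2.
TOLERANCE = {"adbe": 0.53758, "adp": 0.56917, "dell": 0.65733,
             "ibm": 0.51282, "orcl": 0.53531, "goog": 0.69456}

def vif(tolerance):
    """VIF_j = 1 / (1 - R_j^2) = 1 / tolerance_j."""
    return 1.0 / tolerance

vifs = {name: vif(t) for name, t in TOLERANCE.items()}
worst = max(vifs, key=vifs.get)
print(worst, round(vifs[worst], 2))         # ibm 1.95
print(all(v < 4 for v in vifs.values()))    # True: every VIF is below the threshold of 4
```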

Residual Analysis and Model Diagnostics

Most of the time, the log returns of stocks behave approximately like a sample from a normal population. The normal probability plot below shows this well, with the data very close to the straight line for most of the distribution. At both ends of the distribution, however, the points fall away from the line, so the data is not normal in the tails. This reflects the heavy tails of stock market data.
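A normal probability plot pairs each sorted residual with a standard normal quantile. The sketch below uses Blom's plotting positions (i − 0.375)/(n + 0.25). That convention is an assumption on our part; other choices such as (i − 0.5)/n give nearly the same picture. The residuals are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(values):
    """(normal quantile, sorted value) pairs for a normal probability plot.

    Assumes Blom's plotting positions (i - 0.375) / (n + 0.25).
    """
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), v)
            for i, v in enumerate(sorted(values), start=1)]

# Hypothetical studentized residuals; -7.5 and 6.5 mimic the heavy tails.
resid = [-7.5, -1.2, -0.4, 0.0, 0.3, 0.9, 6.5]
for q, r in normal_probability_points(resid):
    print(f"{q:6.2f} {r:6.2f}")
```

For normal data the pairs lie near a straight line. Heavy tails show up when the extreme residuals sit much farther from zero than their normal quantiles.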

 

The plot of residuals below also shows a large number of extreme data points more than 2.5 standard deviations from the mean. Again, this is not unexpected for data from financial markets. Apart from the outliers, the residuals show no clear pattern, which indicates that a linear model is appropriate.

We used several measures to identify and examine outliers. For the cutoff values, our model has p = 7 parameters and n = 1257 observations. The leverage statistic shows 103 data points above a cutoff of (2p+2)/n = 0.0127. Cook's distance shows 84 data points above a cutoff of 4/n = 0.00318. The DFFITS statistic shows the same 84 data points beyond a cutoff of 2·sqrt(p/n) = 0.15. Using studentized residuals, we again found 84 data points beyond 2 standard deviations, 16 of which were beyond 3 standard deviations (see Table 4 in the appendix).
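The cutoffs above follow directly from p and n. A small sketch that reproduces them is below. `flag_count` is a hypothetical helper for counting flagged observations. The (2p+2)/n leverage rule is the one used in this report; 2p/n is another common rule of thumb.

```python
import math

def outlier_cutoffs(p, n):
    """Rule-of-thumb cutoffs as used in this report (p includes the intercept)."""
    return {
        "leverage": (2 * p + 2) / n,     # hat value h_ii
        "cooks_d": 4 / n,
        "dffits": 2 * math.sqrt(p / n),  # compare against |DFFITS|
    }

def flag_count(stat_values, cutoff, absolute=False):
    """Hypothetical helper: how many observations exceed a cutoff."""
    return sum(1 for s in stat_values if (abs(s) if absolute else s) > cutoff)

cuts = outlier_cutoffs(p=7, n=1257)
print({k: round(v, 5) for k, v in cuts.items()})
```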

Several kinds of events can cause such extreme data: 1) earnings reported far from market participants' expectations, 2) merger/acquisition activity, and 3) changes in the macroeconomic environment. We do not exclude these data from our analysis because they are part of the structure of financial markets. Market participants must continually monitor news sources for such extreme events. They should also realize that statistical models are unlikely to perform well while a crisis environment persists. Some additional outliers in the original data turned out to be stock splits, and we adjusted the data for those events. The remaining outliers are an expected part of stock market data. The 5-year period covered by our sample includes a housing bubble and a financial crisis, so some extreme data points are to be expected.

Analysis of Results and Discussion

The regression analysis yields the following result, using each stock symbol to stand for that company's log return. See Table 5 in the appendix for details of the parameter estimates.

MSFT = –0.0003 + 0.1233 × ADBE + 0.2107 × ADP + 0.0719 × DELL + 0.1875 × IBM + 0.2354 × ORCL + 0.0947 × GOOG

As an example of its predictive use, take the input data ADP = –0.050, IBM = –0.042, ORCL = –0.075, ADBE = –0.037, DELL = –0.045, and GOOG = –0.071. For these values, the predicted MSFT log return is –0.0509, with a 95% prediction interval of [–0.0784, –0.0233].
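The point prediction can be reproduced from the Table 5 estimates. This sketch computes only the fitted value. The prediction interval also needs the (X'X)⁻¹ matrix of the fit, which SAS's CLI option uses and which this report does not include.

```python
# Parameter estimates from Table 5 (daily log returns).
COEF = {"adbe": 0.12329, "adp": 0.21074, "dell": 0.07190,
        "ibm": 0.18750, "orcl": 0.23535, "goog": 0.09472}
INTERCEPT = -0.00026956

def predict_msft(inputs):
    """Point prediction of the MSFT log return from the six-stock model."""
    missing = set(COEF) - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return INTERCEPT + sum(COEF[k] * inputs[k] for k in COEF)

example = {"adp": -0.050, "ibm": -0.042, "orcl": -0.075,
           "adbe": -0.037, "dell": -0.045, "goog": -0.071}
print(round(predict_msft(example), 4))  # -0.0509
```

As expected for a prediction interval, the reported interval [–0.0784, –0.0233] is centred on this value.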

The model could be used to earn profits if prices deviated from the statistical patterns. For example, suppose the values of the independent variables were actual data near the end of a trading day, and MSFT was outside the prediction interval above. A position could then be taken that would profit if MSFT moved back so that its log return came inside the prediction interval before the close of the day.
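The rule in the previous paragraph amounts to measuring how far the observed MSFT log return lies outside the prediction interval. Below is a hypothetical helper, using the interval from the example above. It says nothing about whether such a trade would be profitable after costs.

```python
PI_LOW, PI_HIGH = -0.0784, -0.0233   # 95% prediction interval from the example above

def gap_to_interval(observed, low=PI_LOW, high=PI_HIGH):
    """Log-return move MSFT would need to re-enter [low, high].

    Positive: MSFT is below the interval (a long position would gain if it moved back).
    Negative: MSFT is above the interval (a short position would gain).
    Zero: MSFT is already inside, so the rule gives no signal.
    """
    if observed < low:
        return low - observed
    if observed > high:
        return high - observed
    return 0.0

for obs in (-0.05, -0.010, -0.095):   # hypothetical late-day MSFT log returns
    print(obs, round(gap_to_interval(obs), 4))
```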

However, there is always uncertainty in the markets, and there is no guarantee that at the close of any particular day, the stocks will align with their statistical past.

This 6-stock model predicts well, but it could be improved by adding macroeconomic variables, such as a regime variable with the categories expansion, recession, and transition. This may separate the outliers from the rest of the data, since the heavy tails in financial data tend to occur in clusters during recessions. One macroeconomic variable that is easily available on a daily basis is the level of long-term interest rates. An initial check shows that adding the 10-year bond yield as an independent variable improves the adjusted R² from 0.4993 to 0.5004.
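A single regime variable coded 1, 2, 3 would force the regression to treat recession as lying exactly halfway between expansion and transition. A more standard approach is one indicator variable per non-baseline regime. The sketch below uses hypothetical labels and treats expansion as the baseline.

```python
REGIMES = ("expansion", "recession", "transition")

def regime_dummies(label, baseline="expansion"):
    """Indicator columns for a categorical regime, omitting the baseline level."""
    if label not in REGIMES:
        raise ValueError(f"unknown regime: {label!r}")
    return {f"is_{r}": int(label == r) for r in REGIMES if r != baseline}

print(regime_dummies("recession"))  # {'is_recession': 1, 'is_transition': 0}
```

Each indicator gets its own coefficient, so the recession effect and the transition effect are estimated separately, relative to expansion.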

Another possible improvement to the model is to allow the variance, or volatility, to be random. The common assumption that stock prices are lognormal dates back to the 1970s, and many models have been developed since then. Beyond the macroeconomic variables mentioned in the previous paragraph, newer statistical models incorporate jumps in price or volatility. These models are used more often in option markets than in stock markets, and they do well at reproducing the volatility skew found in option markets.

Overall, we fitted a multiple regression model of MSFT on the 10 other Information Technology stocks. The best model uses six of them as independent variables, with ORCL showing the strongest relationship with MSFT. The equation above gives the estimated relationship, which accounts for about 50% of the variance in Microsoft's stock returns (R² = 0.5017).

Appendix

Microsoft: develops, manufactures, licenses, sells, and supports software products. The Company offers operating system software, server application software, business and consumer applications software, software development tools, and Internet and intranet software. Microsoft also develops video game consoles and digital music entertainment devices. Listed on NASDAQ.

Apple Computer: designs, manufactures, and markets personal computers and related personal computing and mobile communication devices, along with a variety of related software, services, peripherals, and networking solutions. The Company sells its products worldwide through its online stores, its retail stores, its direct sales force, third-party wholesalers, and resellers. Listed on NASDAQ.

Adobe Systems Inc.: develops, markets, and supports computer software products and technologies. The Company's products allow users to express and use information across all print and electronic media. Adobe offers a line of application software products, type products, and content for creating, distributing, and managing information. Listed on NASDAQ.

Automatic Data Processing: is a global provider of business outsourcing solutions. The Company's services include a wide range of human resource, payroll, tax, and benefits administration solutions. Automatic Data also provides solutions to auto, truck, motorcycle, marine, and recreational vehicle dealers. Listed on NASDAQ.

Advanced Micro Devices: manufactures semiconductor products, including microprocessors, embedded microprocessors, chipsets, graphics, video, and multimedia products. Advanced Micro Devices, Inc. offers its products on a global basis. Listed on NYSE.

Dell: offers a wide range of computers and related products. The Company sells personal computers, servers and networking products, storage systems, mobility products, software and peripherals, and services. Listed on NASDAQ.

Google Inc.: is a global technology company that provides a web-based search engine through its website. The Company offers a wide range of search options, including web, image, groups, directory, and news searches. Listed on NASDAQ.

Hewlett-Packard Co.: provides imaging and printing systems, computing systems, and information technology services for business and home. The Company's products include laser and inkjet printers, scanners, copiers and faxes, personal computers, workstations, storage solutions, and other computing and printing systems. Hewlett-Packard sells its products worldwide. Listed on NYSE; a DJIA component.

International Business Machines: provides computer solutions through the use of advanced information technology. The Company's solutions include technologies, systems, products, services, software, and financing. IBM offers its products through its global sales and distribution organization, as well as through a variety of third-party distributors and resellers. Listed on NYSE; a DJIA component.

Oracle Corp.: supplies software for enterprise information management. The Company offers databases and relational servers, application development and decision support tools, and enterprise business applications. Oracle's software runs on network computers, personal digital assistants, set-top devices, PCs, workstations, minicomputers, mainframes, and massively parallel computers. Listed on NASDAQ.

Yahoo! Inc.: is a global Internet media company that offers an online guide to Web navigation, aggregated information content, communication services, and commerce. The Company's site includes a hierarchical, subject-based directory of Web sites, which enables users to locate and access information and services through hypertext links included in the directory. Listed on NASDAQ.

Graph 1: MSFT vs. ORCL

Graph 2: MSFT vs. YHOO

Table 1:

Summary of Forward Selection

Step  Variable Entered  Number Vars In  Partial R-Square  Model R-Square  C(p)      F Value  Pr > F
1     orcl              1               0.3560            0.3560          360.649   693.77   <.0001
2     adp               2               0.0786            0.4346          165.712   174.32   <.0001
3     adbe              3               0.0342            0.4688          81.9272   80.76    <.0001
4     ibm               4               0.0172            0.4860          40.8118   41.92    <.0001
5     goog              5               0.0106            0.4967          16.1526   26.44    <.0001
6     dell              6               0.0050            0.5017          5.5647    12.60    0.0004

Table 2:

Summary of Backward Elimination

Step  Variable Removed  Number Vars In  Partial R-Square  Model R-Square  C(p)      F Value  Pr > F
1     amd               9               0.0000            0.5027          9.0617    0.06     0.8039
2     yhoo              8               0.0003            0.5024          7.7288    0.67     0.4140
3     hp                7               0.0002            0.5022          6.3297    0.60     0.4382
4     aapl              6               0.0005            0.5017          5.5647    1.24     0.2663

Table 3:

Summary of Stepwise Selection

Step  Variable Entered  Variable Removed  Number Vars In  Partial R-Square  Model R-Square  C(p)      F Value  Pr > F
1     orcl              -                 1               0.3560            0.3560          360.649   693.77   <.0001
2     adp               -                 2               0.0786            0.4346          165.712   174.32   <.0001
3     adbe              -                 3               0.0342            0.4688          81.9272   80.76    <.0001
4     ibm               -                 4               0.0172            0.4860          40.8118   41.92    <.0001
5     goog              -                 5               0.0106            0.4967          16.1526   26.44    <.0001
6     dell              -                 6               0.0050            0.5017          5.5647    12.60    0.0004

Table 4:

Studentized Residual Analysis (SAS output, 17:54 Saturday, November 7, 2009)

Obs    studres   Date       msft     ibm      orcl     goog
1      -7.457    20090122   -0.125   -0.015   -0.014    0.011
2      -7.321    20060428   -0.121   -0.019   -0.023   -0.005
3      -6.961    20090724   -0.086    0.005    0.006    0.021
4      -6.085    20041115   -0.090    0.006   -0.029    0.016
5      -4.554    20080718   -0.062    0.026    0.015   -0.103
6      -4.396    20080201   -0.068    0.018    0.006   -0.090
7      -4.027    20080425   -0.064   -0.009   -0.019    0.002
8      -3.714    20090403   -0.028    0.014    0.025    0.020
9      -3.073    20090107   -0.062   -0.016   -0.041   -0.037
1251    3.192    20060127    0.048    0.004   -0.002   -0.002
1252    3.228    20060818    0.043    0.007   -0.006   -0.006
1253    4.084    20060721    0.044   -0.008    0.000    0.008
1254    5.383    20081121    0.116    0.043    0.062    0.011
1255    5.916    20071026    0.091    0.008    0.017    0.009
1256    6.250    20090424    0.100   -0.013    0.006    0.012
1257    6.465    20081013    0.171    0.050    0.123    0.138

Table 5:

 

                                                                               

 

 

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Tolerance  Variance Inflation
Intercept  1   -0.00026956         0.00039454      -0.68    0.4946    .          0
adbe       1    0.12329            0.02137          5.77    <.0001    0.53758    1.86017
adp        1    0.21074            0.03348          6.30    <.0001    0.56917    1.75695
dell       1    0.07190            0.02025          3.55    0.0004    0.65733    1.52130
ibm        1    0.18750            0.03559          5.27    <.0001    0.51282    1.94999
orcl       1    0.23535            0.02604          9.04    <.0001    0.53531    1.86808
goog       1    0.09472            0.01982          4.78    <.0001    0.69456    1.43975

SAS Code:

/* Analysis of associations between daily log stock returns of 11 IT companies: Microsoft, Apple, Adobe, ADP,
   Advanced Micro Devices, Dell, Hewlett-Packard, IBM, Oracle, Yahoo, Google */
proc import datafile="D:\DePaul\logret_11stocks.csv" out=stocks dbms=csv replace;
  getnames=yes;
run;
proc print data=stocks; run;

/* check correlations */
proc corr data=stocks;
  var msft aapl adbe adp amd dell hp ibm orcl yhoo goog;
run;

/* individual scatterplots with msft as dependent variable */
proc gplot data=stocks;
  plot msft*aapl msft*adbe msft*adp msft*amd msft*dell msft*hp msft*ibm msft*orcl msft*yhoo msft*goog;
run;
quit;

/* analysis of individual stock returns: normality tests, histograms, normal probability plots */
title "Normal probability plots";
proc univariate data=stocks normal;
  var msft aapl adbe adp amd dell hp ibm orcl yhoo goog;
  histogram / normal;
  probplot / normal(mu=est sigma=est);
run;

/* regression analysis: the five model selection methods */
symbol value="plus" color=black;
title "Regression analysis";
proc reg data=stocks;
  model msft = aapl adbe adp amd dell hp ibm orcl yhoo goog / selection=cp slstay=0.05;
  model msft = aapl adbe adp amd dell hp ibm orcl yhoo goog / selection=adjrsq;
  model msft = aapl adbe adp amd dell hp ibm orcl yhoo goog / selection=forward details slentry=0.05 slstay=0.05;
  model msft = aapl adbe adp amd dell hp ibm orcl yhoo goog / selection=backward slstay=0.05;
  model msft = aapl adbe adp amd dell hp ibm orcl yhoo goog / selection=stepwise slstay=0.05;
  /* residual plots: studentized residuals vs. x-variables, vs. predicted values, and normal probability plot */
  plot student.*(aapl adbe adp amd dell hp ibm orcl yhoo goog);
  plot student.*predicted.;
  plot npp.*student.;
run;
quit;

/* reduced model */
proc reg data=stocks;
  /* check for collinearity with variance inflation factor */
  model msft = adbe adp dell ibm orcl goog / vif tol;
  /* check for influence of outliers; OUTPUT applies to the preceding MODEL statement */
  model msft = adbe adp dell ibm orcl goog / r influence;
  output out=regout1 p=pred cookd=cookd h=levg student=studres dffits=dfts press=press;
  /* residual plots: studentized residuals vs. x-variables, vs. predicted values, and normal probability plot */
  plot student.*(adbe adp dell ibm orcl goog);
  plot student.*predicted.;
  plot npp.*student.;
run;
quit;
proc export data=regout1 outfile="D:\DePaul\regout1.csv" dbms=csv replace;
run;

/* analysis of residuals: list observations with |studentized residual| > 3 */
proc univariate data=regout1;
  var studres;
run;
proc sort data=regout1;
  by studres;
run;
proc print data=regout1;
  var studres Date msft ibm orcl goog;
  where abs(studres) > 3;
  format studres msft ibm orcl goog 6.3;
run;

/* predict new observation: msft is missing, so PROC REG predicts it without using it in the fit */
data newobs;
  input date msft aapl adbe adp amd dell hp ibm orcl yhoo goog;
  datalines;
. . . -0.037 -0.050 . -0.045 . -0.042 -0.075 . -0.071
;
run;
data predict;
  set newobs stocks;
run;
proc reg data=predict;
  model msft = adbe adp dell ibm orcl goog / cli clm;
run;
quit;
