CHAPTER 1: INTRODUCTION

This chapter introduces Microfit, and explains how to install it on your computer. You should read this chapter (quickly) even if you have already installed Microfit.

1.1 WHAT IS MICROFIT, AND HOW SHOULD YOU USE THIS BOOKLET?

Microfit is a computer program for carrying out econometric analysis; it was designed for economists. Microfit is mainly intended for regression, but can also be used to enter data from the keyboard, and to draw graphs. Several versions of Microfit have been produced: this document is for the Microfit 4 (Windows) version; all other versions of Microfit are for MS-DOS (they can be run under Windows, but were not intended to be used that way). Note that there is an MS-DOS version of Microfit 4, which is very similar to Microfit 4 for Windows.

This document is written for students taking the ‘Quantitative methods for financial management’ module (FM105), which is part of the MSc in Financial Management run by CIEE (SOAS, University of London). This booklet assumes you have the folder ‘Quantitative methods for financial management’, and that you can understand the mathematics and statistics it contains. We hope you will be able to use this booklet without help; you may need advice from your tutor on some topics, but please read the relevant part of this booklet carefully before asking for help. We strongly recommend you read the whole of this booklet.

This booklet does not assume that you have experience of Microfit (or any other computer program). However, it is essential that you have access to Microfit on a suitable computer (see below), and that you use your computer to experiment with the techniques explained in this booklet. Experimenting takes time, but is essential if you want to pass the MSc course. There are self-test questions throughout this booklet (shown by numbers in {} brackets); always try to answer them yourself (don’t immediately look at the answer).

Chapter 1 (this chapter) prepares you to run Microfit on your computer. Chapters 2 and 3 explain some important features of Microfit, such as data files. Chapter 4 explains introductory statistics in Microfit. Chapters 5 to 8 involve regression: they are the hardest chapters, but the most important for your MSc. It is important to take regular breaks, so your brain can absorb ideas: this booklet requires you to cope with statistical theory at the same time as you use a computer, so it isn’t easy. You should take a break at the end of each chapter, and usually at the end of each section, so that you feel fresh when you return to Microfit.

1.2 CAN YOUR COMPUTER WORK WITH MICROFIT 4 (WINDOWS)?

Before you install Microfit, check that your computer is good enough. You will need all of the following items.

Windows: the ‘Windows’ operating system allows your computer to run two or more programs at once. Your computer must be running Windows when you install Microfit 4 (Windows): any Windows version should be fine, including Windows 3.1, Windows 95, Windows 98, and Windows NT. At the time of writing, Windows 2000 had not been released, but we expect Microfit 4 (Windows) to work on Windows 2000.

Mouse: Windows relies on the use of a ‘mouse’ (a hand-held device, used to select an option from the screen), or something similar such as a trackball; you need one of these to run Microfit 4 (Windows).

Memory: this means a set of computer chips which can store a computer program, data, etc. Memory is temporary: it will “forget” your data when you switch the computer off.
Your computer must have at least 8 Megabytes of memory before you can run Microfit.

Hard disk: a hard disk keeps data (and other files) more permanently than memory: the hard disk “remembers” files when the computer is switched off. You need at least 12 Megabytes of available disk-space (as well as the 4 Megabytes of disk-space needed to install Microfit).

Printer: this document assumes that you have a printer attached to your computer; if you don’t have a printer, we recommend you buy one, but you can learn to use Microfit without a printer.

If you do not have Windows on your computer, or do not have enough memory or disk-space, we recommend you buy a new computer. Any new IBM-compatible personal computer (portable or desk-top) should be fine. Apple computers (such as the iMac) are not appropriate for Microfit.

1.3 USING THE MOUSE AND KEYBOARD: HOW TO “CLICK” A BUTTON

This document uses the word “button” to refer to an icon which runs a small program if you select it. Within Microfit, if you move the mouse over a button (without pressing the mouse switch), a yellow-and-black box appears just below the button, to explain what that button does. Confusingly, the word “button” is often used to refer to the switches on a mouse, which you press with a finger; this document will use the word “switch” (not button) to describe the part of the mouse your finger touches.

To use the mouse, move it until the pointer on the screen points to the icon or word you want to choose; then press the left switch on the mouse. This booklet uses the word “click” as an abbreviation for “press the left switch on the mouse”. Some computers (especially portable computers) use an alternative to a mouse: see your computer manual for the equivalent of pressing the left mouse switch.

You cannot do everything in Microfit with the mouse: you need to type some things on the keyboard. In this document, things you are required to type are shown in ‘small capitals’, like this:

A:SETUP ENTER

This document uses ENTER as shorthand for “press the ENTER (or RETURN) key on your keyboard”.

1.4 CREATING A DIRECTORY FOR DATA FILES

To make sure you can find the data files you will use for this document, you should create a new directory for data files. The way to do this depends on which version of Windows you are using:

Windows version 3.1: if you are running Windows, exit to MS-DOS. Then type MKDIR C:\MFITDATA ENTER and then go back to Windows, by typing WIN ENTER

Windows 95, Windows 98, or Windows NT: point the mouse at the word Start (bottom-left of the screen) and click; point the mouse at MS-DOS Prompt and click; then type MKDIR C:\MFITDATA ENTER EXIT ENTER
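If you want to check that the directory has been created, type DIR C:\MFITDATA ENTER at the same MS-DOS prompt (before typing WIN or EXIT): DOS should list the new, empty directory, rather than reporting that the path cannot be found.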
1.5 HOW TO INSTALL MICROFIT 4 (WINDOWS)

Insert ‘disk 1’ of the set of Microfit disks (sent by CIEE) into your computer. The next step depends on which version of Windows you have:

Windows version 3.1: before installing Microfit, you must be running Windows: do this by typing WIN ENTER Then point the mouse at the word File (top-left of the screen) and click; point to Run and click; then type A:SETUP ENTER

Windows 95, Windows 98, or Windows NT: point the mouse at the word Start (bottom-left of the screen) and click; point to Run and click; then type A:SETUP ENTER

Then follow the instructions on your screen to install Microfit.

Now create a “shortcut” to Microfit: find an empty part of the screen, and press the right switch on the mouse; then point to Create shortcut and type the directory and name of the program: C:\MFIT4WIN\MICROFIT.EXE ENTER

Microfit often displays a button labeled Start; from now on, when this booklet refers to the Start button, it means the Start button in Microfit, not the Start button at the bottom-left of the screen.

CHAPTER 2: ENTERING AND SAVING DATA

In this chapter, you will type some data into a spreadsheet, and use Microfit for the first time.

2.1 TYPING DATA IN A SPREADSHEET: CSV FORMAT

Before using Microfit, we want to type in some data. The best way to type in data is into a spreadsheet program, such as Excel (or Lotus 1-2-3). This document cannot teach you how to use a spreadsheet program – if you have never used one before, ask a friend to show you how. The simplest way to transfer data from a spreadsheet program to Microfit is in CSV format. When using CSV format, you must obey all of the following rules:

* The left-hand column must be a label: a date, or an observation number (starting with 1 for the first row of data). For dates, use labels like 1979 (annual data), 1979q1 (quarterly data), or 1979m1 (monthly data).

* The top row of the spreadsheet must contain labels: one for each data column. Each variable name must start with a letter, and should not contain special characters like commas or spaces. Use 7 characters or less for every variable name (so we can add L to indicate the log of a variable, and D for differencing).

* Type each data series in a vertical column. There must be no empty cells in the area of your spreadsheet which contains your data; if any data is missing, type #N/A (in Excel) or @NA (in Lotus 1-2-3) into each blank cell.

Table 1 (below) shows London share prices from 9th March to 13th April 1999 (excluding weekends & bank holidays) for three firms: ‘Allied Irish Bank’, ‘Zambia Copper’, and ‘All Nippon Air’. These figures are end-of-day share prices in pence (adjusted for dividend payments), as reported in The Times newspaper the following day. For the rest of this booklet, I will refer to these variables as Allied, Zambia & Nippon.

TABLE 1: SHARE PRICES

day   Allied   Zambia   Nippon
1     1026.5   95       181.75
2     1018.5   94.25    182.25
3     1005     94.25    186.5
4     992.5    93.25    196.5
5     991      88.75    182
6     1009     88.25    188
7     1046     89.25    198
8     1063.5   94.5     197
9     1066     93.5     193.75
10    1057.5   93.5     189.75
11    1086.5   93.75    199.75
12    1052.5   94       199.5
13    1056.5   94.25    201.25
14    1070     94.75    203.75
15    1067     80       205.5
16    1067     80.75    200.75
17    1097.5   81       197.25
18    1095.5   81.5     200
19    1069     79.75    209
20    1060.5   80.5     214.5
21    1052     80.25    206.25
22    1083.5   80.75    204.25

Type the data in table 1 into a spreadsheet program such as Excel, and save the data as file C:\MFITDATA\SHARES1.XLS in spreadsheet format (or, if you use Lotus 1-2-3, save it as file C:\MFITDATA\SHARES1.WK1).
The name SHARES1 reminds you that it contains data on share prices; the number 1 in SHARES1 indicates that it is the first version of this dataset. Any filename you choose should consist of 8 characters or less, and should not contain special characters like spaces. You can keep this spreadsheet file in case you decide to add more data to your dataset later, but it cannot be read by Microfit. The next step is to create a file that Microfit can read: a file in CSV format. To do this in Excel, click on the word File (top-left corner of the screen); click on Save as, then on File of type:, and select the format CSV (comma delimited). Click the box labeled Filename: and type C:\MFITDATA\SHARES1.CSV as the filename. If you use Lotus 1-2-3 (or a similar program), you save a CSV file by “printing” the data to a file.

If you have a printer, you may wish to print this spreadsheet: you can use the printout to make sure you have typed the data correctly, because it is not easy to print data from Microfit. It is also possible to use the spreadsheet program to draw a graph of the data, but there is no need to do so: creating graphs is easier in Microfit.
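For reference, a CSV file is ordinary text: each spreadsheet row becomes one line, with the values separated by commas. If you opened C:\MFITDATA\SHARES1.CSV in a text editor, the first few lines should look something like this (the exact decimal formatting may differ slightly between spreadsheet programs):

day,Allied,Zambia,Nippon
1,1026.5,95,181.75
2,1018.5,94.25,182.25
3,1005,94.25,186.5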
2.2 STARTING MICROFIT

You should now leave the spreadsheet program. Assuming you have installed Microfit (as explained in section 1.5), you can now start it. Whichever Windows version you use, you should see an icon for Microfit somewhere on your screen (this icon is four jigsaw pieces, colored brown, green, blue & red). Point the mouse at this icon and “double-click”, which means press the left mouse switch twice within about half a second (if you find this difficult, there is another way: press the left switch on the mouse once, and then press the ENTER key on your keyboard). Microfit may take a few minutes to start, so relax. When Microfit has finished loading itself into your computer’s memory, you should see the main Microfit window; its exact appearance depends on your version of Windows (Windows 3.1, Windows 95, Windows 98, and so on), so your screen may not look exactly the same as someone else’s.

2.3 TYPING DATA IN MICROFIT

Before you can use Microfit, you need to type in some data. For simplicity, we will start with one variable. This variable is the end-of-day price of UK Treasury 2½% undated stock, from 9th March to 13th April 1999, excluding weekends & bank holidays (the same time-period as the share-price data you typed into a spreadsheet in section 2.1 above). These Treasury stock prices are in pence, adjusted for dividend payments, as reported in The Times newspaper the following day; this variable will be referred to as Treasur in this booklet.

Before you can enter the data into Microfit, you must prepare Microfit by giving details of the new variable(s) you intend to create. This is done by moving the mouse to the word File (top-left corner of your screen) and clicking; then moving down the list to the word New and clicking. Microfit will ask you about the frequency of the data you want to type in (you must select “undated” data, because Microfit does not have a category for daily data), and the number of observations and variables you are going to type: click on the word Undated to change the “Data frequency”; click the “Number of observations:” box, and type 22 ENTER then click the “Number of variables:” box, and type 1 ENTER Now click on the OK button, to go to the next screen shown by Microfit.

Microfit will now give you a chance to choose the name of the new variable, and suggests the name “X1” (highlighted in black). It is better to choose a name that will remind you what the data represent, so replace the name “X1” by the name TREASUR (there is another box to the right of the variable name, which you can use for a description of this variable if you wish). Then move the mouse to the Go button and click.

You can now type in the data. You do not need to type the observation numbers: Microfit already shows these on the screen, as numbers 1 to 22. Move the mouse to the top row of data (observation 1): delete the contents of the box (*NONE*), and type the first observation, which is 51.8 (as shown in table 2 below). When you press the downward arrow on the keyboard, Microfit will take you to the place to type the next observation. Type in the rest of the observations shown in table 2. When you have finished typing, check you have typed the data correctly; then move the mouse to the Go button (towards the right of the screen), and click.

TABLE 2: TREASURY STOCK

day   Treasur
1     51.8
2     52.2
3     52.0861
4     52.38
5     52.4206
6     53.1915
7     53.0989
8     52.8926
9     52.13
10    52.4993
11    52.02
12    51.9092
13    51.5521
14    52.1282
15    51.5014
16    52.126
17    52.79
18    53.14
19    53.19
20    53.69
21    53.49
22    53.3

2.4 SAVING A DATASET IN MICROFIT FORMAT

Before you do anything else, you should now save the above data (variable Treasur) in a Microfit-format data file. To do this, move the mouse to the word File (top-left of the screen), and click; then point to Save and click; then type in C:\MFITDATA\TREASURY.FIT (what you type will appear just below the words File name:). The file name “TREASURY” should remind you that the file contains data on the price of a Treasury stock. Note the slight difference between the two names: TREASURY is the name of the file, and Treasur (without the letter “y”) is the name of the variable it contains. Now move the mouse to Ok and click; Microfit will ask if you want to keep all observations – you do, so move to Ok and click. Your data should now be saved on the hard disk of your computer. Remember to keep a copy of this file on diskette – one way is to save it again, using the filename A:\TREASURY.FIT (then write on the label of your diskette, so you know this diskette is for Microfit files).

2.5 LEAVING MICROFIT

When you have saved your data, you can leave Microfit. To do this, move the mouse to the word File (top-left of the screen), and click; then point to Exit and click. Microfit warns you that you will lose any unsaved data – assuming you have just saved the data, this is no problem, so move to Ok and click.

CHAPTER 3: WORKING WITH DATA FILES

In the previous chapter, you typed in some data; in this chapter, you will read that data into Microfit, and save it in a new (Microfit-format) file.

3.1 READING A “CSV” FORMAT FILE

I hope you have taken a coffee-break since working on chapter 2, so that you feel refreshed and able to cope with the next set of tasks. Go back into Microfit, as explained in section 2.2 above. In section 2.1, you typed some data into a CSV file; you should now read that CSV file into Microfit. Point the mouse at the word File (top-left corner of the screen), and click; point the mouse at the word Open and click; point the mouse at the down-pointing arrow just below-right of List files of type, and point the mouse at CSV files.
Now move the mouse up and point at the box immediately below the words File name: and type C:\MFITDATA\*.CSV and press the ENTER key. You should now see the filename SHARES1.CSV appear just below where you typed C:\MFITDATA\*.CSV so point the mouse at it, and click. Now move the mouse to the right and click on the word Ok. Microfit gives you some information about CSV files (click on Ok), then warns you that CSV files can be slow to read (click on Yes), and then tells you that the file has been read successfully (click on Ok). You should now check that the data has been read into Microfit correctly, using table 1 (in section 2.1): to see the data on the screen, move the mouse to the Data button (near the top of the screen, slightly to the left) and click.

3.2 ADDING A FILE TO THE DATA IN MEMORY

We will now combine the data you just read into memory (from the CSV file) with the Treasury stock prices you saved as the Microfit file C:\MFITDATA\TREASURY.FIT in section 2.4 above. Move the mouse to the word File (top-left of the screen) and click; then point to Add and click; then type in the filename C:\MFITDATA\TREASURY.FIT (what you type will appear just below the words “Filename:”). Then move to Ok and click. You should now find four variables in Microfit’s data store: three (Allied, Zambia & Nippon) from the CSV file, and one (Treasur) from C:\MFITDATA\TREASURY.FIT – you should check this, by clicking on the word Data near the top of the screen. We will want to use this dataset later, so save it as file C:\MFITDATA\SHARES2.FIT (if you can’t remember how to save a file, see section 2.4 above).

3.3 READING A MICROFIT FORMAT FILE

Now make sure you can read back the Microfit-format file you created. Point the mouse at the word File (top-left corner of the screen), and click; point the mouse at the word Open and click. Move the mouse up and point at the box just below the words File name: and type C:\MFITDATA\*.FIT and press the ENTER key. The filename SHARES2.FIT should appear just below where you typed C:\MFITDATA\*.FIT so point the mouse at SHARES2.FIT and click. Move the mouse to the right and click on OK. Microfit warns you that you will lose any unsaved data, but this is not a problem: you have saved the dataset. So, click Ok to read in your file.

3.4 HOW TO USE MORE VARIABLES THAN YOU CAN FIT INTO ONE CSV FILE

This section is for future reference only: you will need it when you do your own research, if you want to create a spreadsheet file containing a large number of variables. For the present, you can skip the rest of this section, and go to section 3.5 below.

There is a limit to how many columns of data you can use in a CSV file: each line must be less than 256 characters long. The number of columns you can fit in one file therefore depends on the width of your columns. So it is important to check the right-hand column of your dataset in Microfit, to make sure it has been read in correctly. If the right-hand column of data in Microfit is not the same as you typed into the spreadsheet, then you will need to do the following:

(1) If the right-hand variable has only been partly read into Microfit, erase that variable. Then save the present (incomplete) dataset as a Microfit file, as explained in section 2.4 above. Write down which variables are saved in this .FIT file.

(2) Go back to Excel or Lotus 1-2-3; going from left to right, find the first column which is not saved in the .FIT file.
Insert two blank columns immediately to the left of this column, and copy the labels column (containing dates or observation numbers) into the right-hand column you have just inserted. Save this (dates or labels) column, and everything to the right of it, as a new .CSV file – as explained in section 2.1 above.

(3) Go back to Microfit, and read in the second .CSV file as explained above. Save it in Microfit format, with a new filename. Check that the second file includes all variables which were not in the first .FIT file (if not, you will need to create a third .CSV file).

(4) Combine all of these .FIT (Microfit-format) files, as follows: read the first .FIT file into Microfit as explained in section 3.3 above, and then add the other .FIT files as explained in section 3.2 above. Finally, save the complete dataset as a new .FIT file.

3.5 MANAGING DATA FILES

If you have worked through all of the previous sections in this booklet (except for section 3.4), then you should now have created four data files, of various types. There is a risk that you may lose track of files: which is the most recent version of a data file? And how can you tell if a file is a spreadsheet file, a CSV file, or a Microfit file? Here is a list of the data files, in the order they were created:

C:\MFITDATA\SHARES1.XLS (or C:\MFITDATA\SHARES1.WK1)
C:\MFITDATA\SHARES1.CSV
C:\MFITDATA\TREASURY.FIT
C:\MFITDATA\SHARES2.FIT

The first point to observe is the last three letters of the filename: these tell us what type of file it is. There are three types of file in the above list:

.XLS (or .WK1)   spreadsheet files, in Excel (or Lotus 1-2-3) format
.CSV             temporary files, used to convert from spreadsheet to Microfit
.FIT             data in Microfit format

It is vital to be systematic with computer files. There is a risk that you may get so confused that you have to type in a dataset again. Or suppose you find a typing mistake in a dataset you typed, make a correction, and save it with a different filename: you may get confused as to which file is the correct version. We don’t mind you retyping data, but we are concerned that you may make mistakes in your research. So, number your files: in the above list, the number 1 in filename C:\MFITDATA\SHARES1.CSV reminds you that it is the first version. File C:\MFITDATA\SHARES2.FIT is more recent than C:\MFITDATA\SHARES1.CSV (later in this booklet, we will make more changes to this file, and save it as file C:\MFITDATA\SHARES3.FIT). Here are some other tips you may find useful:

* choose a file name you will remember (for example, MYDATA.FIT is a stupid name!).
* erase all .CSV files after you have used them; keep .XLS (or .WK1) files, and .FIT files.
* keep all of the files for one assignment in the same directory.
* use the hard disk of your computer (not diskettes) for the main version of each file.
* keep backup copies on diskettes (if your hard disk breaks, every file on it may be lost).
* label each diskette you use, so you know what files are on it.

Some students are too lazy to keep back-ups; they regret it when things go wrong later. You should get into the habit of keeping at least one back-up copy of every file you create.

CHAPTER 4: CREATING VARIABLES AND GRAPHS

This chapter introduces statistical analysis and graphical presentation of data.

4.1 CREATING & DELETING NEW VARIABLES

You will probably need to create several variables in Microfit. Let’s start with the simplest: a constant. You should include a constant term in every regression you run.
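Why? In equation form, a regression line is y = α + β(x) + u, and the constant term corresponds to the intercept α (it is α multiplied by a variable which always equals 1). If you leave the constant out, the fitted line is forced to pass through the origin (y = 0 when x = 0); this is rarely a justified restriction, and imposing it can distort the estimate of the slope β.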
There are two ways to create a constant in Microfit; for both of them, you need a dataset to work on, so load file C:\MFITDATA\SHARES2.FIT into memory (if you can’t do this, see section 3.3). When you have read in a dataset, you should be taken straight into the Process window; but to make sure you are in the Process window, click on the Process button near the top of the screen.

One way to create a constant is to use the button provided by Microfit. To do this, click on the button labeled = => Constant (at the bottom-left of the screen). Microfit asks you for a name for this constant, so type the name CON in the box, and click the Ok button (lower down the screen).

There is a second way to create a constant. Still in the Process window of Microfit, type in CONSTAN = 1; and click the Go button (on the right). It appears that nothing has happened, but Microfit has created a new constant. To check, point the mouse at the Data button (top-center of the screen) and click. You should see four variables (Allied, Zambia, Nippon & Treasur) and two constants (CON & CONSTAN). What is the value of each constant? {1}

Now create another new variable: this time, the log of an existing variable. Microfit calculates logs to base e, which is often called the ‘natural log’. Calculate the log of Zambia Copper’s share price, as follows: click on the Process button (top-center of screen). If you see the line you just typed in (CONSTAN = 1;), then remove it using the delete key on your keyboard. Type LZAMBIA = LOG(ZAMBIA); and click the Go button. To make sure this variable has been created, click on the Data button, and check that LZAMBIA has the value 4.55387689 on day 1.

There is no need to keep both constants (CONSTAN and CON): they are identical. Erase CON by clicking on the Process button; type in DELETE CON; and click the Go button. Make sure CON has been removed, by clicking on the Data button: you should still see CONSTAN, but CON should now have been removed. Now save the file, as C:\MFITDATA\SHARES3.FIT (see section 2.4 if you can’t do this).

4.2 MEAN, STANDARD DEVIATION, & CORRELATION BETWEEN VARIABLES

Now we can look at some statistics on the data you typed in. Click on the Process button, type in COR ZAMBIA NIPPON; and click the Go button. Microfit will display information about the two variables, as shown in table 3. The ‘mean’ and ‘standard deviation’ rows can provide a useful check on your data: if you do not get the same results, check you typed the data correctly.

Table 3: SUMMARY STATISTICS

We might expect that share prices would tend to follow a pattern, being high in times of optimism but low when markets are pessimistic: some factors (such as a global recession) may tend to reduce profits for firms in general, making shares less desirable. If so, we would expect share prices for a typical firm to be correlated with most other share prices. Is there a correlation between ‘Zambia Copper’ and ‘All Nippon Air’? To test this, click on the Close button (lower down the screen); Microfit now produces a “Correlation Matrix” between the two shares, as shown in table 4 below.

TABLE 4: CORRELATION COEFFICIENTS

Is there a correlation between the two share prices, according to table 4? The top-left number is 1.0000, which tells us that the ‘Zambia Copper’ share price is perfectly correlated with itself.
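For reference, the correlation coefficient r between two variables x and y is calculated as

r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² × Σ(y − ȳ)² ]

where x̄ and ȳ are the means of x and y, and each sum runs over all observations. The value of r always lies between −1 and +1; the diagonal of a correlation matrix is always exactly 1, because every variable is perfectly correlated with itself.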
The number we are interested in is -.58803, which is the correlation coefficient between Zambia and Nippon (if you do not produce this number on your computer, check you typed the data correctly). This number tells us that there is a negative correlation between these two share prices.

If you would like a printout, look near the top of your screen for the word Result. Now look below this, on the left half of the screen: there are seven buttons. The button on the far left represents a printer: to check this, move the mouse so the pointer is over this button (but don’t press a switch on the mouse); you should see a black-and-yellow label appear, which says “Print”. Now click on the printer button, and you will see another menu; just click on the OK button. You should now get a printout of the correlation matrix. Now click on the Close button, to get back to a previous Microfit screen. As an exercise, use Microfit to work out the correlation coefficients between all three share-price variables in the dataset; are Allied share prices correlated with Nippon? Check your results with answer {2} at the end of this booklet.

4.3 GRAPHS

If you type data into a spreadsheet, you can use the spreadsheet program to draw graphs. However, it is generally easier to create graphs in Microfit: to do so, click the Process button, type PLOT ZAMBIA; and click the Go button. You should obtain a graph like chart 1.

Chart 1: GRAPH OF ZAMBIA SHARE PRICE OVER TIME

Note that Microfit can draw two graphs on the same axes. To try this, click on the Process button, type in PLOT ZAMBIA NIPPON; and click the Go button. As an exercise, create a graph of the Treasur stock price against time, in the same way as you just did for Zambia & Nippon.

CHAPTER 5: THE CLASSICAL LINEAR REGRESSION MODEL

This chapter introduces regression in Microfit: it uses OLS regression, with only one explanatory variable. Later chapters will discuss more complex forms of regression.

5.1 WHAT IS REGRESSION TRYING TO ACHIEVE?

The aim of regression is to see if one variable is dependent on one (or more) other variables. For example: suppose you measure the height of a tree near your home each year, find data on the world population, and type the data into Microfit. If you calculate the correlation coefficient of ‘tree height’ against ‘world population’ (as explained in section 4.2), you would find a positive correlation: but this does not mean that the growth of your tree is a cause or effect of world population – the correlation arises simply because both grow over time.

Regression results are generally better than correlation coefficients for detecting a link between variables, for several reasons: one reason is that a number of diagnostic tests are produced with a regression, and these diagnostic tests can warn us if there are problems with the regression. If a regression fails one or more diagnostic tests, then we should treat the results of that regression as unreliable. In the tree example (previous paragraph), we would probably find that the regression had a problem of serial correlation (explained in section 7.3 below); this would warn us not to trust the apparent link between tree height and world population. You should always be cautious when interpreting regression results.
Even if regression results suggest a link between two variables, and the diagnostic tests are acceptable, we cannot be sure which variable is the ‘cause’ and which is the ‘effect’ (or the variables may move together because they are both caused by something else). One way to decide which is cause and which is effect is to use lagged variables (such as the one you will create in section 7.3 below), to look for a delayed effect: if one change happens before another, and regression results suggest a link between the events, then it seems likely (but not certain) that the first event causes the second.

5.2 A SIMPLE REGRESSION EXAMPLE

Go back to the dataset C:\MFITDATA\SHARES3.FIT which you created in chapter 4 (if you don’t remember how to read a dataset into memory, see section 3.3). For our first regression in Microfit, let’s take a simple example: one dependent variable, one explanatory variable, and a constant, using OLS (Ordinary Least Squares) regression. We wish to test the equation

Zambia = α + β(Nippon) + u   [EQUATION 1]

where u represents the error term; α and β are coefficients which Microfit will estimate for us. EQUATION 1 is unchanged if we multiply α by 1, so we can rewrite it as

Zambia = α(1) + β(Nippon) + u

We must tell Microfit to estimate an equation with Zambia dependent on 1 and Nippon (recall from section 4.1 that CONSTAN is equal to 1). Microfit will calculate the error term u (Microfit refers to it as the ‘residual’ or ‘disturbance’), but you do not need to type in u or α or β. Begin the regression by clicking the button labeled Single (near the top right of the screen), and typing: ZAMBIA CONSTAN NIPPON (you must not type a semicolon at the end of the line). Microfit assumes the first name you type (Zambia, in this case) is the dependent variable. Then click the button labeled Start near the top-right corner of the screen. You should now see information like this on the screen:

TABLE 5: REGRESSION RESULTS

It is essential that you learn to interpret regression results. Look at the above results: what is the value of α estimated by Microfit? And what is the estimate of β? Write down the values of α and β, and then check what you have written with answer {3} at the end of this document.

5.3 GOODNESS-OF-FIT STATISTICS

In addition to estimating coefficients such as α and β, Microfit calculates the ‘error’ term for statistics and diagnostic tests. The error term (often called the “residual”) is the term labeled u in EQUATION 1 above. The first goodness-of-fit statistic Microfit reports is the “R-Squared” value (often written R²); this gives us a measure of the proportion of the variation of the dependent variable which is explained by variation of the independent variable(s). In this regression, the R² value is .34578, which tells us that 34.578% of the variation in Zambia is explained by variation in Nippon (make sure you can find the number .34578 in table 5). To the right of the R-Squared statistic, Microfit reports the R-Bar-Squared statistic: this is a modified version of R-Squared (discussed in section 7.5 below).

Now compare these regression results with the correlation coefficient between Nippon & Zambia, calculated in section 4.2 above: the correlation coefficient is -0.58803, and the β coefficient (calculated by regression) is also negative. If there is only one explanatory variable, the coefficient from OLS regression must have the same sign as the correlation coefficient (for the same pair of variables).
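The reason the signs must agree can be seen from the standard OLS formula for the slope in a one-explanatory-variable regression: writing y for the dependent variable and x for the explanatory variable, the estimate of β is

β = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

The numerator here is the same as the numerator of the correlation coefficient (section 4.2), and both denominators are positive; so the estimated slope and the correlation coefficient must always take the same sign.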
There is another connection between regression and correlation: the R² value of the regression must be equal to the square of the correlation coefficient (this only applies to a regression equation with a single explanatory variable, so it will not apply to the next chapter of this booklet). In this case, (-0.58803)² is equal to .34578 (approximately).

You can ignore the post-regression menu for now. The simplest way back to a familiar Microfit screen is to click the Cancel button at the bottom of the screen; click the Cancel button again; and then click the Process button.

5.4 SIGNIFICANCE LEVEL: IS A PATTERN SIGNIFICANT, OR JUST “RANDOM”?

There is a decision to take when doing research: you should choose a probability level which you consider ‘significant’. What you are choosing is how “unusual” a result must be before you consider it notable. For example, the average person is between 5 and 6 feet tall; but how tall would a person need to be before you describe them as “unusually tall”? If you decide that anyone taller than six feet is ‘unusually tall’, then you have an objective test which you could apply to everyone you meet. However, another researcher might consider a different height to be ‘unusually tall’. It is desirable to have an objective way of deciding which statistics are ‘unusual’, and which are not. There is a convention in the social sciences: each researcher should decide on a significance level, and use this level to decide if a statistic is ‘unusual’.

So, what level of significance should you adopt? Some researchers adopt 1%; but most social-science researchers adopt 5% as the significance level. I recommend that you adopt the 5% level, unless you are instructed otherwise. For the remainder of this document, I adopt the 5% significance level; but remember that this level is arbitrary: my only reason for using 5% is that most researchers do so.

Refer to the example in the first paragraph of this section: how would you apply a 5% significance level to heights? The answer is to obtain data on the heights of a large sample of people, and arrange them in height order; select the tallest 5% of the sample; then find the height of the shortest person among this tallest 5% (let’s call this height H). For example, in a sample of 1,000 people, H would be the height of the 50th-tallest person. From then on, you would say that anyone taller than H is ‘unusually tall’, but anyone less than H in height is not unusually tall.

CHAPTER 6: (UNIT 6) MULTIPLE REGRESSION

This chapter explains the diagnostic tests reported by Microfit, and introduces multiple regression.

6.1 DIAGNOSTIC TESTS: ARE REGRESSION RESULTS RELIABLE?

Microfit carries out various tests which can warn you if there is a problem with a regression. One of these is the Durbin-Watson statistic, which Microfit calls the “DW-statistic”. This statistic should be around 2. Look back at table 5 (section 5.2 in the previous chapter): the value of the DW statistic there is .78752, which suggests a problem, because this is not close to 2. It is not obvious how close DW has to be to 2 for the regression to be acceptable; so I recommend that you use the serial correlation test (discussed below), and ignore the DW statistic.
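For reference, the DW statistic is calculated from the regression residuals. Writing u(t) for the residual at observation t,

DW = Σ[u(t) − u(t−1)]² / Σ[u(t)²]

where the sums run over the sample (the numerator starts at the second observation). If successive residuals are unrelated, DW is close to 2; strong positive first-order serial correlation pushes DW towards 0 (as with the value .78752 here), and strong negative serial correlation pushes it towards 4.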
Now look further down the regression results in table 5 (section 5.2). Half-way down the table are the words “Diagnostic Tests”. Below these words, there are tests for four possible problems with the regression: serial correlation; functional form; normality; and heteroscedasticity (we will consider each of these below). Microfit reports a probability for each statistic in square brackets; for these four tests, any number in square brackets less than the chosen significance level (usually 0.05: see section 5.4) indicates a problem with that test, and hence with the regression: all results of that regression are then unreliable. For three of these four tests (serial correlation, functional form, and heteroscedasticity), there are two alternative versions, labeled “LM version” and “F version”. Here, “LM” stands for ‘Lagrange Multiplier’, and “F” refers to the F-distribution; for this booklet, you need not know how they are calculated. Usually, it does not matter whether you study the LM or the F version of a test, because they give the same result: a regression will pass both LM and F tests, or fail both. If a regression passes the LM test but fails the F test (or vice versa), then the results are unclear; you can report them, but the findings are unreliable.

The first ‘problem’ Microfit looks for is serial correlation; this test is similar to the Durbin-Watson (DW) test discussed above. In general, this serial correlation test is more reliable than the DW test, because DW only tests for first-order serial correlation, whereas the serial correlation test in Microfit’s ‘Diagnostic Tests’ section considers serial correlation up to 4th order for quarterly data, or 12th order for monthly data. In the case of annual or undated data, Microfit limits this serial correlation test to 1st-order serial correlation (i.e. the same as the DW test). Nevertheless, even for our data (which Microfit treats as undated), this serial correlation test is better than the DW statistic, because for this test Microfit reports a probability level [in square brackets]. According to table 5, this regression does have a problem with serial correlation: the probability is reported as [.005] and [.004] for the LM and F versions respectively (both well below 0.05); I return to this problem below.

The second problem is ‘functional form’. This problem arises when you choose an inappropriate regression specification. For example, suppose there is actually a linear relationship between a share price and its annual dividend (assuming all shares have a similar level of risk). But suppose we ran an inappropriate regression: ‘number of shares’ (meaning the number of shares you can buy for $100) dependent on ‘dividend’. We would not expect a linear relationship between ‘number of shares’ and ‘dividend’, because the share price is inversely related to ‘number of shares’ (for instance, if the dividend and hence the price doubles, the number of shares you can buy for $100 halves: equal steps in the dividend do not produce equal steps in the dependent variable). The fact that this relationship is non-linear should be picked up by Microfit’s ‘functional form’ test. In general, if your regression fails this test, you should transform one or more variables in your regression. If you do not know the correct functional form, it may be worth starting with the log of one or more variables (as explained in section 4.1 above), and using the log variable(s) in your regression instead of the raw data. Table 5 indicates that the probability of having an appropriate ‘functional form’ is [.196] or [.227], so neither the LM nor the F version suggests a problem.

The next potential ‘problem’ is the absence of normally-distributed errors. The theory underlying OLS regression makes various assumptions, including the assumption that the residual term u is normally-distributed (Pesaran M.H. & Pesaran B., 1997, Working with Microfit 4.0, p.72).
If a regression fails the ‘normality’ test, then the error term is not normally-distributed, so we should not trust the regression results. A regression fails this test if the number in square brackets is below 0.05 (see section 5.4). In table 5, this probability is [.426], which is more than 0.05, so the regression passes this test: the residuals are (approximately) normally-distributed. If a regression fails this test, you could try calculating new variables based on a transformation (such as the log) of the original variables, and run a regression with these new variables.

The final diagnostic test is for heteroscedasticity. This examines whether the error term u is related to the explanatory variables. Ideally, we want the residuals to be just random; if there is a clear pattern in the residuals (such as a tendency for residuals to increase over time), then the regression results are suspect. For example, suppose we estimate a regression where the dependent variable doubles every year, but the explanatory variable has a linear trend (increasing by a fixed amount each year, approximately): this regression might seem reasonably successful for the earliest observations, because the annual increase in the dependent variable is then only a few £s; but the residuals will tend to grow in the later observations (when the dependent variable increases by many £s per year). If you find heteroscedastic errors, you should create new variables which are transformations of the original variables (including the dependent variable): a good starting-point is to take the log of all variables in your regression, and run a regression using these new variables. In the case of table 5, the numbers in square brackets are [.184] (for the LM version) and [.201] (for the F version); both are above 0.05, so the regression “passed” this test – in other words, heteroscedasticity is not a problem in this regression.

In the “Diagnostic Tests” section of table 5, two numbers in square brackets are below 0.05: the LM and F tests for serial correlation. Hence, this regression fails the test for serial correlation, so these regression results are unreliable (despite the fact that none of the other three tests indicates a problem). We will try to solve this serial correlation problem in chapter 7; but before then, we will study variables (in a different dataset) which do not have such a serious autocorrelation problem.

6.2 AN EXAMPLE OF MULTIPLE REGRESSION

The remainder of this chapter does not use the data file (C:\MFITDATA\SHARES3.FIT) you typed and used earlier, because we are not yet able to solve the autocorrelation problem it contains (we will solve it in chapter 7). But do not delete that file: we will use it again in chapters 7 and 8. For this chapter, we use the dataset C:\MFIT4WIN\TUTOR\PTMONTH.FIT provided with Microfit 4; I refer to it as the PTMONTH dataset. It contains USA data on the ‘SP500’, the Standard & Poor portfolio of 500 shares. This file should have been copied to your computer when you installed Microfit (if you cannot find it, you may need to re-install Microfit on your computer). In Microfit, load the PTMONTH dataset into your computer’s memory, as explained in section 3.3 above. Now look at the screen, under the button labeled Data, and you should see the words “Current sample” followed by some information about this dataset. How many observations, and how many variables, does it contain? Check your answer with {4} at the end of this booklet.
It is standard practice to include a constant in any regression you estimate, but this dataset does not include one. So create a constant, using the name CONSTAN (as you did in section 4.1). Next, create a time-trend variable: click the Process button, and then click the Time trend button; you need to type in the name of a new variable, so type MONTH and click the Ok button. We wish to study the connection between the (weighted average) return on a portfolio (vw) and the level of dividends on this group of shares (divSP), by testing the equation

vw = α + β(month) + γ(divSP) + u   [EQUATION 2]

This regression has two explanatory variables (month & divSP), unlike EQUATION 1 (in section 5.2), which had only one explanatory variable; so EQUATION 2 is an example of multiple regression, whereas EQUATION 1 was not. Now, test the regression specified in EQUATION 2 by clicking the button labeled Single (top right of the screen). Do not use the Multiple button: that refers to Vector AutoRegression, which means using several dependent variables at once (Vector AutoRegression is beyond the scope of this booklet); confusingly, the Single button is the one to use for the ‘multiple linear regression’ referred to in unit 6 of the ‘Quantitative methods for financial management’ folder. Now type: VW CONSTAN MONTH DIVSP and then click the button labeled Start (top-right corner of the screen). Look at the four diagnostic tests (as we did in section 6.1): which of these four tests did the regression “pass”, and which did it “fail”? Check your answer with {5} at the end of this booklet. Now click the Close button (which takes us to the ‘Post regression menu’); click the Cancel button, and (after Microfit puts up another menu) click the Cancel button again.

6.3 WHICH FUNCTIONAL FORM SHOULD YOU USE?

Section 6.1 discussed four diagnostic tests, each of which checks for a problem in a regression. For three of these problems, I suggested that if the problem occurs, you might be able to solve it by creating new variables which are transformations of the original data. In particular, I suggested that computing the log of a variable may help. Sometimes, other transformations may be better. For example, rather than regressing ‘number of factory closures’ on ‘total output of the industry’, it may be better to calculate [1/(number of factory closures)] as a new variable, to represent the average life-span of a factory. How can you tell which transformation to use? As a guide, it is desirable that every variable you use in a regression is normally-distributed: this applies to both explanatory and dependent variables. So if a variable is far from being normally-distributed, you should consider creating a new variable which is a transformation of it. Sometimes, this seems impossible: for example, you cannot transform a dummy variable (explained in section 7.1) to make it closer to a normal distribution. The log transformation (recommended in section 6.1) is often useful, but has limitations: the log of zero, and the log of a negative number, are not defined. This means that if any observation of a variable is negative, then you cannot use the log transformation on that variable (if some observations of variable X are zero, but none are negative, then you could consider creating the transformation Log(X+1) to solve this problem); a sketch of these transformations follows.
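As a sketch, these transformations use the same assignment syntax you met in section 4.1; the variable names CLOSURES and X below are hypothetical, used only for illustration (check that your version of Microfit accepts the arithmetic operators you need):

$ HYPOTHETICAL EXAMPLES OF TRANSFORMATIONS;
LIFESPAN = 1/CLOSURES; ENTER
LX = LOG(X+1); ENTER

The first line computes the reciprocal transformation from the factory example above; the second applies the Log(X+1) transformation suggested for a variable with some zero (but no negative) observations.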
You can see whether a variable is normally-distributed by using the HIST command. You need to type:

LDIVSP = LOG(DIVSP); ENTER
HIST DIVSP; ENTER
HIST LDIVSP; ENTER

The first of the above lines creates a new variable, called LdivSP. The next two lines produce histograms of divSP & LdivSP. These two charts are shown side-by-side below (as charts 2 and 3), to show how the distribution of LdivSP differs from that of divSP. Looking at charts 2 and 3, which of these two variables do you think is closer to a normal distribution?

chart 2: A SKEWED DISTRIBUTION
chart 3: A SYMMETRIC DISTRIBUTION

Charts 2 and 3 show the distribution of a variable before and after a log transformation: the left-hand chart is a histogram of divSP, and the right-hand chart a histogram of the log of divSP. On each histogram, Microfit superimposes a normal distribution: a bell-shaped curve. By comparing this bell curve with the histogram, we can see that the left-hand histogram is ‘skewed’ (asymmetrical): it has a long ‘tail’ on the right (a few values are much higher than the average), so divSP is far from normally-distributed. The histogram in the right-hand chart is closer to the normal distribution curve. The distribution in the left-hand chart is a common pattern in economics & finance: for example, a similar distribution applies to incomes (the few people who are millionaires would form a long ‘tail’ on the right of such a diagram). In such cases, taking the log often produces a variable which is closer to a normal distribution. Charts 2 & 3 suggest that LdivSP is closer than divSP to a normal distribution; in the following chapter (section 7.1), we will modify a regression by changing divSP to LdivSP.

6.4 COLLINEARITY

Unit 5 of the ‘Quantitative methods for financial management’ folder discussed the assumptions required for OLS regression to produce ‘Best Linear Unbiased Estimators’; several of these assumptions can be assessed using the diagnostic tests produced by Microfit (as examined in section 6.1). The same diagnostic tests apply to this chapter, but there is now another complication: collinearity. The word ‘collinearity’ describes a regression where two or more explanatory variables are closely related to each other; the word ‘multicollinearity’ has the same meaning. Suppose we measure the output of workers of different ages, and find their ages and amount of work experience. We could run a regression with output as the dependent variable, and age and work experience as explanatory variables. But work experience is likely to be closely correlated with age, so it is difficult to separate the effects of these two factors.

Collinearity could not arise in chapter 5, because there was only one explanatory variable; but in this chapter, we use more than one explanatory variable. Is collinearity a problem in our latest regression? Look at the diagnostic statistics (produced by Microfit) reported in table 6 (section 7.1): which of these warns us about collinearity? The answer is: none of them. This is not simply a weakness of Microfit – econometricians have not yet agreed how to measure collinearity, or how much collinearity is “too much” for regression results to be reliable. Bryman & Cramer suggest measuring the correlation coefficient between explanatory variables, and rejecting a regression if there is a correlation coefficient greater than 0.8 (or less than -0.8) between any two of the explanatory variables (Bryman A. & Cramer D., 1990, Quantitative data analysis for social scientists, Routledge: London, p.236). Other writers (e.g. Pesaran M.H.
& Pesaran B., 1997, Working with Microfit 4.0: interactive econometric analysis, OUP: Oxford, p.191) would disagree, claiming that when deciding if collinearity is a problem, we should consider not just correlation coefficients between explanatory variables, but also the sample – size. For this course, we expect you to understand the problem of collinearity, but we do not require you to test for it (if you wish to test for collinearity, you can use the COR command to find correlations between variables: see section 4.2). In extreme cases of collinearity, Microfit cannot estimate a regression, and shows the message Correlation matrix near singular (possible multicollinearity) ! **CALCULATIONS ABANDONED** If you see the above message, you should drop one of your explanatory variables (or obtain more data), and then run the regression again. CHAPTER 7: (UNIT 7) TOPICS IN MULTIPLE REGRESSION This chapter focuses on some issues often encountered in multiple regression, including serial correlation. 23 7.1 DUMMY VARIABLES The regression in section 6.2 (using dataset C:\MFIT4WIN\TUTOR\PTMONTH.FIT) didn’t pass all diagnostic tests. The problem is ‘outliers’, including the October 1987 crash (Pesaran M.H.& Pesaran B., 1997, Working with Microfit 4.0, p.243); such outliers are ‘shocks’, when information suddenly becomes available to stock markets. We will solve this using dummy variables where residuals indicate a shock; if you can, it is better to include a variable, which measures the cause of shocks. I found that three dummy variables are sufficient to correct for non–normality of residuals: October 1974, January 1975 & October 1987. Create a dummy variable: click the Process button & type: $ CREATE A DUMMY VARIABLE; ENTER OCT74 = 0; ENTER SAMPLE 1974M10 1974M10; ENTER OCT74 = 1; ENTER SAMPLE 1948M1 1993M5; ENTER LIST OCT74; ENTER The first of the above six lines starts with a $ symbol, which tells Microfit to ignore the rest of that line; it is just a comment, to tell you what the lines do. Always end a comment with a ; (semicolon) or Microfit treats the next line as part of the comment. The next line creates a new variable (Oct74) and sets it to zero. The next line uses the Microfit SAMPLE command, to restrict the data to just month 1974m10 (type it twice, to use data from 1974m10 to 1974m10). For 1974m10, Oct74 is set to 1. The next line resets the sample to all available data. The final line lists this new variable on your screen. Before going further, save the above six lines as a ‘.EQU’ file, in case you want it later – click on the ‘save .EQU file’ button indicated below: 24 Microfit asks for a filename, so type C:\MFITDATA\PTMONTH.EQU ENTER and click the OK button. Now run these six lines, by clicking the Go button. To check they worked correctly, click the Data button and confirm that Oct74 is zero for every month except October 1974. You now need to create two more dummy variables, in the same way: call them Jan75 (equal to 1 for 1975m1) and Oct87 (equal to 1 for 1987m10). If you cannot 6 create these two variables, look at answer { } at the end of this booklet. Next, you should test the following regression equation: vw (month) (divSP) (Oct 74) ( Jan75 (Oct 87) u [EQUATION3] which is based on EQUATION 2 in section 6.2 (chapter 6); but the new regression equation adds three dummy variables. You should get the results shown in table 6; which of the four diagnostic tests does it fail? 
Table 6: MULTIPLE REGRESSION

Table 6 indicates a problem with functional form: the probabilities are [.018] and [.019] for the LM and F versions respectively (both are below 0.05, so the regression “fails” the test). How can we solve this? For this regression, I have found (by experimentation) that using LdivSP (the log of divSP) instead of divSP seems to solve the problem. As indicated in section 6.3 above, divSP has a skewed distribution, whereas LdivSP is closer to a normally-distributed variable; and in general, it is desirable for all variables in a regression to be approximately normally-distributed. So now estimate a new regression based on EQUATION 3, but this time replace divSP by LdivSP (the log of divSP), to solve the ‘functional form’ problem indicated in table 6:

vw = α + β(month) + γ(LdivSP) + δ(Oct74) + ε(Jan75) + ζ(Oct87) + u   [EQUATION 4]

You should obtain the results shown in table 7. Have we solved the functional form problem, and are there any other problems? {7}

Table 7: REGRESSION COEFFICIENTS

7.2 INTERPRETING REGRESSION COEFFICIENTS

What do the results in table 7 tell us? Focus near the top-left corner of table 7. The third line tells us the name of the dependent variable (vw). Below this, under the word “Regressor”, is a list of the explanatory variables used in the regression. To the right of this list (under the word “Coefficient”) is a column of numbers: the coefficients estimated by Microfit. The first number (.077305) corresponds to α in EQUATION 4. The second number represents β in EQUATION 4; it is written by Microfit as -.1195E-3, which is shorthand for -.1195 multiplied by 10^-3, but would be better represented as -.0001195 in a report. The fact that this is negative indicates that vw tends to decrease as month increases, if all other variables in the regression remain the same (so vw appears to have a downward trend). The following coefficient is .027632, which corresponds to γ in EQUATION 4. The fact that this is positive tells us that increases in LdivSP tend to be associated with increases in vw (if all other variables are unchanged). We can also tell from regression results such as table 7 which coefficients are statistically significant; this issue is explored in section 7.5 below.

7.3 SERIAL CORRELATION

We will now return to the dataset you typed in earlier and saved as the Microfit-format file C:\MFITDATA\SHARES3.FIT (last used in chapter 5). When you used this dataset earlier, you discovered a problem of serial correlation with the regression specification (EQUATION 1); see table 5. This is a very common problem with time-series data (such as share prices), so we need to solve it. Thankfully, there is a relatively simple solution; but before moving to it, let’s look more carefully at one variable. Look at variable Zambia, as shown in chart 1 of this document (section 4.3). There is a large fall in the share price at day 15; but apart from that, the price remains fairly steady over time – if you think of Zambia as today’s share price, then today’s share price will be similar to yesterday’s. Let’s investigate this by looking at the correlation between today’s and yesterday’s price, which requires creating a lagged variable. The new variable you should create is Zambia1, which is Zambia lagged by one day. To do this, click the Process button in Microfit, and type:

ZAMBIA1 = ZAMBIA(-1); ENTER
LIST ZAMBIA ZAMBIA1; ENTER

where “(-1)” after the variable name Zambia means ‘lag this variable by one observation’.

Table 8: A LAGGED VARIABLE
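The first few rows of table 8 can be reconstructed from the Zambia figures in table 1:

day   Zambia   Zambia1
1     95       *NONE*
2     94.25    95
3     94.25    94.25
4     93.25    94.25

Each Zambia1 value is simply the previous day’s Zambia value; day 1 is shown as missing (*NONE*) because there is no earlier observation to copy.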
The second of the two Microfit lines above (LIST ZAMBIA ZAMBIA1) will list these two variables on your screen, so you can check that variable Zambia1 has been created successfully. You should obtain the results shown in table 8 on the right of this page. Notice that Zambia1 data is missing for the first day, and that the second observation of Zambia1 is identical to the first observation of Zambia. Now, find the correlation coefficient between Zambia and Zambia1, using the method explained in section 4.2 above. Check your answer with {8} at the back of this booklet. The value of the correlation coefficient is near +1, which indicates a strong positive correlation between Zambia and Zambia1; it is evidence of 'serial correlation', which is also called "autocorrelation" (meaning correlated with itself). We seem to have found serial correlation: the correlation between Zambia & Zambia1 is near 1. But let's be scientific: is this correlation statistically significant? We will use the Microfit command COR in a different way to section 4.2 above: this time, type just one variable name (rather than two, as you did in section 4.2). Type

COR ZAMBIA; ENTER

into the Microfit Process window (removing anything you typed earlier), and click the Go button; Microfit then reports statistics such as the mean & standard deviation, and you should then click on the Close button; Microfit then indicates the extent of autocorrelation of variable Zambia (you should obtain the same results as are shown in table 9 below). Microfit also creates a chart, which you can ignore. Look at the top of table 9. Microfit reports a coefficient of 0.77943 for the first-order serial correlation; this must have the same sign as the correlation coefficient between Zambia & Zambia1 you produced earlier in this section. Table 9 also indicates two statistics which we can use to assess if this serial correlation is statistically significant: the 'Box-Pierce' and 'Ljung-Box' statistics. If the number in square brackets is less than 0.05 for either of these, then the serial correlation is statistically significant (see section 5.4). In this case, each of these probabilities is [.000], so we conclude that the first-order correlation is statistically significant. For this booklet, you can ignore all of the rows below this: we are not concerned with second-order (or higher-order) serial correlation.

Table 9: AUTOCORRELATION OF VARIABLE ZAMBIA

7.4 DIFFERENCING A VARIABLE

Having established that Zambia does indeed show serial correlation, we now consider a solution. The standard approach for this problem in time-series data is to "difference" the variable: this means calculating the difference between the value of the variable on one day and the value of the same variable on the previous day. In Microfit, this is done by clicking the Process button, and typing:

DZAMBIA = ZAMBIA - ZAMBIA(-1); ENTER

which will create a new variable called dZambia (the letter "d" in the name dZambia is to remind you that this variable is differenced). Has this new variable solved the serial correlation problem? Calculate the serial correlation of dZambia (see {9} if you cannot do this). Is the serial correlation for this new variable statistically significant? Look in the square brackets on your screen, at the row representing first-order serial correlation: the values are [.667] and [.655], so both are above 0.05 and hence not statistically significant. We can be reassured that differencing has solved the serial correlation problem.
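Aside: if you ever need this check outside Microfit, the same calculation can be sketched in Python. The prices below are made up; the statsmodels function reports the Ljung-Box and Box-Pierce statistics with their probabilities, playing the same role as table 9.

import pandas as pd
from statsmodels.stats.diagnostic import acorr_ljungbox

zambia = pd.Series([95.0, 94.0, 93.5, 94.5, 93.0, 92.5, 80.0, 80.75])
dzambia = zambia.diff()                          # ZAMBIA - ZAMBIA(-1)
print(dzambia.autocorr(lag=1))                   # first-order serial correlation
print(acorr_ljungbox(dzambia.dropna(), lags=[1], boxpierce=True))

As in Microfit, a probability below 0.05 would indicate statistically significant first-order serial correlation.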
Does the other variable in table 5 (Nippon) show autocorrelation? Carry out an autocorrelation test on Nippon; is it a problem? {10} Even if there were no autocorrelation in Nippon, it would be better to use a differenced version of Nippon in a regression with dZambia. We cannot use Zambia because of serial correlation (and hence the risk of spurious results: see the tree example in section 5.1). But if we regress dZambia on Nippon (differencing one variable but not the other), we may fail to detect a genuine relationship. If there is a linear relationship between Zambia & Nippon, we should find a significant relationship between dZambia and a differenced version of Nippon. Calculate a variable called dNippon, equal to the first difference of Nippon (see {11} if you cannot do this). I have found that dNippon does not show significant autocorrelation (you do not need to test this). Next, run a regression with dZambia dependent on dNippon and the constant. Does this regression still have a problem with serial correlation? {12} However, there is now a problem with normality (probability [.000]). The problem that residuals of the new regression are not normally-distributed suggests that we cannot rely on the results. I experimented with taking logs of Zambia and Nippon (and then differencing to remove autocorrelation), but even this regression still had non-normally-distributed residuals. To see why (the residuals of) Zambia show non-normality, create a histogram of dZambia by clicking the Process button and typing:

HIST DZAMBIA; ENTER

You should obtain a histogram like chart 4 on the right. Looking at the histogram, we can see why dZambia is not normally-distributed (and hence why the previous two regressions do not have normally-distributed residuals): there is an outlier (a value very different from most observations) between -16.42 & -9.75. In a large sample, a few outliers need not prevent the distribution from being approximately normal; but we have few observations. If you found more observations on Zambia (and the explanatory variable), making dZambia closer to a normally-distributed variable, then the regression residuals might become normally-distributed. A second way to produce normal residuals with this dependent variable is to limit the sample, to exclude the outlying observation; but I would not recommend doing so here, due to the small sample-size, and because the outlier is not near the start or end of the data. A third option is to transform the data, but I am not aware of a transformation which would create a normally-distributed variable from Zambia.

Chart 4: A NORMAL DISTRIBUTION?

Let's go back to the residuals of the latest regression (dZambia on dNippon). Chart 5 shows the residuals for each observation. You do not need to replicate this chart; I created it in Microfit as a "3-dimensional" image, to make it look different to other charts in this document.

Chart 5: RESIDUALS

There is a large negative residual at day 15, which corresponds to the drop in the Zambia share-price at day 15: this negative value is visible in the residuals (in chart 5) because it was not explained by the explanatory variable (dNippon). This negative value at day 15 is the outlier on the left-hand side of chart 4. You can also see it as a sudden drop at day 15 in table 1 (section 2.1). In section 6.1 above, we discussed four possible problems with regressions: serial correlation; functional form; normality; and heteroscedasticity.
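Aside: a histogram like chart 4 is also easy to produce outside Microfit. A minimal Python sketch, reusing the made-up prices from the earlier aside:

import pandas as pd
import matplotlib.pyplot as plt

zambia = pd.Series([95.0, 94.0, 93.5, 94.5, 93.0, 92.5, 80.0, 80.75])
dzambia = zambia.diff()              # first differences, as before
plt.hist(dzambia.dropna(), bins=8)   # isolated bars far from the rest are outliers
plt.xlabel("dZambia")
plt.ylabel("frequency")
plt.show()

An approximately normal variable shows a single roughly bell-shaped cluster of bars; an outlier shows up as an isolated bar, as in chart 4.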
Before trusting the results from a regression equation, you should check that the regression passes all four tests. If a regression fails any test, then there may be a risk of spurious results. So far, we cannot tell if there is a link between the share prices for Nippon & Zambia; even after differencing, we were unable to produce a regression which satisfied all diagnostic tests, so the above regression results cannot be relied on. The following section will produce a regression equation which does pass all the diagnostic tests.

7.5 WHICH VARIABLES SHOULD BE INCLUDED IN A REGRESSION?

The problem with the regression of dZambia dependent on dNippon (in the previous section) was that dZambia was not normally-distributed, so the regression did not have normally-distributed residuals. Ideally, we would like to know the cause of the sudden drop in the share price at day 15, and add this variable to the regression as an explanatory variable (as well as dNippon). The fall may be due to an announcement of falling profits; or a drop in demand for copper; or an accident in one of the firm's factories. I do not know the cause of the price drop on day 15, but we can try creating a new dummy variable set to 1 on the day of the sudden fall (and zero on other days); this may give a satisfactory regression. To try this, click on the Process button and type

$ CREATE A DUMMY VARIABLE; ENTER
DAY15 = 0; ENTER
SAMPLE 15 15; ENTER
DAY15 = 1; ENTER
SAMPLE 1 22; ENTER
LIST DAY15; ENTER

The above lines create a new variable (Day15), equal to 1 on day 15 and equal to zero on all other days. Next, test the regression equation

dZambia = α(1) + β(dNippon) + ι(Day15) + μ [EQUATION 5]

where ι is an extra coefficient for Microfit to estimate, and other symbols are as explained above. To estimate this new regression specification, click on the button labeled Single and repeat the process you used in section 5.2, but this time adding an extra explanatory variable. You need to type:

DZAMBIA CONSTAN DNIPPON DAY15

and click the Start button on the right. Look at the regression results on your screen: does the regression pass the diagnostic tests? You should find that this new regression still fails the normality test. After experimenting, I found that one way to produce an acceptable regression is to create two more dummy variables, Day5 and Day8, each defined similarly to Day15 (Day5 equal to 1 for day 5, and zero for all other days; Day8 equal to 1 for day 8, and zero for all other days). You should now create these two new dummy variables. Next, try the following regression:

dZambia = α(1) + β(dNippon) + ι(Day15) + δ(Day5) + ε(Day8) + μ [EQUATION 6]

where δ and ε are two additional coefficients to be estimated. You should obtain the results shown in table 10; if you do not, check the specification you used, and run the regression again until you get the results shown here.

Table 10: REGRESSION WITH THREE DUMMY VARIABLES

Now save your data file, as C:\MFITDATA\SHARES3.FIT (if you can't do this, see section 2.4). Microfit will warn you that there is already a file of the same name; you should save it again (with the same filename), so click on Yes. By saving this file again, you will keep the changes you have made to the dataset (such as the dummy variables Day15, Day5, and Day8 you created). To keep a copy on diskette, save it again using the filename A:\SHARES3.FIT and keep your diskette somewhere safe. We now have a regression which passes all four diagnostic tests (see section 6.1), so we can consider the findings.
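Aside: the specification in EQUATION 6 is ordinary least squares with dummy regressors, so it can be reproduced in any regression package. A hedged Python sketch using the statsmodels library follows; random stand-in data are used because the real series are inside Microfit, so the printed numbers will not match table 10.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 22
dzambia = rng.normal(size=n)          # stand-in for the real differenced series
dnippon = rng.normal(size=n)

def dummy(day):
    return (np.arange(1, n + 1) == day).astype(float)   # 1 on one day, 0 elsewhere

X = sm.add_constant(np.column_stack([dnippon, dummy(15), dummy(5), dummy(8)]))
fit = sm.OLS(dzambia, X).fit()
print(fit.params)                     # the "Coefficient" column
print(fit.tvalues)                    # the T-ratios discussed below
print(fit.pvalues)                    # the probabilities in square brackets

With the real dZambia and dNippon series, the same OLS calculation should reproduce the coefficients in table 10.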
The sample-size is small, and we had to add three dummy variables to correct for the 'shocks' in Zambia prices, so our results should not be taken too seriously. Nevertheless, let us look at the results (in table 10): what have we learnt? The first place to look is to see which variables are statistically significant, by looking at the "T-ratios". This ratio is simply the coefficient divided by the standard error of that estimate. In the case of dNippon, for example, the T-ratio is (-.016800/.032019) = -0.52469; this is very close to the value -.52471 reported by Microfit (the slight difference is due to rounding errors). A T-ratio of more than about 2 (or less than about -2) is statistically significant at the 5% level (see section 6.1); I use the phrase "about 2" because the point at which a T-ratio becomes statistically significant depends on the number of observations. You can look up this value in tables of the T distribution (in the back of many statistical textbooks), but there is an easier way. At the far right-hand side of table 10, we see a number in square brackets next to each T-ratio: this tells us the probability of obtaining a T-ratio that large (in absolute value) by chance, if the true coefficient were zero. In the case of dNippon, for example, the probability associated with a T-ratio of -.52471 (with the sample size of 21 observations) is [.607] (check that you can locate this number in table 10). As discussed in section 5.4 above, there is a social-science convention that a probability of under 5% (i.e. 0.05) is 'statistically significant'. Because the probability for dNippon is above 0.05, we can say that this variable is not statistically significant. A variable with a probability above 0.05 (based on its T-ratio) does not significantly improve the regression. We can say that, taking account of its standard error (.032019), the dNippon coefficient (-.016800) is close to zero. If we created a variable made up of random numbers, this should be unrelated to dZambia; but it is quite likely that such random numbers (when used in a regression) would produce a T-ratio of -.52471 or lower, or a positive T-ratio of +.52471 or higher. So this is the test for whether or not to include an explanatory variable: is the probability (based on the T-ratio) less than 0.05? If so, then that variable appears to be a significant influence on the dependent variable; whereas if the probability is not under 0.05, then it does not seem to be significantly linked to the dependent variable, and can be dropped from the regression. Now, look at all variables in table 10: which of these are statistically significant? Check your answers with {13} at the end of this booklet. Does this regression suggest that dZambia and dNippon are significantly related to each other? They do not appear to be linked, because the T-ratio for dNippon is not statistically significant. Some researchers like to have an "objective" test as to how many explanatory variables should be included in a regression; it may seem possible to use the R2 statistic (called "R-Squared" in Microfit) for this purpose, because any variable which increases the R2 value helps explain more of the variation in the dependent variable. In fact, the R2 value is not appropriate for this task: almost any explanatory variable added to a regression will increase the R2 value, even if it has little connection with the dependent variable. The R-Bar-Squared statistic is better: this is related to the R2 statistic, but the definition of R-Bar-Squared includes a penalty for each extra explanatory variable, so it can fall when a variable which contributes little is added to the regression.
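Assuming Microfit uses the standard definition, R-Bar-Squared = 1 - (1 - R2)(n - 1)/(n - k), where n is the number of observations and k is the number of estimated coefficients (including the constant). A small Python sketch, using the R2 values from tables 10 and 11 (discussed next):

def r_bar_squared(r2, n, k):
    # Adjusted R-squared: penalises each extra estimated coefficient.
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(r_bar_squared(0.96636, 21, 5))   # with dNippon (table 10): about .95795
print(r_bar_squared(0.96579, 21, 4))   # without dNippon (table 11): about .95975

This reproduces the R-Bar-Squared value of table 11, which supports the assumption about the formula.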
Table 11: COMPARISON WITH TABLE 10

The regression results in table 11 allow us to compare R2 values with the previous regression (there is no need to carry out this new regression yourself). The only difference between the regressions used to produce tables 10 and 11 is that the table 10 regression includes an extra variable: dNippon. Compare the R2 values of these two tables: the R2 value in table 11 is .96579, which is increased slightly to .96636 in table 10 by adding variable dNippon. This might suggest that the table 10 regression is better, because it explains a (slightly) higher proportion of the variation in the dependent variable than the table 11 regression. Yet we found earlier in this section that dNippon is not significantly linked to dZambia. In deciding whether the table 10 regression is better or worse than that for table 11, comparing R2 values gives the 'wrong' answer ('wrong' because dNippon is not statistically significant). Now, compare the R-Bar-Squared values of tables 11 and 10. The value falls from .95975 in table 11 to .95795 in table 10, so the R-Bar-Squared statistic has given us the 'correct' answer: that we should not include variable dNippon in this regression. Nevertheless, you should not use either R2 or R-Bar-Squared to decide which variables to include in a regression: it is more appropriate to use the method explained in this section, based on the T-ratio. For many research projects, the aim is to try to "explain" variation in a key variable; there may be a number of variables which are thought likely to influence this key variable. One approach often used is to include all possible explanatory variables initially, and then remove each variable which is not statistically significant (using the probability of the T-ratio). If you use this approach, it is advisable to keep a constant term in every regression, even if it is not statistically significant. In general, the 'answer' you seek is a list of "causes" of the key variable, and you would only report regression results when all explanatory variables are statistically significant. However, other research has different aims: you may be testing a specific claim, such as "increasing the money supply causes inflation" - in this case, finding that a particular explanatory variable is not statistically significant is an 'answer', so you could report results of a regression containing a non-significant variable.

CHAPTER 8: (UNIT 8) REGRESSION AND THE CAPM

This chapter looks at CAPM and portfolio theory: it explains how to calculate the beta of an asset, and how to test CAPM.

8.1 ESTIMATING 'RISK' AND 'RETURN' FROM SHARE DATA

This chapter discusses the CAPM (Capital Asset Pricing Model), which is explained in unit 8 of the 'Quantitative methods for financial management' folder. This topic is relevant to the "best" way to invest in stocks & shares. The choice depends partly on how much risk an investor will accept to obtain a higher return; but CAPM suggests that some portfolios are 'better' than others, regardless of how risk-averse you are. The concept of a 'risk-free' asset is central to this chapter. Government stocks are often described as "risk-free", because most people trust the government to repay the loan; but such stocks are not literally free of risk. Consider the 2½% Treasury stock price, shown in table 2 (section 2.3): you would lose money if you bought on day 6 (at 53.1915 pence) and sold on day 15 (at 51.5014 pence).
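The size of that loss is worth checking for yourself; a one-line Python calculation, using the prices just quoted from table 2:

buy, sell = 53.1915, 51.5014     # day-6 and day-15 prices, in pence
loss = (sell - buy) / buy        # proportionate change over the holding period
print(f"{loss:.4f}")             # about -0.0318: a loss of roughly 3.2%

So a "risk-free" stock can still lose you money if you sell before its expiry date, which is the point developed next.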
There is a way to make a risk-free purchase (Brealey & Myers, 1996: p.144): on day 1, buy a government stock with an expiry date, and hold it until its expiry date. In this case, the interest-rate is fixed, because we will not sell it before the expiry date. For this chapter, we will make two assumptions: that there is a Treasury stock which expires on day 22; and that this (dated) stock has the same return as the (undated) Treasury stock Treasur. I will refer to this (imaginary) government stock with an expiry date as rskFree; it has the same average return as Treasur (i.e. 0.0013941 per day), but no risk. One of the key ideas in this topic is the 'portfolio', which means placing your savings in a mixture of assets. Suppose you have some money to invest (students may find this difficult to imagine!). You could spend all of your savings on shares in a firm which sells umbrellas; if there is a lot of rain next year, the firm may have high profits, and pay a large dividend - so you will make money. But this is risky - next year may be sunny, in which case few umbrellas will be sold, so your dividends will be low. A safer way is to "diversify" your savings, which means buying more than one type of asset. For example, you could put half of your savings in the umbrella firm, and the rest in a firm selling ice-creams - that way, you should get a reasonable return on your savings, whatever the weather. In financial management, 'return' on investment is usually measured as the proportionate increase in value over a fixed time (per day, in this dataset). So we use the following definition of return:

r = [p - p(-1)]/p(-1) [EQUATION 7]

where r is the return on an asset; p is the price of the asset; and p(-1) is the price on the previous day. Let's now calculate the return for one of the firms in the dataset C:\MFITDATA\SHARES3.FIT you created in section 4.1 above. We will look at share price Allied, and use the name returnA for the return on Allied (it is important to be systematic when choosing variable names). In Microfit, create the new variable by clicking the Process button, and then typing

RETURNA = (ALLIED - ALLIED(-1))/ALLIED(-1); ENTER

You should now work out the returns on the two other shares (Zambia and Nippon) in the same way: call these new variables returnZ and returnN. We want to know the average return for these three variables; the easiest way is to use the COR command, as follows:

COR RETURNA RETURNZ RETURNN; ENTER

You should now have four lines visible in the Process window: three lines to calculate returns on different shares, and the above COR line. The COR line must appear after the other three lines. You can now run these four lines, by clicking the Go button. You should obtain the following results:

Table 12: SUMMARY STATISTICS

On your computer, you should see the same results as shown in table 12. Microfit also computes a correlation matrix, but this is not helpful here (so ignore it). The first line of table 12 tells us that Microfit used observations from 2 to 22; why did Microfit not use observation 1? (Check your answer with {14}.) Table 12 shows that the average for returnZ is -.0069281 (a fall of about two-thirds of 1% per day). The fact that Zambia copper has a negative return is consistent with the data in table 1 (section 2.1): the Zambia share price fell from 95 pence to 80.75 pence. If we are looking for an investment, the falling price over these 22 days might put us off buying Zambia shares. We have data on Allied & Nippon shares; which of these is a better investment?
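Aside: EQUATION 7 is just a proportionate change, so it is easy to reproduce outside Microfit. A minimal Python/pandas sketch, with made-up prices:

import pandas as pd

allied = pd.Series([1026.5, 1030.0, 1028.5, 1035.0])    # illustrative prices only
returnA = (allied - allied.shift(1)) / allied.shift(1)  # EQUATION 7
# pandas also provides this directly: allied.pct_change()
print(returnA.mean(), returnA.std())                    # average return and "risk"

The mean and standard deviation printed here play the same roles as the 'Mean' and standard-deviation rows of table 12.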
Nippon has a higher growth-rate than Allied, because the average value of returnN is more than that of returnA (see table 12). But as well as the return, we should consider 'risk'. The conventional definition of the "risk" of owning a share is the standard deviation of the return on that share; we can use this definition to compare the riskiness of different shares. The standard deviation of the return on each share is shown in table 12, on the line below 'Mean': returnA (standard deviation 0.018175) has a lower risk than returnN (standard deviation 0.032514). Some investors may prefer Allied, to reduce the risk; others might prefer Nippon, with its higher return. Or, we could combine Allied & Nippon shares in a portfolio.

8.2 CHOOSING THE BEST PORTFOLIO

Suppose we invest one penny in a portfolio, on day 1. If we spent the whole penny on Allied, we would get (1/1026.5) shares (the price was 1026.5 on day 1: see table 1, section 2.1). Or we could spend one-fifth (0.20) of a penny on Allied, to buy 0.20 × (1/1026.5) shares; this would leave four-fifths of a penny, which would buy 0.80 × (1/181.75) Nippon shares. The value of our portfolio would be:

PORTFOLIO [20% Allied, 80% Nippon] = (0.20 × (1/1026.5) × Allied) + (0.80 × (1/181.75) × Nippon)

The value of this portfolio would change from day to day, as the values of Allied and Nippon vary; the return on the portfolio is a weighted average of the returns on Allied & Nippon. The best way to calculate the return of a portfolio is by computing the value of this portfolio as a new variable:

PORTFOL20 = ((0.20/1026.5)*ALLIED) + ((0.80/181.75)*NIPPON); ENTER
RETURNP20 = (PORTFOL20 - PORTFOL20(-1))/PORTFOL20(-1); ENTER

In Microfit, the asterisk * above means 'multiplied by'. The "20" in Portfol20 refers to the fact that 20% of the portfolio consists of Allied shares. The formula for the return on Portfol20 is similar to that for a share, in section 8.1 above. Type the above two lines into Microfit to create Portfol20 and returnP20; and then type the equivalent pair of lines for a portfolio containing 40% (rather than 20%) Allied shares: call these Portfol40 & returnP40. Do the same for a portfolio containing 60% Allied shares, and for a portfolio containing 80% Allied shares. Compare what you typed with answer {15} at the end of this booklet, and run them in Microfit by clicking the Go button. There is a relationship between Nippon and Allied, and the four portfolios you have created: you can think of Nippon as a portfolio with 0% shares in Allied, and Allied as a portfolio with 100% shares in Allied. So in the rest of this chapter, I use the phrase 'six portfolios' to include Allied and Nippon. Now use the COR command (as in section 8.1) to find the mean & standard deviation of the six portfolios: returnN, returnP20, returnP40, returnP60, returnP80, returnA (you should get the results shown in table 13).

Table 13: STATISTICS ON 6 PORTFOLIOS

Microfit then produces a correlation matrix, which you can ignore. Look at table 13, and focus on the row labeled 'Mean': which of the six portfolios gives the best return? And which has the lowest risk, based on the standard deviations? Check your answer with {16} at the end of this booklet. The choice of which portfolio is "best" may seem arbitrary, because some investors are more risk-averse than others. But we can say that returnP80 is better than returnA: returnP80 has a higher return and a lower risk than returnA. To compare the risk and return for all six portfolios, look at chart 6.
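Aside: the six portfolios can be built in a loop. A hedged Python sketch with made-up prices, mirroring the Portfol20 formula above:

import pandas as pd

allied = pd.Series([1026.5, 1030.0, 1028.5, 1035.0])   # illustrative prices only
nippon = pd.Series([181.75, 183.0, 182.5, 184.0])
for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):               # fraction invested in Allied
    port = (w / 1026.5) * allied + ((1 - w) / 181.75) * nippon
    r = port.pct_change()                              # portfolio return, per day
    print(f"{w:.0%} Allied: mean {r.mean():.6f}, risk {r.std():.6f}")

Each pass of the loop produces one mean-and-risk pair, i.e. one point on chart 6.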
Chart 6: SIX PORTFOLIOS, AND A "RISK-FREE" ASSET

Chart 6 above cannot be produced in Microfit; you do not need to replicate it. The chart presents data from the mean & standard deviation rows of table 13 for the six portfolios. For example, the risk & return for Nippon shares are shown in the returnN column of table 13. Chart 6 is in a form similar to Brealey & Myers (1996: figure 8-5, p.177; but we would need to multiply all numbers by 100 if we wished to use risk & return in percent, as Brealey & Myers do). In chart 6, each of the six portfolios is represented as a point; they are labeled according to the name of the share or portfolio. Nippon (nearest to the top of the chart) has the highest return; the lowest risk is portfol80, which is nearer the left-hand side of the chart than the other portfolios. I connected these six points by a smooth curve; other portfolios of these two shares (such as one containing 50% Allied shares) would lie on this curve. The section at the top-left part of this curve (between portfol80 and Nippon) is called the set of "efficient" portfolios; what does this mean? {17} On chart 6, I added a point labeled rskFree (see section 8.1). Consider the continuous straight line starting at rskFree, and touching the portfolios curve near portfol40: what is this line called? {18} Of the six portfolios we studied in this section, portfol80 has the lowest risk. But chart 6 shows that investors can do better than this: for example, the point on the security market line vertically above portfol80 has a higher return than portfol80, but the same risk (investors can reach this point with a portfolio of Allied & Nippon in the same proportions as portfol40, combined with some rskFree stock). Portfolio portfol40 is a good combination of shares because it lies (approximately) on the security market line - it has a relatively high return but low risk. Many investors would prefer a safer investment than portfol40, and would choose a point on the security market line closer to rskFree.

8.3 ESTIMATING THE BETA OF AN ASSET

Consider the 'beta' of an asset: this compares the behaviour of one firm's shares with the behaviour of the stock exchange as a whole. We need a measure of the rate of return for the entire stock market. There are various possible data sources on groups of shares, such as the 'FTSE-100' index of the largest 100 firms on the London stock exchange (produced by the Financial Times newspaper). But to save you typing in more data, we will construct our own (very limited) market index from the four data series you typed earlier. Go to the Microfit Process window, and type the following:

SET = ((ALLIED/1026.5) + (ZAMBIA/95) + (NIPPON/181.75) + (TREASUR/51.8))/4; ENTER
RETURNS = (SET - SET(-1))/SET(-1); ENTER

We treat this new variable returnS as an approximation of the return for the whole stock market (we should really use an efficient portfolio, and there is no reason to suppose Set is efficient). Now look at the performance of Allied relative to Set. To calculate this 'beta', type (in the Microfit Process window):

COR RETURNA RETURNS; ENTER

When you click the Go button, Microfit should produce the results shown in table 14; and then when you click the Close button, you should see the correlation matrix shown in table 15.

Table 14: MEAN & STANDARD DEVIATION

Table 15: CORRELATION

Pause for a moment: you now have enough information to calculate the beta of Allied (using a calculator): can you do so? Microfit does not make it easy to work out beta, so I will explain the method.
Using formula (8.2) in the 'Quantitative methods for financial management' folder (unit 8: p.5), the beta of stock X is

betaX = cov(X,S) / var(S) [EQUATION 8]

where cov(X,S) is the covariance between returns on stock X and the average return for an efficient portfolio, such as the entire stock market; and var(S) is the variance of the average rate of return for the efficient portfolio. We are studying Allied shares, so we want betaA rather than betaX. Microfit reports the correlation coefficient, which I will write as cor(A,S); but we need the covariance cov(A,S), so a little more work is needed. We can calculate the covariance from the correlation, using the formula from Brealey & Myers (1996: p.158), which I present using the notation of the 'Quantitative methods for financial management' folder:

cov(A,S) = cor(A,S) · SD(A) · SD(S) [EQUATION 9]

where SD() indicates the standard deviation of a variable. Noting that var(S) is equal to SD(S) squared, we can then substitute this value of cov(A,S) into EQUATION 8, to give

betaA = cor(A,S) · SD(A) · SD(S) / (SD(S))²

which simplifies to

betaA = cor(A,S) · SD(A) / SD(S) [EQUATION 10]

Now substitute results from tables 14 & 15. The correlation coefficient cor(A,S) between returnA & returnS is 0.48224; and the standard deviations of returnA and returnS are 0.018175 and 0.015435 respectively. Hence betaA = (0.48224)·(0.018175)/(0.015435) = 0.5678 approximately. Now let's try a different approach. Look at unit 8 of the 'Quantitative methods for financial management' folder (p.5): we can calculate the beta of stock X using equation 8.3:

rX - rf = βX(rm - rf)

where r indicates the return on an asset. In the above equation, βX is closely related to betaX (as explained below). What is rf in the above equation? Earlier, we calculated the return on various assets: for example, returnA is the return on share Allied. However, the CAPM requires us to study the "risk premium", defined as the return on a share minus the "risk-free" return. Using the "risk-free" return discussed in section 8.1, you can calculate the risk premium for each share, by typing the following into the Microfit Process window:

PREMIUMA = RETURNA - 0.0013941; ENTER
PREMIUMS = RETURNS - 0.0013941; ENTER

where 0.0013941 is the average return of the "risk-free" asset (see section 8.1). Now run an OLS regression. Click the button labeled Single (near the top right of the screen), and type the following regression specification:

PREMIUMA CONSTAN PREMIUMS; ENTER

and click the Start button on the right of the screen. Your results should be the same as table 16 below.

Table 16: REGRESSION RESULTS

In this case, you can ignore the diagnostic statistics produced by Microfit. We are interested in the coefficient of premiumS, which is 0.56786 (make sure you can find this coefficient in table 16). This is very close to the value of beta we calculated earlier in this section, using the correlation coefficient and the standard deviations of returnA and returnS. Does this prove that the CAPM theory is correct? No, it doesn't prove anything (Berndt E.R., 1991, The practice of econometrics: classic and contemporary, Addison-Wesley: Reading Mass., p.35). Perhaps by accident, the economists who invented beta chose a formula which could be estimated by OLS regression, as we have just done. But there is another way of calculating beta, using covariance and standard deviations, which we used earlier in this section.
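Before working through the algebra, you can check this equivalence numerically. A Python sketch on made-up data (the names and numbers are mine, not Microfit's):

import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=50)                  # made-up market returns
x = 0.5 * s + rng.normal(size=50)        # made-up stock returns, correlated with s

# Method 1: covariance divided by variance (EQUATION 8)
beta_cov = np.cov(x, s, ddof=1)[0, 1] / np.var(s, ddof=1)

# Method 2: slope of the OLS regression of x on s
beta_ols = np.polyfit(s, x, 1)[0]

print(beta_cov, beta_ols)                # identical, up to floating-point error

Both methods print the same number, whatever data you generate; the algebra below shows why.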
To prove that these two are equivalent, start with EQUATION 8 (above):

betaX = cov(X,S) / var(S)

and substitute the formulas for covariance and variance into the top and bottom of this ratio (strictly, covariance and variance each include a factor of 1/(n-1), but this factor appears in both the top and the bottom, so it cancels):

betaX = Σ((Xi - Xbar)(Yi - Ybar)) / Σ((Xi - Xbar)(Xi - Xbar)) [EQUATION 11]

where Xbar represents the mean of X, and Ybar the mean of Y. Focus on the top line: multiplying out the two inner brackets gives

Σ(Xi·Yi - Xbar·Yi - Xi·Ybar + Xbar·Ybar)

We can re-write this line as four separate summations:

Σ(Xi·Yi) - Σ(Xbar·Yi) - Σ(Xi·Ybar) + Σ(Xbar·Ybar)

The symbol Σ means "add this term for each value of i from 1 to n, where n is the number of observations"; so any term which does not vary with i can be moved outside the summation, to give

Σ(Xi·Yi) - Xbar·Σ(Yi) - Ybar·Σ(Xi) + Xbar·Ybar·Σ(1)

By definition, Xbar = (ΣXi)/n, so we can rewrite ΣXi as n·Xbar; similarly for Y. Note also that Σ1 = n. So we can rewrite the above line as

Σ(Xi·Yi) - Xbar·n·Ybar - Ybar·n·Xbar + Xbar·Ybar·n

The third and fourth terms cancel each other, to give

Σ(Xi·Yi) - n·Xbar·Ybar

We can carry out the same process on the bottom line of EQUATION 11, to produce

Σ(Xi·Xi) - n·Xbar·Xbar

So, EQUATION 11 becomes

betaX = (Σ(Xi·Yi) - n·Xbar·Ybar) / (Σ(Xi·Xi) - n·Xbar·Xbar)

This formula for betaX is identical to the formula for β in OLS regression (equation 4.15a, p.8 in unit 4 of the 'Quantitative methods for financial management' folder). The above mathematics means that the beta (calculated from covariance and variance) must be the same as the β (calculated by OLS regression); so the fact that both methods gave the same value (about 0.5678) does not "prove" that the CAPM theory is true. Note one more complication. The latest regression used premiumA & premiumS to find βA; but to calculate betaA you used the covariance and variance of returnA & returnS (rather than premiumA & premiumS). There is a difference of 0.0013941 between returnA and premiumA (and the same difference between returnS and premiumS). This constant difference of 0.0013941 has no effect: subtracting a constant from returnA & returnS has no effect on their standard deviations, or on the correlation between them. It is desirable to have a clear idea of what beta means. The aim of beta is to assess whether the return on one stock varies less, or more, than the stock market as a whole. So what does the Allied beta value of 0.5678 mean? The first point is that this beta is less than 1; this suggests that Allied is less risky than most assets. Risk-averse investors would prefer Allied to riskier shares. Now calculate the beta of the other three assets (Nippon, Zambia & Treasur), using the regression method, and check your answers with those at the end of the booklet {19}. Also, use these beta values to list the four assets in order from lowest to highest risk (check your answers with {20}). Two assets (Zambia & Nippon) have higher-than-average risk, and the other two have lower-than-average risk. Note that the average of these four beta values is 1. We should always get an average of 1 if we look at a large number of assets, and compare each asset with a portfolio like Set which is an average of all assets.

8.4 TESTING CAPM

In unit 8 of the 'Quantitative methods for financial management' folder (p.6), we read "the more risky the stock is the higher are the returns required by investors". The previous section found that the four assets have different beta values, and hence different levels of risk.
The CAPM theory implies that there should be a strong link between risk & return: every share must lie on the 'security market line', and this security market line must slope upwards.

Chart 7: TESTING THE CAPM

Remember that this booklet is only a training exercise; we cannot carry out a serious test of CAPM, because that would require long-term data on prices (and dividends paid) for a large number of shares. Because we have no data on dividends, we were forced to assume that the only reason to hold stocks & shares is that their prices are expected to rise (whereas in reality, the main reason for buying such assets is the dividends which you expect the firm or the Treasury to pay). All we can do here is to illustrate an approach we could use to test CAPM. We can draw a graph of risk against return, and place the four assets on it: this is shown as chart 7 above (there is no need for you to replicate it). I display the real Treasur stock, rather than the imaginary rskFree stock (discussed in section 8.1). Is the CAPM theory supported by our current data? Not really. One obvious problem is that one share (Zambia) has a negative return; this is because of the short time-span of this data (22 days), and because we assume no dividend was paid during the period. If we looked at a longer time-span (say a few years), we could be more confident about the link between risk & return. The other three assets do seem to lie approximately on an upward-sloping line, which (CAPM suggests) we could call the 'security market line'; this could be taken as support for the CAPM theory. However, I suspect it is just coincidence that three of these four assets lie approximately on an upward-sloping line: we are unlikely to find clear support for CAPM in such a small sample. Note also that where the apparent 'security market line' would cut the vertical axis (i.e. the part of chart 7 where risk is zero), it appears to show a zero or negative return, which is not plausible.

8.5 FINAL COMMENTS

In section 8.3, we were able to estimate beta values of assets without using regression. But if you test CAPM (or almost any other theory in finance), you should expect to use regression. So do not forget the comments of previous chapters in this booklet: if there are departures from the assumptions associated with the Gauss-Markov theorem (see unit 5 of the 'Quantitative methods for financial management' folder), then OLS regression is unreliable. In the case of CAPM, OLS regression seems appropriate because of the definition of beta; but you might obtain very unrealistic estimates in some datasets (it is not possible to explore such issues in this MSc course). In general, do not restrict yourself to OLS regression: in particular, the issue of autocorrelation (discussed in unit 7 of the 'Quantitative methods for financial management' folder, and chapter 7 of this booklet) is a very important issue for time-series data, which are typical of the datasets used in financial management. The problem of autocorrelation in share prices is partly, but not entirely, solved by calculating the return on shares (as we did in section 8.1): this is because the formula for the return on a share, which you used in section 8.1 of this booklet, is fairly similar to the first difference of the share price. Good luck with your research.