CHAPTER 1:
INTRODUCTION
This chapter introduces Microfit, and explains how to install it on your
computer. You should read this chapter (quickly) even if you have
already installed Microfit.
1.1 WHAT IS MICROFIT, AND HOW SHOULD YOU USE THIS BOOKLET?
Microfit is a computer program for carrying out econometric analysis: it was designed for
economists. Microfit is mainly intended for regression, but can also be used to enter data from the
keyboard, and to draw graphs. Several versions of Microfit have been produced: this document is
for the Microfit 4 (Windows) version; all other versions of Microfit are for MS-DOS (they can be
run under Windows, but were not intended to be used that way). Note that there is an MS-DOS
version of Microfit 4, which is very similar to Microfit 4 for Windows.
This document is written for students taking the ‘Quantitative methods for financial management’
module (FM105), which is part of the MSc in Financial Management run by CIEE (SOAS,
University of London). This booklet assumes you have the folder ‘Quantitative methods for
financial management’, and that you can understand the mathematics and statistics it contains. We
hope you will be able to use this booklet without help; you may need advice from your tutor on
some topics, but please read the relevant part of this booklet carefully before asking for help. We
strongly recommend you read the whole of this booklet.
This booklet does not assume that you have experience of Microfit (or any other computer
program). However, it is essential that you have access to Microfit on a suitable computer (see
below), and that you use your computer to experiment with the techniques explained in this
booklet. Experimenting takes time, but is essential if you want to pass the MSc course. There are
self-test questions throughout this booklet (shown by numbers in {} brackets); always try to
answer them yourself (don’t immediately look at the answer).
Chapter 1 (this chapter) prepares you to run Microfit on your computer. Chapters 2 and 3 explain
some important features of Microfit, such as data files. Chapter 4 explains introductory statistics in
Microfit. Chapters 5 to 8 involve regression: they are the hardest chapters, but the most important
for your MSc. It is important to take regular breaks, so your brain can absorb ideas: this booklet
requires you to cope with statistical theory at the same time as you use a computer, so it isn’t easy.
You should take a break at the end of each chapter, and usually at the end of each section, so that
you feel fresh when you return to Microfit.
1.2 CAN YOUR COMPUTER WORK WITH MICROFIT 4 (WINDOWS)?
Before you install Microfit, check that your computer is good enough. You will need all of the
following items.
Windows: the Microsoft ‘Windows’ operating system allows your computer to run two or more
programs at once. Your computer must be running Windows when you install Microfit 4
(Windows): any Windows version should be fine, including Windows 3.1, Windows 95,
Windows 98, and Windows NT. At the time of writing, Windows 2000 had not been released, but
we expect Microfit 4 (Windows) to work on Windows 2000.
Mouse: Windows relies on the use of a ‘mouse’ (a hand-held device, used to select an option from
the screen), or something similar such as a trackball, to run Microfit 4 (Windows).
Memory: This means a set of computer chips, which can store a computer program, data, etc.
Memory is temporary: it will “forget” your data when you switch the computer off. Your computer
must have at least 8 Megabytes of memory before you can run Microfit.
Hard disk: a hard disk keeps data (and other files) more permanently than memory: the hard disk
“remembers” files when the computer is switched off. You need at least 12 MegaBytes of
available disk-space (as well as the 4 MegaBytes of disk-space needed to install Microfit).
Printers: this document assumes that you have a printer attached to your computer: if you
don’t have a printer, we recommend you buy one. But you can learn to use Microfit without a
printer.
If you do not have Windows on your computer, or do not have enough memory or disk-space, we
recommend you buy a new computer. Any new IBM-compatible personal computer (portable, or
desk-top) should be fine. Apple computers (such as the iMac) are not appropriate for Microfit.
1.3 USING THE MOUSE AND KEYBOARD: HOW TO “CLICK” A BUTTON
This document uses the word “button” to refer to an icon which runs a small program if you select
it. Within Microfit, if you move the mouse over a button (without pressing the mouse switch), a
yellow-and-black box appears just below the button to explain what that button does. Confusingly,
the word “button” is often used to refer to the switches on a mouse, which you press with a finger.
This document will use the word “switch” (not button) to describe the part of the mouse your
finger touches.
To use the mouse, move it until the pointer on the screen points to the icon or word you want to
choose; then press the left switch on the mouse. This booklet uses the word “click” as an
abbreviation for “press the left switch on the mouse”. Some computers (especially portable
computers) use an alternative to a mouse: see your computer manual for the equivalent to pressing
the left mouse switch.
You cannot do everything in Microfit with the mouse: you need to type some things into the
keyboard. In this document, things you are required to type are shown in ‘small capitals’, like this:
A:SETUP ENTER. This document uses ENTER as a shorthand for “press the ENTER (or
RETURN) key on your keyboard”.
1.4 CREATING A DIRECTORY FOR DATA FILES
In order to make sure you can find the data files you will use for this document, you should create
a new directory for data files. The way to do this depends on which version of Windows you are
using:
Windows version 3.1:
If you are running Windows, exit to MS-DOS. Then type MKDIR C:\MFITDATA
ENTER and then go back to Windows, by typing WIN ENTER
Windows 95, Windows 98, or Windows NT:
Point the mouse at the word Start (bottom-left of the screen) and click; point the
Mouse to MS-DOS Prompt and click; then type MKDIR C:\MFITDATA ENTER
EXIT ENTER
1.5 HOW TO INSTALL MICROFIT 4 (WINDOWS)
Insert ‘disk 1’ of the set of Microfit disks (sent by CIEE) into your computer. The next step
depends on which version of Windows you have:
Windows version 3.1:
Before installing Microfit, you must be running Windows: do this by typing WIN
ENTER Point the mouse to the word File (top-left of the screen) and click: point to
Run and click; then type A:SETUP ENTER
Windows 95, Windows 98, or Windows NT:
Point the mouse at the word Start (bottom-left of the screen) and click; point to
Run and click; then type A:SETUP ENTER Now create a “shortcut” to Microfit:
Find an empty part of the screen, and press the right switch on the mouse; then
point to Create shortcut and type the directory and name of the program:
C:\MFIT4WIN\MICROFIT.EXE ENTER
Microfit often displays a button labeled Start; from now on, when this booklet refers
to the Start button, it means the Start button in Microfit, not the Start button at the bottom-left of
the screen.
After that, follow the instructions on your screen to install Microfit.
CHAPTER 2
ENTERING AND SAVING DATA
In this chapter, you will type some data into a spreadsheet and use
Microfit for the first time.
2.1 TYPING DATA IN A SPREADSHEET: CSV FORMAT
Before using Microfit, we want to type in some data. The best way to enter data is to type it into a
spreadsheet, such as Excel (or Lotus 1-2-3). This document cannot teach you how to use a
spreadsheet program – if you have never used one before, ask a friend to show you how.
The simplest way to transfer data from a spreadsheet program to Microfit is in CSV format.
When using CSV format, you must obey all of the following rules:
* The left-hand column must be a label: a date, or an observation number (starting with 1
for the first row of data). For dates, use labels like 1979 (annual data), or 1979q1
(quarterly data), or 1979m1 (for monthly data).
* The top row of the spreadsheet must contain labels: one for each data column. Each
variable name must start with a letter, and should not contain special characters like
commas or spaces. Use 7 characters or less for every variable name (so we can add L to
indicate the log of a variable, and D for differencing).
* Type each data series in a vertical column. There must be no empty cells in the area of
your spreadsheet which contains your data; if any data is missing, type #N/A in Excel (or
@NA if you use Lotus 1-2-3) into each blank cell.
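For example, applying these rules to the share-price data in table 1, the first few lines of the CSV file would look like this (the header row holds the variable names, and the left-hand column holds the observation number):

```csv
day,Allied,Zambia,Nippon
1,1026.5,95,181.75
2,1018.5,94.25,182.25
3,1005,94.25,186.5
```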
Table 1 (below) shows London share prices from 9th March to 13th April
1999 (excluding weekends & bank holidays) for three firms:
‘Allied Irish Bank’, ‘Zambia Copper’, and ‘All Nippon Air’. These figures are end-of-day
share prices in pence (adjusted for dividend payments) as reported in The Times newspaper
the following day. For the rest of this booklet, I will refer to these variables as Allied, Zambia
& Nippon.

TABLE 1: SHARE PRICES
day    Allied    Zambia    Nippon
1      1026.5    95        181.75
2      1018.5    94.25     182.25
3      1005      94.25     186.5
4      992.5     93.25     196.5
5      991       88.75     182
6      1009      88.25     188
7      1046      89.25     198
8      1063.5    94.5      197
9      1066      93.5      193.75
10     1057.5    93.5      189.75
11     1086.5    93.75     199.75
12     1052.5    94        199.5
13     1056.5    94.25     201.25
14     1070      94.75     203.75
15     1067      80        205.5
16     1067      80.75     200.75
17     1097.5    81        197.25
18     1095.5    81.5      200
19     1069      79.75     209
20     1060.5    80.5      214.5
21     1052      80.25     206.25
22     1083.5    80.75     204.25
Type the data in table 1 into a spreadsheet program such as Excel, and save the data as file
C:\MFITDATA\SHARES1.XLS in spreadsheet format (or, if you use Lotus 1-2-3, save it as file
C:\MFITDATA\SHARES1.WK1). The name SHARES1 reminds you that it contains data on share
prices; the number 1 in SHARES1 indicates that it is the first version of this dataset. Any filename
you choose should consist of 8 characters or less, not containing special characters like spaces.
You can keep this spreadsheet file in case you decide to add more data to your dataset, but it
cannot be read by Microfit. The next step is to create a file that Microfit can read: CSV format. To
do this in Excel, click on the word File (top-left corner of the screen); click on Save as and then
File of type: and select format CSV (comma delimited). Click the box labeled Filename: and
type C:\MFITDATA\SHARES1.CSV as the filename. If you use Lotus 1-2-3 (or a similar program),
you save a CSV file by “printing” the data to a file.
If you have a printer, you may wish to print this spreadsheet: you can use the printout to make sure
you have typed the data correctly, because it is not easy to print data from Microfit. It is also
possible to use the spreadsheet program to draw a graph of the data, but there is no need to do so:
creating graphs is easier in Microfit.
2.2 STARTING MICROFIT
You should now leave the spreadsheet program. Assuming you have installed Microfit (as
explained in section 1.5), you can now start Microfit. Whichever Windows version you use,
you should see an icon for Microfit somewhere on your screen (this icon is four jigsaw
pieces, colored brown, green, blue & red). Point the mouse at this icon and “double-click”,
which means press the left mouse switch twice within about half a second (if you find this
difficult, there is another way: press the left switch on the mouse once, and then press the
ENTER key on your keyboard). Microfit may take a few minutes to start, so relax. When
Microfit has finished loading itself into your computer’s memory, you should see a window
like this:
The above image was created using Windows 3.1; if you have a different version of
Windows (such as Windows 95 or Windows 98), then your screen will not look exactly the
same as that reproduced above.
2.3 TYPING DATA IN MICROFIT
Before you can use Microfit, you need to type in some data. For simplicity, we will start with
one variable. This variable is the end-of-day price of UK Treasury 2½% undated stock,
from 9th March to 13th April 1999, excluding weekends & bank holidays (this is
the same time-period as the share price data you typed into a spreadsheet, in section 2.1
above). These Treasury stock prices are in pence, adjusted for dividend payments, as
reported in The Times newspaper the following day; this variable will be referred to as
Treasur in this booklet.
Before you can enter the data into Microfit, you must prepare Microfit by giving details
on the new variable(s) you intend to create. This is done by moving the mouse to the word File
(top-left corner of your screen) and clicking; then moving down the list to the word New and
clicking. Microfit will ask you about the frequency of the data you want to type in (you must
select “undated” data, because Microfit does not have a category of daily data), and the number of
observations and variables you are going to type:
click on the word Undated to change the “Data frequency”;
click the “Number of Observations” box, & type 22 ENTER
click the “Number of variables:” box, and type 1 ENTER
Now click on the OK button, to go to the next screen shown by Microfit. Microfit will now
give you a chance to choose the name of the new variable, and suggests the name “X1”
(highlighted in black). It is better to choose a name that will remind you what the data
represent, so replace the name “X1” by the name TREASUR (there is another box to the
right of the variable name, which you can use for a description of this variable if you wish).
Then move the mouse to the Go button and click.
You can now type in the data. You do not need to type the observation number: Microfit
already shows this on the screen, as numbers 1 to 22. Move the mouse to the top row of data
(observation 1): delete the contents of the box (*NONE*), and type the first observation,
which is 51.8 (as shown in table 2 below). Then, when you press the downward arrow on
the keyboard, Microfit will take you to the place to type in the next observation. Type in the
rest of the observations shown in table 2. When you have finished typing, check you have
typed the data correctly; then move the mouse to the Go button (towards the right of the
screen), and click.
TABLE 2: TREASURY STOCK
day    Treasur
1      51.8
2      52.2
3      52.0861
4      52.38
5      52.4206
6      53.1915
7      53.0989
8      52.8926
9      52.13
10     52.4993
11     52.02
12     51.9092
13     51.5521
14     52.1282
15     51.5014
16     52.126
17     52.79
18     53.14
19     53.19
20     53.69
21     53.49
22     53.3
2.4 SAVING A DATASET IN MICROFIT FORMAT
Before you do anything else, you should now save the above data (variable Treasur) in a
Microfit format data file. To do this, move the mouse to the word File (top-left of the
screen), and click; then point to Save and click; then type in C:\MFITDATA\TREASURY.FIT
(what you type will appear just below the words File name:). The file name
“TREASURY” should remind you that the file contains data on the price of a Treasury
stock. Note the slight difference between the two names: TREASURY is the name of the
file, and Treasur (without the letter “y”) is the name of the variable it contains.
Now move the mouse to Ok and click; Microfit will ask if you want to keep all
observations – you do, so move to Ok and click. Your data should now be saved on the
hard-disk of your computer. Remember to keep a copy of this file on diskette – one way is
to save it again, using the filename A:\TREASURY.FIT (then write on the label of your
diskette, so you know this diskette is for Microfit files).
2.5 LEAVING MICROFIT
When you have finished working in Microfit, you can leave the program. To do this, move
the mouse to the word File (top-left of the screen), and click; then point to Exit and click.
Microfit warns you that you will lose any unsaved data – assuming you have just saved the
data, this is no problem, so move to Ok and click.
CHAPTER 3:
WORKING WITH DATA FILES
In the previous chapter, you typed in some data; in this
chapter, you will read that data into Microfit, and save it in a
new (Microfit format) file.
3.1 READING A “CSV” FORMAT FILE
I hope you have taken a coffee-break since working on chapter 2, so that you feel refreshed
and able to cope with the next set of tasks. Go back into Microfit, as explained in section
2.2 above. In section 2.1, you typed some data into a CSV file; you should now read the
CSV file into Microfit. Point the mouse at the word File (top-left corner of the screen), and
click; point the mouse to the word Open and click; point the mouse to the down-pointing
arrow just below and to the right of List files of type and point the mouse at CSV files.
Now move the mouse up and point at the box immediately below the words File name: and
type C:\MFITDATA\*.CSV and press the ENTER key. You should now see the filename
SHARES1.CSV appear just below where you typed C:\MFITDATA\*.CSV so point the mouse
at it, and click. Now move the mouse to the right and click on the word Ok. Microfit gives
you some information about CSV files (click on Ok), and then warns you that CSV files
can be slow to read (click on Yes), and then tells you that the file has been read successfully
(click on Ok).
You should now check the data has been read into Microfit correctly, using table 1 (in
section 2.1). To see the data on the screen, move the mouse to the Data button (near the top
of the screen, slightly to the left) and click.
3.2 ADDING A FILE TO THE DATA IN MEMORY
We will now combine the data you just read into memory (from the CSV file) with the
Treasury stock prices you saved as the Microfit file C:\MFITDATA\TREASURY.FIT, in
section 2.4 above. Move the mouse to the word File (top-left of the screen) and click; then
point to Add and click; then type in the filename C:\MFITDATA\TREASURY.FIT (what you
type will appear just below the words “Filename:”). Then move to Ok and click. You
should now find four variables in Microfit’s data store: three (Allied, Zambia & Nippon)
from the CSV file, and one (Treasur) from C:\MFITDATA\TREASURY.FIT – you
should check this, by clicking on the word Data near the top of the screen.
We will want to use this dataset later, so save it as file C:\MFITDATA\SHARES2.FIT (if you
can’t remember how to save a file, see section 2.4 above).
3.3 READING A MICROFIT FORMAT FILE
Now make sure you can read back the Microfit format file you created. Point the mouse at
the word File (top-left corner of the screen), and click; point the mouse to the word Open
and click. Move the mouse up and point at the box just below the words File name: and
type C:\MFITDATA\*.FIT and press the ENTER key. Filename SHARES2.FIT should appear
just below where you typed C:\MFITDATA\*.FIT so point the mouse at SHARES2.FIT and
click. Move the mouse to the right and click on OK. Microfit warns you that you will lose
any unsaved data, but this is not a problem: you have saved the dataset. So, click Ok to read
in your file.
3.4 HOW TO USE MORE VARIABLES THAN YOU CAN FIT INTO ONE CSV FILE
This section is for future reference only: you will need it when you do your own
research, if you want to create a spreadsheet file containing a large number of variables. For
the present, you can skip the rest of this section, and go to section 3.5 below.
There is a limit to how many columns of data you can use in a CSV file: each line must be
less than 256 characters long. The number of columns you can fit in one file depends on the
width of your columns. So it is important to check the right-hand column of your dataset
in Microfit, to make sure it has been read in correctly. If the right-hand column of data in
Microfit is not the same as you typed into the spreadsheet, then you will need to do the
following:
(1) If the right-hand variable has only been partly read into Microfit, erase that
variable. Then save the present (incomplete) dataset as a Microfit file, as explained in section 2.4
above. Write down which variables are saved in this .FIT file.
(2) Go back to Excel or Lotus 1-2-3; going from left to right, find the first column
which is not saved in the .FIT file. Insert two blank columns immediately to the left of this, and
copy the dates (or other labels) column into the right-hand column you have just inserted.
Save this (dates or labels) column, and everything to the right of it, as a new .CSV file – as
explained in section 2.1 above.
(3) Go back to Microfit, and read in the second .CSV file as explained above. Save it
in Microfit format, with a new filename. Check that the second file includes all variables which
were not in the first .FIT file (if not, you will need to create a third .CSV file).
(4) Combine all of these .FIT (Microfit format) files, as follows: read the first .FIT
file into Microfit as explained in section 3.3 above, and then add the other .FIT files as explained
in section 3.2 above. Finally, save the complete dataset as a new .FIT file.
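If you have a general-purpose scripting tool available, steps (1) and (2) amount to splitting one wide CSV file into several narrower ones, each repeating the label column. The sketch below uses Python, which is not part of Microfit and is only one of many ways to do this; the filenames WIDE.CSV and PART1.CSV, and the small demonstration dataset, are invented for the illustration:

```python
import csv

# Hypothetical demonstration data: a "wide" CSV with a label column and 6 variables.
with open("WIDE.CSV", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["day", "V1", "V2", "V3", "V4", "V5", "V6"])
    w.writerow([1, 10, 20, 30, 40, 50, 60])
    w.writerow([2, 11, 21, 31, 41, 51, 61])

MAX_COLS = 5  # at most 5 data columns in each output file (made-up limit for the demo)

with open("WIDE.CSV", newline="") as f:
    rows = list(csv.reader(f))

labels = [row[0] for row in rows]         # left-hand column: dates or observation numbers
data_cols = list(range(1, len(rows[0])))  # the remaining columns hold the variables

for part, start in enumerate(range(0, len(data_cols), MAX_COLS), 1):
    cols = data_cols[start:start + MAX_COLS]
    with open(f"PART{part}.CSV", "w", newline="") as out:
        writer = csv.writer(out)
        for i, row in enumerate(rows):
            # every output file repeats the label column, as step (2) requires
            writer.writerow([labels[i]] + [row[c] for c in cols])
```

Each PART file produced this way obeys the CSV rules of section 2.1, so it can be read into Microfit and the pieces combined as in step (4).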
3.5 MANAGING DATA FILES
If you have worked through all of the previous sections in this booklet (except for section
3.4), then you should now have created four data files, of various types. There is a risk that
you may lose track of files: which is the most recent version of a data file? And how can
you tell if a file is a spreadsheet file, or a CSV file, or a Microfit file? Here is a list of data
files, in the order they were created:
C:\MFITDATA\SHARES1.XLS (or C:\MFITDATA\SHARES1.WK1)
C:\MFITDATA\SHARES1.CSV
C:\MFITDATA\TREASURY.FIT
C:\MFITDATA\SHARES2.FIT
The first point to observe is the last three letters of the filename: this tells us what type of file
it is. There are three types of file in the above list:
.XLS (or .WK1)  spreadsheet files in Excel (or Lotus 1-2-3) format
.CSV            temporary files, to convert from spreadsheet to Microfit
.FIT            data in Microfit format
It is vital to be systematic with computer files. There is a risk that you may get so confused
that you have to type in a dataset again. Or suppose you find a typing mistake in a dataset
you typed, make a correction, and save it with a different filename: you may get confused as
to which file is the correct version. We don’t mind you retyping data, but we are concerned
that you may make mistakes in your research. So, number your files. In the above list, the
number 1 in filename C:\MFITDATA\SHARES1.CSV reminds you that it is the first
version.
File C:\MFITDATA\SHARES2.FIT is more recent than
C:\MFITDATA\SHARES1.CSV (later in this booklet, we will make more changes to this
file, and save it as file C:\MFITDATA\SHARES3.FIT).
Here are some other tips you may find useful:
* choose a file name you will remember (for example, MYDATA.FIT is a stupid name!).
* erase all .CSV files after you have used them; keep .XLS (or .WK1) files, and .FIT files.
* keep all of the files for one assignment in the same directory.
* use the hard disk of your computer (not diskettes) as the main version.
* keep backup copies on diskettes (if your hard disk breaks, every file on it may be lost).
* label each diskette you use, so you know what files are on it.
Some students are too lazy to keep back-ups; they regret it when things go wrong later. You
should get into the habit of keeping at least one back-up copy of every file you create.
CHAPTER 4:
CREATING VARIABLES AND GRAPHS
This chapter introduces statistical analysis and
graphical presentation of data.
4.1 CREATING & DELETING NEW VARIABLES
You will probably need to create several variables in Microfit. Let’s start with the simplest:
a constant. You should include a constant term in every regression you run. There are
two ways to create a constant in Microfit: for both of them, you need to have a dataset to
work on, so load file C:\MFITDATA\SHARES2.FIT into memory (if you can’t do this, see
section 3.3). When you have read in a dataset, you should be taken straight into the
Process window; but to make sure you are in the Process window, click on the Process
button near the top of the screen.
One way to create a constant is to use the button created by Microfit. To do this, click on
the button labeled = => Constant (at the bottom-left of the screen). Microfit asks you
for a name for this constant, so type the name CON in the box, and click the Ok button
(lower down the screen).
There is a second way to create a constant. Still in the Process window of Microfit, type in
CONSTAN = 1; and click the Go button (on the right). It appears that nothing happened,
but Microfit has created a new constant. To check, point the mouse to the Data button
(top-center of the screen) and click. You should see four variables (Allied, Zambia,
Nippon, & Treasur) and two constants (CON & CONSTAN). What is the value of each
constant? {1}
Now, create another new variable: this time, the log of an existing variable. Microfit
calculates logs to base e, which is often called ‘natural log’. Calculate the log of Zambia
Copper’s share price, as follows: click on the Process button (top-center of screen). If
you see the line you just typed in (CONSTAN = 1;) then remove it using the delete key on
your keyboard. Type LZAMBIA = LOG(ZAMBIA); and click the Go button. To make
sure this variable has been created, click on the Data button, and check that LZAMBIA
has the value 4.55387689 on day 1.
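If you want to verify this figure independently of Microfit, any tool that computes natural logs will do. For instance, this short Python sketch (Python is not part of Microfit, and is used here purely as a calculator) checks the day-1 value:

```python
import math

# Day-1 share price of Zambia Copper, from table 1
zambia_day1 = 95.0

# Natural log (base e), matching Microfit's LOG() function
lzambia_day1 = math.log(zambia_day1)

print(round(lzambia_day1, 8))  # 4.55387689
```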
There is no need to keep both constants (CONSTAN and CON): they are identical.
Erase CON by clicking on the Process button; type in DELETE CON; and click the Go
button. Make sure CON has been removed, by clicking on the Data button: you should
still see CONSTAN, but CON should have gone. Now save the file, as
C:\MFITDATA\SHARES3.FIT (see section 2.4 if you can’t do this).
4.2 MEAN, STANDARD DEVIATION, & CORRELATION BETWEEN VARIABLES
Now, we can look at some statistics on the data you typed in. Click on the Process button,
and type in COR ZAMBIA NIPPON; and click the Go button. Microfit will display
information about the two variables, shown in table 3 below. The
‘mean’ and ‘standard deviation’ rows can provide a useful check on your data: if you do
not get the same results, check you typed the data correctly.
Table 3: SUMMARY STATISTICS
We might expect that share prices would tend to follow a pattern, being high in times of
optimism but low when markets are pessimistic: some factors (such as a global
recession) may tend to reduce profits for firms in general, making shares less desirable.
If so, we would expect share prices for a typical firm to be correlated with most other
share prices. Is there a correlation between ‘Zambia Copper’ and ‘All Nippon Air’? To
test this, click on the Close button (lower down the screen); Microfit now produces a
“Correlation Matrix” between the two shares, as shown in table 4 below.
TABLE 4: CORRELATION COEFFICIENTS
Is there a correlation between the two share prices, according to table 4? The top-left
number is 1.0000, which tells us that the ‘Zambia Copper’ share price is perfectly
correlated with itself. The number we are interested in is -0.58803, which is the
correlation coefficient between Zambia and Nippon (if you do not produce this number
on your computer, check you typed the data correctly). This number tells us that there is
a negative correlation between these two share prices.
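You can reproduce this correlation coefficient from the raw data in table 1. The Python sketch below (again, Python is not part of Microfit; it is used only to check the arithmetic) applies the standard formula r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² Σ(y−ȳ)²):

```python
# Share prices from table 1 (days 1 to 22)
zambia = [95, 94.25, 94.25, 93.25, 88.75, 88.25, 89.25, 94.5, 93.5, 93.5,
          93.75, 94, 94.25, 94.75, 80, 80.75, 81, 81.5, 79.75, 80.5, 80.25, 80.75]
nippon = [181.75, 182.25, 186.5, 196.5, 182, 188, 198, 197, 193.75, 189.75,
          199.75, 199.5, 201.25, 203.75, 205.5, 200.75, 197.25, 200, 209, 214.5,
          206.25, 204.25]

n = len(zambia)
mean_z = sum(zambia) / n
mean_n = sum(nippon) / n

# Sums of squared deviations and cross-products
s_zz = sum((z - mean_z) ** 2 for z in zambia)
s_nn = sum((x - mean_n) ** 2 for x in nippon)
s_zn = sum((z - mean_z) * (x - mean_n) for z, x in zip(zambia, nippon))

r = s_zn / (s_zz * s_nn) ** 0.5
print(round(r, 5))  # should match Microfit's -.58803
```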
If you would like a printout, look near the top of your screen for the word Result. Now
look below this, on the left half of the screen: there are seven buttons. The button on the
far left represents a printer: to check this, move the mouse so the pointer is over this
button (but don’t press a switch on the mouse). You should see a black-and-yellow
label appear, which says “Print”. Now click on the printer button, and you will see
another menu; just click on the OK button. You should now get a printout of the
correlation matrix.
Now click on the Close button, to get back to a previous Microfit screen. As an exercise,
use Microfit to work out the correlation coefficients between all three variables in the
dataset; are Allied share prices correlated with Nippon? Check your results with answer
{2} at the end of this booklet.
4.3 GRAPHS
If you type data in a spreadsheet, you can use the spreadsheet program to draw graphs.
However, it is generally easier to create graphs in Microfit: to do so, click the Process
button, and type PLOT ZAMBIA; and click the Go button. You should obtain a graph like
chart 1 below.
Chart 1: GRAPH OF ZAMBIA SHARE_PRICE OVER TIME
Note that Microfit can draw two graphs on the same axes. To try this, click on the
Process button and type in PLOT ZAMBIA NIPPON; and click the Go button. As an
exercise, create a graph of the Treasur stock price against time, in the same way as you
just did for Zambia & Nippon.
CHAPTER 5:
THE CLASSICAL LINEAR REGRESSION MODEL
This chapter introduces regression in Microfit: it uses
OLS regression, with only one explanatory variable.
Later chapters will discuss more complex forms of
regression.
5.1 WHAT IS REGRESSION TRYING TO ACHIEVE?
The aim of regression is to see if one variable is dependent on one (or more) other variables.
For example: suppose you measure the height of a tree near your home each year, find
data on the world population, and then type the data into Microfit. If you calculate the
correlation coefficient of ‘tree height’ against ‘world population’ (as explained in section
4.2), you would find a positive correlation: but this does not mean that the growth of your
tree is a cause or effect of world population – the correlation arises simply because both grow
over time.
Regression results are generally better than correlation coefficients for detecting a link
between variables, for several reasons: one reason is that there are a number of diagnostic
tests which are produced with a regression, and these diagnostic tests can warn us if there are
problems with the regression. If a regression fails one or more diagnostic tests, then we
should treat the results of that regression as unreliable. In the tree example (previous
paragraph), we would probably find that the regression had a problem of serial correlation
(explained in section 7.3 below); this would warn us not to trust the apparent link between
tree height and world population.
You should always be cautious when interpreting regression results. Even if regression
results suggest a link between two variables, and the diagnostic tests are acceptable, we
cannot be sure which variable is the ‘cause’ and which is the ‘effect’ (or the variables may
move together because they are both caused by something else). One way to decide which
is cause and which is effect is to use lagged variables (such as the one you will create in
section 7.3 below), to look for a delayed effect: if one change happens before another and
regression results suggest a link between the events, then it seems likely (but not certain)
that the first event causes the second.
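The tree example can be imitated with made-up numbers. In the Python sketch below (the figures are invented purely for illustration, and Python is used instead of Microfit just so the example is self-contained), both series simply trend upwards, and the correlation coefficient comes out close to +1 even though neither variable causes the other:

```python
# Invented illustration: 20 annual observations of two unrelated trending series
years = range(20)
tree_height = [2.0 + 0.4 * t for t in years]     # a tree growing steadily (metres)
population = [5.0e9 + 8.0e7 * t for t in years]  # world population growing too

n = len(tree_height)
mean_x = sum(tree_height) / n
mean_y = sum(population) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tree_height, population))
s_xx = sum((x - mean_x) ** 2 for x in tree_height)
s_yy = sum((y - mean_y) ** 2 for y in population)

r = s_xy / (s_xx * s_yy) ** 0.5
# both series are exact linear trends, so r is (essentially) 1.0 despite no causal link
print(round(r, 4))
```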
5.2 A SIMPLE REGRESSION EXAMPLE
Go back to the dataset C:\MFITDATA\SHARES3.FIT which you created in chapter 4 (if you
don’t remember how to read a dataset into memory, see section 3.3).
For the first time we try regression in Microfit, let’s take a simple example: one dependent
variable, one explanatory variable, and a constant, using OLS (Ordinary Least Squares)
regression. We wish to test the equation
Zambia =
 +  (Nippon) + u
where u represents the error term;
 and 
are coefficients which Microfit will
estimate for us. EQUATION 1 is unchanged if we multiply
Zambia =
[EQUATION 1]
 by 1, so we can rewrite it as
 (1) +  (Nippon) + u
We must tell Microfit to estimate an equation with Zambia dependent on 1 and Nippon
(recall from section 4.1 that CONSTAN is equal to 1). Microfit will calculate the error term
u (Microfit refers to it as the ‘residual’ or ‘disturbance’), but you do not need to type in u or
 or  . Begin the regression by clicking the button labeled Single (near the top right of the
screen), and typing:
ZAMBIA CONSTAN NIPPON
(you must not type a semicolon at the end of the line). Microfit assumes the first name you
type (Zambia, in this case) is the dependent variable. Then click the button labeled Start
near the top-right corner of the screen. You should now see information like this on the
screen:
TABLE 5: REGRESSION RESULTS
It is essential that you learn to interpret regression results. Look at the above results: what is
the value of α estimated by Microfit? And what is the estimate of β? Write down the
values of α and β, and then check what you have written with answer {3} at the end of
this document.
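The OLS estimation that Microfit performs here can be sketched in a few lines of Python (using numpy). The numbers below are invented for illustration; they are not the Zambia/Nippon data from the booklet.

```python
import numpy as np

# Hypothetical data, NOT the booklet's Zambia/Nippon series.
y = np.array([3.0, 5.0, 4.0, 8.0, 7.0])       # dependent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # explanatory variable

# Regressor matrix: a column of ones (like CONSTAN) plus the regressor.
X = np.column_stack([np.ones_like(x), x])

# OLS estimates of the intercept (alpha) and slope (beta).
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The residuals play the role of the error term u in EQUATION 1.
residuals = y - (alpha + beta * x)
```

Typing `ZAMBIA CONSTAN NIPPON` in Microfit asks for exactly this kind of fit: the constant plays the role of the column of ones.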
5.3 GOODNESS-OF-FIT STATISTICS
In addition to estimating coefficients such as α and β, Microfit calculates the ‘error’
term, which is used for statistics and diagnostic tests. The error term (often called the
‘residual’) is the term labeled u in EQUATION 1 above. The first goodness-of-fit statistic
Microfit reports is the ‘R-Squared’ value (often written R²); this gives us a measure of the
proportion of variation of the dependent variable which is explained by variation of the
independent variable(s). In this regression, the R² value is .34578, which tells us that
34.578% of the variation in Zambia is explained by variation in Nippon (make sure you
can find the number .34578 in table 5). On the right of the R-Squared statistic, Microfit
reports the R-Bar-Squared statistic: this is a modified version of R-Squared (discussed in
section 7.5 below).
Now compare these regression results with the correlation coefficient between Nippon
and Zambia, calculated in section 4.2 above: the correlation coefficient is –0.58803 and
the β coefficient (calculated by regression) is also negative. If there is only one
explanatory variable, the β coefficient from OLS regression must have the same sign as
the correlation coefficient (for the same pair of variables). There is another connection
between regression and correlation: the R² value of the regression must be equal to the
square of the correlation coefficient (this only applies to a regression equation with a
single explanatory variable, so it will not apply to the next chapter of this booklet). In
this case, (–0.58803)² is equal to .34578 (approximately).
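Both facts (same sign, and R-Squared equal to the squared correlation coefficient) are easy to verify numerically. A sketch in Python with numpy, using invented numbers rather than the booklet's data:

```python
import numpy as np

# Hypothetical data with a negative relationship (like Zambia/Nippon).
y = np.array([10.0, 8.0, 9.0, 5.0, 3.0])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Correlation coefficient between the two variables.
r = np.corrcoef(x, y)[0, 1]

# Simple OLS regression of y on a constant and x.
X = np.column_stack([np.ones_like(x), x])
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = alpha + beta * x

# R-squared: proportion of variation in y explained by the regression.
r_squared = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

With any single-regressor data, `beta` has the same sign as `r`, and `r_squared` equals `r ** 2`.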
You can ignore the post – regression menu for now. The simplest way back to a familiar
Microfit screen is to click the Cancel button at the bottom of the screen; click the
Cancel button again; and then click the Process button.
5.4 SIGNIFICANCE LEVEL: IS A PATTERN SIGNIFICANT, OR JUST “RANDOM”?
There is a decision to take when doing research: you should choose a probability level
which you consider ‘significant’. What you are choosing is how ‘unusual’ a result must
be before you consider it notable. For example, the average person is between 5 and 6
feet tall; but how tall would a person need to be before you would describe them as
‘unusually tall’? If you decide that anyone taller than six feet is ‘unusually tall’, then
you have an objective test, which you could apply to everyone you meet. However,
another researcher might consider a different height to be ‘unusually tall’. It is desirable
to have an objective way of deciding which statistics are ‘unusual’, and which are not.
There is a convention in social sciences: each researcher should decide on a significance
level, and use this level to decide if a statistic is ‘unusual’. So, what level of
significance should you adopt? Some researchers adopt 1%; but most social science
researchers adopt 5% as the significance level. I recommend that you adopt the 5%
level, unless you are instructed otherwise. For the remainder of this document, I adopt
the 5% significance level, but remember that this level is arbitrary: my only reason for
using 5% is that most researchers do so.
Refer to the example in the first paragraph of this section: how would you apply a 5%
significance level to heights? The answer is to obtain data on the heights of a large
sample of people, and arrange them in height order; select the tallest 5% of the sample;
then find the height of the shortest person among these tallest 5% (let’s call this height
H). From then on, you would say that anyone taller than H is ‘unusually’ tall, but anyone
of height H or less is not unusually tall.
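This height rule can be sketched in Python with numpy. The heights below are simulated, purely for illustration:

```python
import numpy as np

# Simulated heights in feet (hypothetical sample, not real data).
rng = np.random.default_rng(0)
heights = rng.normal(5.5, 0.25, size=1000)

# H = height of the shortest person among the tallest 5%,
# i.e. the 95th percentile of the sample.
H = np.quantile(heights, 0.95)

def unusually_tall(height):
    """Objective rule at the 5% significance level."""
    return height > H
```

By construction, roughly 5% of the sample counts as ‘unusually tall’ under this rule.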
CHAPTER 6:
(UNIT 6) MULTIPLE REGRESSION
This chapter explains the diagnostic tests reported by
Microfit, and introduces multiple regression.
6.1 DIAGNOSTIC TESTS: ARE REGRESSION RESULTS RELIABLE?
Microfit carries out various tests which can warn you if there is a problem with a
regression. One of these is the Durbin-Watson statistic, which Microfit calls the
‘DW-statistic’. This statistic should be around 2. Look back at table 5 (section 5.2 in the
previous chapter). The value of the DW statistic is .78752, which suggests a problem:
this is not close to 2. It is not obvious how close DW has to be to 2 for the regression to
be acceptable; so I recommend that you use the serial correlation test (discussed below),
and ignore the DW statistic.
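The DW statistic itself is simple to compute from the residuals. A sketch in Python (the formula is the standard one; the two series here are simulated for illustration):

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum of squared changes in the residuals, divided by the sum
    of squared residuals. Values near 2 suggest no first-order serial
    correlation; values near 0 suggest strong positive serial correlation."""
    e = np.asarray(residuals, dtype=float)
    return (np.diff(e) ** 2).sum() / (e ** 2).sum()

rng = np.random.default_rng(1)
independent = rng.normal(size=500)     # no serial correlation: DW near 2
drifting = np.cumsum(independent)      # heavily serially correlated: DW near 0
```

A value like .78752 sits well below 2, which is why table 5 looks suspicious.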
Now look further down the previous regression results (table 5, section 5.2). Look
halfway down the table for the words “Diagnostic Tests”. Below these two words, there
are tests for four possible problems with the regression: serial correlation; functional
form; normality; and heteroscedasticity (we will consider each of these below). Microfit
reports a probability for each statistic in square brackets; for these four tests, any
number in square brackets less than the chosen significance level (usually 0.05: see
section 5.4) indicates a problem with that test, and hence with the regression; all results
of that regression are then unreliable.
For three of these four tests (serial correlation, functional form, and heteroscedasticity),
there are two alternative tests: they are labeled “LM version” and “F version”. Here,
“LM” stands for ‘Lagrange Multiplier’, and “F” refers to the F-distribution; but for this
booklet, you need not know how they are calculated. Usually, it does not matter whether
you study the LM or the F version of the test, because they give the same result: a
regression will pass both LM and F tests, or fail both tests. If a regression passes the LM
test but fails the F test (or vice versa), then the results are unclear; you can report them,
but the findings are unreliable.
The first ‘problem’ Microfit looks for is serial correlation; this is similar to the
Durbin-Watson (DW) test discussed in the previous section. In general, this serial
correlation test is more reliable than the DW test, because DW only tests for first-order
serial correlation; but the serial correlation test in the Microfit ‘Diagnostic Tests’ section
considers serial correlation up to 4th order for quarterly data, or 12th order for monthly
data. In the case of annual or undated data, Microfit limits this serial correlation test to
1st order serial correlation (i.e. the same as the DW test). Nevertheless, even for our
data (which Microfit treats as undated), this serial correlation test is better than the DW
statistic, because in this second serial correlation test Microfit reports a probability level
[in square brackets]. According to table 5, this regression does have a problem with
serial correlation: the probability is reported as [.005] and [.004] for the LM and F
versions (both well below 0.05); I return to this problem below.
The second problem is ‘functional form’. This problem arises when you choose an
inappropriate regression specification. For example, suppose there is actually a linear
relationship between a share price and its annual dividend (assuming all shares have a
similar level of risk). But suppose we ran an inappropriate regression: ‘number of
shares’ (meaning the number of shares you can buy for $100) dependent on ‘dividend’.
We would not expect a linear relationship between ‘number of shares’ and ‘dividend’,
because the share price is inversely related to ‘number of shares’. The fact that this
relationship is non – linear should be picked up by Microfit’s ‘functional form’ test. In
general, if your regression fails this test, you should transform one or more variables in
your regression. If you do not know the correct functional form, it may be worth starting
with the log of one or more variables (as explained in section 4.1 above), and using log
variable(s) in your regression instead of raw data. Table 5 indicates that the probability
of having an appropriate ‘functional form’ is [.196] or [.227], so neither the LM nor the F
version suggests a problem.
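The share-price example above can be sketched numerically in Python with numpy (the prices and dividends are invented). The point is that a transformation, here the reciprocal, can turn a non-linear relationship back into a linear one:

```python
import numpy as np

# Hypothetical shares: the price is genuinely LINEAR in the dividend.
dividend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
price = 10.0 + 8.0 * dividend
n_shares = 100.0 / price              # shares you can buy for $100

# n_shares is a non-linear function of the dividend, but its
# reciprocal (price / 100) is exactly linear in the dividend.
r_nonlinear = np.corrcoef(dividend, n_shares)[0, 1]
r_transformed = np.corrcoef(dividend, 1.0 / n_shares)[0, 1]
```

Regressing `n_shares` on `dividend` imposes the wrong functional form; transforming it back to a price recovers an exact linear fit.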
The next potential ‘problem’ is the absence of normally – distributed errors. The theory
underlying OLS regression makes various assumptions, including the assumption that
the residual term u is normally – distributed (Pesaran M.H.& Pesaran B., 1997, Working
with Microfit 4.0: p.72). If a regression fails the ‘normality’ test, then the error term is
not normally – distributed, so we should not trust the regression results. A regression
fails this test if the number in square brackets is below 0.05 (see section 5.4). In table 5,
this probability is [.426] which is more than 0.05, so the regression passes this test: the
residuals are (approximately) normally – distributed. If a regression fails this test, you
could try calculating new variables based on a transformation (such as the log) of the
variables, and run a regression with these new variables.
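Normality tests of this kind are typically based on the skewness and kurtosis of the residuals; the Jarque-Bera statistic is the best-known example. The sketch below illustrates the idea in Python, and is not guaranteed to reproduce Microfit's exact numbers:

```python
import numpy as np

def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4).
    Near 0 for normally-distributed residuals; large values reject
    normality."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    n = len(e)
    s = e.std()
    skew = (e ** 3).mean() / s ** 3
    kurt = (e ** 4).mean() / s ** 4
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(2)
normal_e = rng.normal(size=2000)         # passes: statistic stays small
skewed_e = rng.exponential(size=2000)    # long right tail: statistic is huge
```

A large statistic corresponds to a small probability in Microfit's square brackets, i.e. a failed normality test.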
The final diagnostic test is for heteroscedasticity. This examines whether error term u is
related to the explanatory variables. Ideally, we want the residuals to be just random; but
if there is a clear pattern in the residuals (such as a tendency for residuals to increase
over time), then the regression results are suspicious. For example, suppose we estimate
a regression where the dependent variable doubles every year, but the explanatory
variable has a linear trend (increasing by a fixed amount each year, approximately): this
regression might seem reasonably successful for the earliest observations, because the
annual increase in the dependent variable is only a few £; but the residuals will tend to
grow in the later observations (when the dependent variable increases by many £s per
year). If you find heteroscedastic errors, you should create new variables which are
transformations of the original variables (including the dependent variable): a good
starting – point is to take the log of all variables in your regression, and run a regression
using these new variables. In the case of table 5, the numbers in square brackets are
[.184] (for the LM version) and [.201] (for the F version); both are above 0.05, so the
regression “passed” this test – in other words, heteroscedasticity is not a problem in this
regression.
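The doubling-versus-linear-trend example can be sketched numerically (Python with numpy; both series are invented). The residuals from fitting a straight line to an exponentially growing variable spread out over time, which is the pattern heteroscedasticity tests look for:

```python
import numpy as np

# A dependent variable that doubles every five periods, fitted with
# a straight line in t (illustrative only).
t = np.arange(1, 31, dtype=float)
y = 2.0 ** (t / 5.0)

X = np.column_stack([np.ones_like(t), t])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef

# The residuals fan out: their spread in the later half of the sample
# is larger than in the earlier half.
spread_early = resid[:15].std()
spread_late = resid[15:].std()
```

Taking the log of `y` before fitting would make the true relationship linear in `t` and remove this fanning-out, which is why the log transform is the suggested first remedy.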
In the “Diagnostic Tests” section of table 5, two numbers in square brackets are below
0.05: the LM and F tests for Serial Correlation. Hence, this regression fails the test for
serial correlation, so these regression results are unreliable (despite the fact that none of
the other three tests indicate a problem). We will try to solve this serial correlation
problem in chapter 7; but before then, we will study variables (in a different dataset),
which do not have such a serious autocorrelation problem.
6.2 AN EXAMPLE OF MULTIPLE REGRESSION
The remainder of this chapter does not use the data file
(C:\MFITDATA\SHARES3.FIT) you typed and used earlier, because we are not yet
able to solve the autocorrelation problem it contains (we will solve it in chapter 7). But
do not delete that file: we will use it again in chapters 7 and 8.
For this chapter, we use dataset C:\MFIT4WIN\TUTOR\PTMONTH.FIT provided
with Microfit 4; I refer to it as the PTMONTH dataset. It contains USA data on the
‘SP500’, the Standard & Poor portfolio of 500 shares. This file should have been copied
to your computer when you installed Microfit (if you cannot find it, you may need to
re-install Microfit on your computer). In Microfit, load the PTMONTH dataset into your
computer memory as explained in section 3.3 above. Now look at the screen, under the
button labeled Data and you should see the words “Current sample” followed by some
information about this dataset. How many observations, and how many variables, does it
contain? Check your answer with {4} at the end of this booklet.
It is standard practice to include a constant in any regression you estimate, but this
dataset does not include one. So create a constant, using the name CONSTAN (as you did
in section 4.1). Next, create a time-trend variable: click the Process button, and
then click the Time trend button; you need to type in the name of a new variable, so
type MONTH and click the Ok button.
We wish to study the connection between the (weighted average) return on a portfolio
(vw), and the level of dividends on this group of shares (divSP), by testing the equation

vw = α + β1 (month) + β2 (divSP) + u [EQUATION 2]
This regression has two explanatory variables (month and divSP), unlike EQUATION 1
(in section 5.2) which had only one explanatory variable; so EQUATION 2 is an
example of multiple regression, whereas EQUATION 1 was not (it had only one
explanatory variable). Now, test the regression specified in EQUATION 2 by clicking
the button labeled Single (top right of the screen). Do not use the Multiple button: that
refers to Vector AutoRegression, which means using several dependent variables at once
(Vector AutoRegression is beyond the scope of this booklet): confusingly, the Single
button is the one to use for the ‘multiple linear regression’ referred to in unit 6 of the
‘Quantitative methods for financial management’ folder. Now type:
VW CONSTAN MONTH DIVSP
and then click the button labeled Start (top-right corner of the screen). Look at the four
diagnostic tests (as we did in section 6.1): which of these four tests did the regression
“pass”, and which did it “fail”? Check your answer with {5} at the end of this booklet.
Now click the Close button (which takes us to the ‘Post regression menu’);
click the Cancel button, and (after Microfit puts up another menu) click the Cancel
button again.
6.3 WHICH FUNCTIONAL FORM SHOULD YOU USE?
Section 6.1 discussed four diagnostic tests, each of which checks for a problem in a
regression. For three of these problems, I suggested that if the problem occurs, you
might be able to solve it by creating new variables, which are transformations of the
original data. In particular, I suggested that computing the log of a variable may help.
Sometimes, other transformations may be better. For example, rather than regressing
‘number of factory closures’ on ‘total output of the industry’, it may be better to
calculate [1/(number of factory closures)] as a new variable, to represent the average
life – span of a factory. How can you tell which transformation to use?
As a guide, it is desirable that every variable you use in a regression is normally –
distributed: this applies to both explanatory and dependent variables. So if a variable
is far from being normally – distributed, you should consider creating a new variable,
which is a transformation of it. Sometimes, this seems impossible: for example, you
cannot transform a dummy variable (explained in section 7.2) to make it closer to a
normal distribution. The log transformation (recommended in section 6.1) is often
useful, but has limitations: the log of zero, and the log of a negative number, are not
defined. This means that if any observation of a variable is negative, then you cannot
use the log transformation on that variable (if some observations of variable X are
zero, but none are negative, then you could consider creating the transformation Log
(X+1) to solve this problem).
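The Log(X+1) idea can be sketched in Python (the skewed data, containing zeros, is invented). Note how the transformation is defined for every observation and reduces the skewness:

```python
import numpy as np

def skewness(v):
    """Third standardized moment: 0 for a symmetric distribution."""
    d = v - v.mean()
    return (d ** 3).mean() / d.std() ** 3

# Hypothetical skewed variable with some zero observations.
x = np.array([0.0, 1.0, 2.0, 0.0, 5.0, 40.0, 3.0, 0.0, 7.0, 120.0])

# LOG(X) would be undefined at the zeros; LOG(X+1) is defined at every
# observation, as long as no observation is negative.
lx = np.log(x + 1.0)
```

The transformed variable is finite everywhere and much less skewed, so it is a better candidate for the regression.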
You can see if a variable is normally-distributed by using the HIST command. You
need to type:
LDIVSP = LOG(DIVSP); ENTER
HIST DIVSP; ENTER
HIST LDIVSP; ENTER
The first of the above lines creates a new variable, called LdivSP. The next two lines
produce histograms of divSP and LdivSP. These two charts are shown side by side
below (as charts 2 and 3), to show how the distribution of LdivSP differs from that
of divSP. Looking at charts 2 and 3, which of these two variables do you think is closer
to a normal distribution?
chart 2: A SKEWED DISTRIBUTION
chart 3: A SYMMETRIC DISTRIBUTION
Charts 2 and 3 show the distribution of a variable before and after a log transformation:
the left – hand chart is a histogram of divSP, and the right-hand chart a histogram of
the log of divSP. On each histogram, Microfit superimposes a normal distribution: a
bell – shaped curve line. By comparing this bell curve with the histogram we can see
that the left – hand histogram is ‘skewed’ (asymmetrical): it has a long ‘tail’ on the
right (a few values are much higher than the average) so divSP is far from normally –
distributed. The histogram in the right – hand chart is closer to the normal distribution
curve. The distribution in the left-hand chart is a common pattern in economics and
finance: for example, a similar distribution applies to incomes (the few people who
are millionaires would form a long ‘tail’ on the right of such a diagram). In such cases,
taking the log often produces a variable which is closer to a normal distribution. Charts 2
and 3 suggest that LdivSP is closer than divSP to a normal distribution; in the
following chapter (section 7.1), we will modify a regression by changing divSP to
LdivSP.
6.4 COLLINEARITY
Unit 5 of the ‘Quantitative methods for financial management’ folder discussed the
assumptions required for OLS regression to produce ‘Best Linear Unbiased
Estimators’; several of these assumptions can be assessed using the diagnostic tests
produced by Microfit (as examined in the previous chapter). The same diagnostic
tests apply to this chapter, but there is now another complication: collinearity. The
word ‘collinearity’ describes a regression where two or more explanatory variables
are closely related to each other; the word ‘multicollinearity’ has the same meaning.
Suppose we measure the output of workers of different ages, and find their ages and
amount of work experience. We could run a regression with output as dependent
variable, with age and work experience as explanatory variables. But work
experience is likely to be closely correlated with age, so it is difficult to separate the
effects of these two factors.
Collinearity could not arise in chapter 5, because there was only one explanatory
variable. But in this chapter, we use more than one explanatory variable. Is
collinearity a problem in our latest regression? Look at the diagnostic statistics
(produced by Microfit) reported in table 6 (section 7.1): which of these warns us
about collinearity? The answer is none of them. This is not simply a weakness of
Microfit – econometricians have not yet agreed how to measure collinearity, or how
much collinearity is “too much” for regression results to be reliable. Bryman &
Cramer suggest measuring the correlation coefficient between explanatory variables,
and rejecting a regression if there is a correlation coefficient greater than 0.8 (or less
than –0.8) between any two of the explanatory variables (Bryman A. & Cramer D.,
1990, Quantitative data analysis for social scientists, Routledge: London, p.236).
Other writers (e.g. Pesaran M.H. & Pesaran B., 1997, Working with Microfit 4.0:
interactive econometric analysis, OUP: Oxford, p.191) would disagree, claiming that
when deciding if collinearity is a problem, we should consider not just correlation
coefficients between explanatory variables, but also the sample – size.
For this course, we expect you to understand the problem of collinearity, but we do
not require you to test for it (if you wish to test for collinearity, you can use the COR
command to find correlations between variables: see section 4.2). In extreme cases of
collinearity, Microfit cannot estimate a regression, and shows the message
Correlation matrix near singular (possible multicollinearity) !
**CALCULATIONS ABANDONED**
If you see the above message, you should drop one of your explanatory variables (or
obtain more data), and then run the regression again.
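Bryman and Cramer's rule of thumb, flagging any pair of explanatory variables whose correlation is beyond 0.8 in absolute value, is easy to sketch in Python. The age/experience data below are simulated for illustration:

```python
import numpy as np

# Simulated worker data: experience is almost determined by age.
rng = np.random.default_rng(3)
age = rng.uniform(20, 60, size=100)
experience = age - 18 + rng.normal(0, 2, size=100)

# Correlation matrix of the explanatory variables only.
X = np.column_stack([age, experience])
corr = np.corrcoef(X, rowvar=False)

# Flag pairs whose correlation exceeds 0.8 in absolute value.
collinear_pairs = [(i, j)
                   for i in range(corr.shape[0])
                   for j in range(i + 1, corr.shape[1])
                   if abs(corr[i, j]) > 0.8]
```

Here the age/experience pair is flagged, which is exactly the situation where separating the two effects in a regression becomes difficult.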
CHAPTER 7:
(UNIT 7) TOPICS IN MULTIPLE REGRESSION
This chapter focuses on some issues often encountered in
multiple regression, including serial correlation.
7.1 DUMMY VARIABLES
The regression in section 6.2 (using dataset C:\MFIT4WIN\TUTOR\PTMONTH.FIT)
didn’t pass all diagnostic tests. The problem is ‘outliers’, including the October 1987
crash (Pesaran M.H. & Pesaran B., 1997, Working with Microfit 4.0, p.243); such
outliers are ‘shocks’, which occur when information suddenly becomes available to
stock markets. We will solve this using dummy variables where the residuals indicate a
shock (if you can, it is better to include a variable which measures the cause of the
shocks). I found that three dummy variables are sufficient to correct for non-normality
of residuals: October 1974, January 1975 and October 1987. Create a dummy variable:
click the Process button and type:
$ CREATE A DUMMY VARIABLE; ENTER
OCT74 = 0; ENTER
SAMPLE 1974M10 1974M10; ENTER
OCT74 = 1; ENTER
SAMPLE 1948M1 1993M5; ENTER
LIST OCT74; ENTER
The first of the above six lines starts with a $ symbol, which tells Microfit to ignore
the rest of that line; it is just a comment, to tell you what the lines do. Always end a
comment with a ; (semicolon) or Microfit treats the next line as part of the comment.
The next line creates a new variable (Oct74) and sets it to zero. The next line uses the
Microfit SAMPLE command, to restrict the data to just month 1974m10 (type it
twice, to use data from 1974m10 to 1974m10). For 1974m10, Oct74 is set to 1. The
next line resets the sample to all available data. The final line lists this new variable
on your screen. Before going further, save the above six lines as a ‘.EQU’ file, in case
you want it later – click on the ‘save .EQU file’ button indicated below:
Microfit asks for a filename, so type C:\MFITDATA\PTMONTH.EQU ENTER and
click the OK button. Now run these six lines, by clicking the Go button. To check
they worked correctly, click the Data button and confirm that Oct74 is zero for every
month except October 1974.
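The same dummy-variable construction can be sketched in Python, mirroring the Microfit commands above (the month labels are generated here just for illustration; Microfit handles dates itself):

```python
import numpy as np

# Month labels for 1974-1975 (illustrative).
months = [f"{year}M{m}" for year in (1974, 1975) for m in range(1, 13)]

# Dummy variable: 1 for October 1974, 0 for every other month.
oct74 = np.array([1.0 if label == "1974M10" else 0.0 for label in months])
```

As with the LIST OCT74 check in Microfit, the variable should be zero everywhere except the single shock month.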
You now need to create two more dummy variables, in the same way: call them
Jan75 (equal to 1 for 1975m1) and Oct87 (equal to 1 for 1987m10). If you cannot
create these two variables, look at answer {6} at the end of this booklet.
Next, you should test the following regression equation:
vw = α + β1 (month) + β2 (divSP) + β3 (Oct74) + β4 (Jan75) + β5 (Oct87) + u [EQUATION 3]
which is based on EQUATION 2 in section 6.2 (chapter 6); but the new regression
equation adds three dummy variables. You should get the results shown in table 6;
which of the four diagnostic tests does it fail?
Table 6: MULTIPLE REGRESSION
Table 6 indicates a problem with functional form: the probabilities are [.018] and
[.019] for the LM and F versions respectively (both are below 0.05, so the regression
“fails” the test). How can we solve this?
For this regression, I have found (by experimentation) that using LdivSP (the log of
divSP) instead of divSP seems to solve this problem. As indicated in section 6.3
above, divSP has a skewed distribution, whereas LdivSP is closer to a
normally-distributed variable. In general, it is desirable for all variables in a
regression to be approximately normally-distributed. So, now, estimate a new
regression based on EQUATION 3; but this time replace divSP by LdivSP (the log
of divSP) to solve the ‘functional form’ problem indicated in table 6.
vw = α + β1 (month) + β2 (LdivSP) + β3 (Oct74) + β4 (Jan75) + β5 (Oct87) + u [EQUATION 4]
You should obtain the results shown in table 7. Have we solved the functional form
problem, and are there any other problems? {7}
Table 7: REGRESSION COEFFICIENTS
7.2 INTERPRETING REGRESSION COEFFICIENTS
What do the results in table 7 tell us? Focus near the top-left corner of table 7. The
third line tells us the name of the dependent variable (vw). Below this, under the word “Regressor”,
is a list of the explanatory variables used in the regression. To the right of this list (under the word
“Coefficient”) is a column of numbers, which are the coefficients estimated by Microfit. The first
number (.077305) corresponds to α in EQUATION 4. The second number represents β1 in
EQUATION 4; it is written by Microfit as -.1195E-3, which is shorthand for -.1195 multiplied
by 10⁻³, but would be better represented as -.0001195 in a report. The fact that this is negative
indicates that vw tends to decrease as month increases, if all other variables in the regression
remain the same (so vw appears to have a downward trend). The following coefficient is .027632,
which corresponds to β2 in EQUATION 4. The fact that this is positive tells us that increases in
LdivSP tend to be associated with increases in vw (if all other variables are unchanged). We
can also tell from regression results such as table 7 which coefficients are statistically significant;
this issue is explored in section 7.5 below.
7.3 SERIAL CORRELATION
We will now return to the dataset you typed in earlier, and saved as the Microfit-format file
C:\MFITDATA\SHARES3.FIT (last used in chapter 5). When you used the same dataset earlier,
you discovered a problem of serial correlation with the regression specification (EQUATION 1);
see table 5. This is a very common problem with time-series data (such as share prices), so we
need to solve it. Thankfully, there is a relatively simple solution; but before moving to this, let’s
look more carefully at one variable.
Look at variable Zambia, as shown in chart 1 of this document (section 4.3). There is a large fall
in the share-price at day 15; but apart from that, the price remains fairly steady over time – if you
think of Zambia as today’s share-price, then today’s share-price will be similar to yesterday’s.
Let’s investigate this by looking at the correlation between today’s and yesterday’s price, using a
lagged variable. The new variable you should create is Zambia1, which is Zambia lagged by one
day. To do this, click the Process button in Microfit, and type:
ZAMBIA1 = ZAMBIA(-1); ENTER
LIST ZAMBIA ZAMBIA1; ENTER
Table 8: A LAGGED VARIABLE
where “(-1)” after the variable name Zambia means ‘lag
this variable by one observation’. The second of the
above lines will list these two variables on your screen,
so you can check that variable Zambia1 has been created
successfully. You should obtain the results shown in table 8
on the right of this page. Notice that Zambia1 data is
missing for the first day, and that the second observation
of Zambia1 is identical to the first observation of
Zambia.
Now, find the correlation coefficient between Zambia
and Zambia1, using the method explained in section 4.2
above. Check your answer with {8} at the back of this
booklet. The value of the correlation coefficient is near
+1, which indicates a strong positive correlation between
Zambia and Zambia1; it is evidence of ‘serial
correlation’, which is also called “autocorrelation”
(meaning correlated with itself).
We seem to have found serial correlation: the correlation between Zambia & Zambia1 is near 1.
But let’s be scientific: is this correlation statistically significant? We will use the Microfit
command COR in a different way to section 4.2 above: this time, type just one variable name
(rather than two, as you did in section 4.2). Type COR ZAMBIA; into the Microfit Process window
(removing anything you typed earlier), and click the Go button; Microfit then reports statistics
such as the mean & standard deviation, and you should then click on the Close button; Microfit
then indicates the extent of autocorrelation of variable Zambia (you should obtain the same
results as are shown in table 9 below). Microfit also creates a chart, which you can ignore.
Look at the top of table 9. Microfit reports a coefficient of 0.77943 for the first-order serial
correlation; this must have the same sign as the correlation coefficient between Zambia &
Zambia1 you produced earlier in this section. Table 9 also indicates two statistics which we can
use to assess whether this serial correlation is statistically significant: the ‘Box-Pierce’ and
‘Ljung-Box’ statistics. If the number in square brackets is less than 0.05 for either of these, then
the serial correlation is statistically significant (see section 5.4). In this case, each of these
probabilities is [.000], so we conclude that the first-order correlation is statistically significant.
For this booklet,
you can ignore all of the rows below this: we are not concerned with second-order (or higher-order)
serial correlation.
Table 9: AUTOCORRELATION OF VARIABLE ZAMBIA
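First-order serial correlation is just the correlation between a series and its own one-period lag, so it can be sketched directly in Python (the price series below is simulated; Microfit's small-sample formula may differ slightly in detail):

```python
import numpy as np

def first_order_autocorr(series):
    """Correlation between the series and itself lagged one observation."""
    s = np.asarray(series, dtype=float)
    return np.corrcoef(s[1:], s[:-1])[0, 1]

# A slowly-moving series (like a share price) is highly autocorrelated:
# today's value is close to yesterday's.
rng = np.random.default_rng(4)
price = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=200))
```

A value near +1, like the 0.77943 reported for Zambia, is exactly what such a slowly-moving series produces.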
7.4 DIFFERENCING A VARIABLE
Having established that Zambia does indeed show serial correlation, we now consider a solution.
The standard approach for this problem in time-series data is to “difference” the variable: this
means calculating the difference between the value of the variable on one day and the value of the
same variable on the previous day. In Microfit, this is done by clicking the Process button, and
typing:
DZAMBIA = ZAMBIA - ZAMBIA(-1); ENTER
which will create a new variable called dZambia (include the letter “d” in the name dZambia to
remind you that this variable is differenced). Has this new variable solved the serial correlation
problem? Calculate the serial correlation of dZambia (see {9} if you cannot do this). Is the serial
correlation for this new variable statistically significant? Look in the [] brackets on your screen, at
the row representing first-order serial correlation: the values are [.667] and [.655] so both are
above 0.05 and hence not statistically significant. We can be reassured that differencing has solved
the serial correlation problem. Does the other variable in table 5 (Nippon) show autocorrelation?
Carry out an autocorrelation test on Nippon; is it a problem? {10}
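The differencing step can be sketched the same way in Python (using a simulated random-walk price, not the booklet's data): the level series is strongly autocorrelated, but its first difference is not:

```python
import numpy as np

def first_order_autocorr(series):
    """Correlation between the series and itself lagged one observation."""
    s = np.asarray(series, dtype=float)
    return np.corrcoef(s[1:], s[:-1])[0, 1]

# Simulated random-walk 'share price' and its first difference.
rng = np.random.default_rng(5)
price = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=200))
dprice = np.diff(price)           # price[t] - price[t-1]
```

This is the same pattern as Zambia versus dZambia: differencing removes the serial correlation.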
Even if there were no autocorrelation in Nippon, it would be better to use a differenced version of
Nippon in a regression with dZambia. We cannot use Zambia because of serial correlation (and
hence the risk of spurious results: see the tree example in section 5.1). But if we regress dZambia
on Nippon (differencing one variable but not the other), we may fail to detect a genuine
relationship.
If there is a linear relationship between Zambia & Nippon, we should find a significant
relationship between dZambia and a differenced version of Nippon.
Calculate a variable called dNippon, equal to the first difference of Nippon (see {11} if you
cannot do this). I have found that dNippon does not show significant autocorrelation (you do not
need to test this). Next, run a regression with dZambia dependent on dNippon and the constant.
Does this regression still have a problem with serial correlation? {12} However, there is now a
problem with normality (probability [.000]). The problem that residuals of the new regression are
not normally-distributed suggests that we cannot rely on the results. I experimented with taking
logs of Zambia and Nippon (and then differencing to remove autocorrelation), but even this
regression still had non-normally-distributed residuals. To see why (the residuals of) Zambia
shows non-normality, create a histogram of Zambia by clicking the Process button and typing:
HIST ZAMBIA; ENTER
You should obtain a histogram like chart 4 on the right. Looking at the histogram, we can see why
dZambia is not normally-distributed (and hence why the previous two regressions do not have
normally-distributed residuals): there is an outlier (a value very different from most observations)
between -16.42 & -9.75. In a large sample, a few outliers need not prevent the distribution
from being approximately normal; but we have few observations. If you found more observations
on Zambia (and the explanatory variable) to make dZambia closer to a normally-distributed
variable, then the regression residuals might become normally-distributed. A second way to
produce normal residuals with this dependent variable is to limit the sample, to exclude the
outlying observation; but I would not recommend doing so here, due to the small sample size, and
because the outlier is not near the start or end of the data. A third option is to transform the data,
but I am not aware of a transformation which would create a normally-distributed variable from
Zambia.
Chart 4: A NORMAL DISTRIBUTION?
Let’s go back to the residuals of the latest regression (dZambia on dNippon). Chart 5 shows the
residuals for each observation. You do not need to replicate this chart; I created it in Microfit as a
“3-dimensional” image, to make it look different to other charts in this document.
Chart 5: RESIDUALS
There is a large negative residual at day 15, which corresponds to the drop in the Zambia
share-price at day 15: this negative value at day 15 is visible in the residuals (in chart 5) because
it was not explained by the explanatory variable (dNippon). This negative value at day 15 is the
outlier on the left-hand-side of chart 4. You can also see this as a sudden drop at day 15 in table 1
(section 2.1).
In section 6.1 above, we discussed four possible problems with regressions: serial correlation;
functional form; normality; and heteroscedasticity.
Before trusting the results from a regression equation, you should check that the regression passes
all four tests. If a regression fails any test, then there may be a risk of spurious results. So far, we
cannot tell if there is a link between the share prices for Nippon & Zambia; even after
differencing, we were unable to produce a regression which satisfied all diagnostic tests, so the
above regression results cannot be relied on. The following section will produce a regression
equation which does pass all the diagnostic tests.
7.5 WHICH VARIABLES SHOULD BE INCLUDED IN A REGRESSION?
The problem with the regression of dZambia dependent on dNippon (in the previous section) was
that dZambia was not normally-distributed, so the regression did not have normally-distributed
residuals. Ideally, we would like to know the cause of the sudden drop in the share price at day 15,
and add this variable to the regression as an explanatory variable (as well as dNippon). The fall
may be due to an announcement of falling profits: or a drop in demand for copper; or an accident
in one of the firm’s factories. I do not know the cause of the price drop on day 15, but we can try
creating a new dummy variable set to 1 on the day of the sudden fall (and zero on other days); this
may give a satisfactory regression. To try this, click on the Process button and type
$ CREATE A DUMMY VARIABLE; ENTER
DAY15 = 0;
ENTER
SAMPLE 15 15;
ENTER
DAY15 = 1;
ENTER
SAMPLE 1 22;
ENTER
LIST DAY15;
ENTER
The above lines create a new variable (Day15), equal to 1 on day 15 and equal to zero on all other
days. Next, test the regression equation
dZambia = α(1) + β(dNippon) + ι(Day15) + μ
[EQUATION 5]
where ι is an extra coefficient for Microfit to estimate, and other symbols are as explained above.
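Before estimating it in Microfit, you may like to check the dummy logic; here is a minimal Python sketch (not part of Microfit, and assuming the 22-day sample used throughout this booklet):

```python
# Day15 dummy for a 22-day sample: 1 on day 15, 0 on every other day.
day15 = [1 if day == 15 else 0 for day in range(1, 23)]

print(len(day15))   # 22 observations
print(day15[14])    # the entry for day 15 is 1
print(sum(day15))   # a single-day dummy sums to 1
```

This mirrors what the SAMPLE and assignment commands above do inside Microfit.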
To estimate this new regression specification, click on the button labeled Single and repeat the
process you used in section 5.2, but this time adding an extra explanatory variable. You need to
type:
DZAMBIA CONSTAN DNIPPON DAY15
and click the Start button on the right. Look at the regression results on your screen: does the
regression pass the diagnostic tests? You should find that this new regression still fails the
normality test. After
experimenting, I found that one way to produce an acceptable regression is to create two more
dummy variables: Day5 and Day8, each defined similarly to Day15 (Day5 equal to 1 for day 5,
and zero for all other days; Day8 equal to 1 for day 8, and zero for all other days). You should now
create these two new dummy variables. Next, try the following regression:
dZambia = α(1) + β(dNippon) + ι(Day15) + δ(Day5) + ε(Day8) + μ
[EQUATION 6]
where δ and ε are two additional coefficients to be estimated. You should obtain the results shown
in table 10; if you do not, check the specification you used, and run the regression again until you
get the results shown here.
Table 10: REGRESSION WITH THREE DUMMY VARIABLES
Now save your data file, as C:\MFITDATA\SHARES3.FIT (if you can’t do this, see section 2.4).
Microfit will warn you that there is already a file of the same name; you should save it again (with
the same filename), so click on Yes. By saving this file again, you will keep the changes you have
made to the dataset (such as the dummy variables Day15, Day5, and Day8 you created). To keep a
copy on diskette, save it again using the filename A:\SHARES3.FIT and keep your diskette
somewhere safe.
We now have a regression which passes all four diagnostic tests (see section 6.1), so we can
consider the findings. The sample-size is small, and we had to add three dummy variables to
correct for the ‘shocks’ in Zambia prices, so our results should not be taken too seriously.
Nevertheless, let us look at the results (in table 10): what have we learnt? The first place to look is
to see which variables are statistically significant, by looking at the “T-ratios”. This ratio is simply
the coefficient divided by the standard error of that estimate. In the case of dNippon, for example,
the T-ratio is (-.016800/.032019) = -0.5246885; this is very close to the value -.52471 reported by
Microfit (the slight difference is due to rounding errors). A T-ratio of more than about 2 (or less
than about -2) is statistically significant at the 5% level (see section 6.1); but I use the phrase
“about 2” because the point at which a T-ratio is statistically significant depends on the number of
observations. You can look up this value in tables of the T distribution (in the back of many
statistical textbooks), but there is an easier way. At the far right-hand-side of table 10, we see a
number in square brackets next to the T-ratio: this tells us the probability of a variable having that
T-ratio. In the case of dNippon, for example, the probability that a T-ratio is -.52471 (with the
sample size of 21 observations) is [.607] (check that you can locate this number in table 10). As
discussed in section 5.4 above, there is a social science convention that a probability of under 5%
(i.e. 0.05) is ‘statistically significant’. Because the probability for dNippon is above 0.05, we can
say that this variable is not statistically significant. A variable with a probability greater than 0.05
(based on its T-ratio) does not significantly improve the regression. We can say that, taking
account of its standard error (.032019), the dNippon coefficient (-.016800) is close to zero. If we
created a variable made up of random numbers, it should be unrelated to dZambia; but it is quite
likely that such random numbers (when used in a regression) would produce a T-ratio of -.52471
or lower, or a positive T-ratio of +.52471 or higher.
So this is the test for whether or not to include an explanatory variable: is the probability (based on
the T-ratio) less than 0.05? If so, then that variable appears to be a significant influence on the
dependent variable; whereas if the probability is not under 0.05, then it does not seem to be
significantly linked to the dependent variable, and can be dropped from the regression. Now, look
at all variables in table 10: which of these are statistically significant? Check your answers with
{13} at the end of this booklet. Does this regression suggest that dZambia and dNippon are
significantly related to each other? They do not appear to be linked, because the T-ratio for
dNippon is not statistically significant.
Some researchers like to have an “objective” test as to how many explanatory variables should be
included in a regression; it may seem possible to use the R2 statistic (called “R-Squared” in
Microfit) for this purpose, because any variable which increases the R2 value helps explain more
of the variation in the dependent variable. In fact, the R2 value is not appropriate for this task:
almost any explanatory variable added to a regression will increase the R2 value, even if it has
little connection with the dependent variable. The R-Bar-Squared statistic is better: this is related
to the R2 statistic, but the definition of R-Bar-Squared includes a penalty for each extra
explanatory variable, so it falls when an added variable explains too little.
Table 11: COMPARISON WITH TABLE 10
The regression results in table 11 allow us to compare R2 values with the previous regression
(there is no need to carry out this new regression yourself). The only difference between the
regression used to produce tables 10 and 11 is that the table 10 regression includes an extra
variable: dNippon. Compare the R2 values of these two tables: the R2 value in table 11 is .96579,
which is increased slightly to .96636 in table 10 by adding variable dNippon. This might suggest
that the table 10 regression is better, because it explains a (slightly) higher proportion of the
variation in the dependent variable than the table 11 regression. Yet we found earlier in this
section that dNippon is not significantly linked to dZambia. In deciding whether the table 10
regression is better or worse than that for table 11, comparing R2 values gives the ‘wrong’ answer
(‘wrong’ because dNippon is not statistically significant).
Now, compare the R-Bar-Squared values of tables 11 and 10. The value falls from .95975 in table
11 to .95795 in table 10, so the R-Bar-Squared statistic has given us the ‘correct’ answer: that we
should not include variable dNippon in this regression. Nevertheless, you should not use either R2
or R-bar-squared to decide which variables to include in a regression: it is more appropriate to use
the method explained in this section, based on the T-ratio.
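For reference, the penalty built into R-Bar-Squared can be checked with a short calculation; this Python sketch uses the usual adjusted-R2 formula and the values reported in tables 10 and 11 (21 observations; k counts the estimated coefficients, including the constant):

```python
# R-Bar-Squared (adjusted R2): 1 - (1 - R2) * (n - 1) / (n - k),
# where n = observations, k = estimated coefficients (incl. constant).
def r_bar_squared(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Table 10 regression: constant, dNippon, Day15, Day5, Day8 (k = 5).
with_dnippon = r_bar_squared(0.96636, n=21, k=5)
# Table 11 regression: the same without dNippon (k = 4).
without_dnippon = r_bar_squared(0.96579, n=21, k=4)

print(round(with_dnippon, 5))     # 0.95795
print(round(without_dnippon, 5))  # 0.95975
```

The fall in R-Bar-Squared when dNippon is added matches the comparison of tables 10 and 11 above.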
For many research projects, the aim is to try to “explain” variation in a key variable; there may be
a number of variables which are thought likely to influence this key variable. One approach often
used is to include all possible explanatory variables initially, and then remove each variable which
is not statistically significant (using the probability of the T-ratio). If you use this approach, it is
advisable to keep a constant term in every regression, even if it is not statistically significant. In
general, the ‘answer’ you seek is a list of “causes” of the key variable, and you would only report
regression results when all explanatory variables are statistically significant. However, other
research has different aims: you may be testing a specific claim, such as “increasing the money
supply causes inflation” – in this case, finding that a particular explanatory variable is not
statistically significant is an ‘answer’, so you could report results of a regression containing a
non-significant variable.
CHAPTER 8:
(UNIT 8) REGRESSION AND THE CAPM
This chapter looks at CAPM and portfolio theory: it explains how to
calculate the beta of an asset, and how to test CAPM.
8.1 ESTIMATING ‘RISK’ AND ‘RETURN’ FROM SHARE DATA
This chapter discusses the CAPM (Capital Asset Pricing Model), which is explained in unit 8 of
the ‘Quantitative methods for management’ folder. This topic is relevant to the “best” way to
invest in stocks & shares. This choice depends partly on how much risk an investor will accept to
obtain a higher return; but CAPM suggests that some portfolios are ‘better’ than others, regardless
of how risk-averse you are.
The concept of a ‘risk-free’ asset is central to this chapter. Government stocks are often described
as “risk-free”, because most people trust the government to repay the loan; but such stocks are not
literally free of risk. Consider the 2½% Treasury stock price, shown in table 2 (section 2.3): you
would lose money if you bought on day 6 (at 53.1915 pence) and sold on day 15 (at 51.5014
pence). There is a way to make a risk-free purchase (Brealey & Myers, 1996: p.144): on day 1,
buy a government stock with an expiry date, and hold it until its expiry date. In this case, the
interest-rate is fixed, because we will not sell it before the expiry date. For this chapter, we will
make two assumptions: that there is a Treasury stock which expires on day 22; and that this (dated)
stock has the same return as the (undated) Treasury stock Treasur. I will refer to this (imaginary)
government stock with an expiry date as rskFree; it has the same average return as Treasur
(i.e. .0013941 per day), but no risk.
One of the key ideas in this topic is the ‘portfolio’, which means placing your savings in a mixture
of assets. Suppose you have some money to invest (students may find this difficult to imagine!).
You could spend all of your savings on shares in a firm which sells umbrellas; if there is a lot of
rain next year, the firm may have high profits, and pay a large dividend - so you will make money.
But this is risky – next year may be sunny, in which case few umbrellas will be sold, so your
dividends will be low. A safer way is to “diversify” your savings, which means buying more than
one type of asset. For example, you could put half of your savings in the umbrella firm, and the
rest in a firm selling ice-creams – that way, you should get a reasonable return on your savings,
whatever the weather.
In financial management, ‘return’ on investment is usually measured as the proportionate increase
in value over a fixed time (per day, in this dataset). So we use the following definition of return:
r = [p – p(-1)]/p(-1)
[EQUATION 7]
where r is return on an asset; p is the price of the asset, and p(-1) the price on the previous day.
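In Python, EQUATION 7 applied to a whole price series is one line (the prices here are made-up numbers, not the booklet's data):

```python
# Daily return: r = (p - p(-1)) / p(-1), applied along a price series.
prices = [100.0, 102.0, 99.96]

returns = [(prices[i] - prices[i - 1]) / prices[i - 1]
           for i in range(1, len(prices))]

print([round(r, 4) for r in returns])  # [0.02, -0.02]
```

Note that the list of returns is one element shorter than the list of prices: no return exists for the first day.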
Let’s now calculate the return for one of the firms in the dataset C:\MFITDATA\SHARES3.FIT
you created in section 4.1 above. We will look at share price Allied, and use the name returnA for
the return on Allied (it is important to be systematic when choosing variable names). In Microfit,
create the new variable by clicking the Process button, and then typing
RETURNA = (ALLIED – ALLIED(-1))/ALLIED(-1); ENTER
You should now work out the returns on the two other shares (Zambia and Nippon) in the same
way: call these new variables returnZ and returnN. We want to know the average return for these
three variables; the easiest way is to use the COR command, as follows:
COR RETURNA RETURNZ RETURNN;
ENTER
You should now have four lines visible in the Process window: three lines to calculate returns on
different shares, and the above COR line. The COR line must appear after the other three lines.
You can now run these four lines, by clicking the Go button. You should obtain the following
results:
Table 12: SUMMARY STATISTICS
On your computer, you should see the same results as shown in table 12. Microfit also computes a
correlation matrix, but this is not helpful here (so ignore it). The first line of table 12 tells us that
Microfit used observations from 2 to 22; why did Microfit not use observation 1? (check your
answer with {14}).
Table 12 shows that the average for returnZ is -.0069281 (a fall of about two-thirds of 1% per
day). The fact that Zambia copper has a negative return is consistent with the data in table 1
(section 2.1): the Zambia share price fell from 95 pence to 80.75 pence. If we are looking for an
investment, the falling price over these 22 days might put us off buying Zambia shares. We have
data on Allied & Nippon shares; which of these is a better investment? Nippon has a higher
growth-rate than Allied, because the average value of returnN is more than that of returnA (see
table 12). But as well as the return, we should consider ‘risk’.
The conventional definition of the “risk” of owning a share is the standard deviation of the return
on that share; we can use this definition to compare the riskiness of different shares. The standard
deviation of the return on each share is shown in table 12, on the line below ‘Mean’: returnA
(standard deviation 0.018175) has a lower risk than returnN (standard deviation 0.032514). Some
investors may prefer Allied, to reduce the risk; others might prefer Nippon, with its higher return.
Or, we could combine Allied & Nippon shares in a portfolio.
8.2 CHOOSING THE BEST PORTFOLIO
Suppose we invest one penny in a portfolio, on day 1. If we spent the whole penny on Allied, we
would get (1/1026.5) shares (the price was 1026.5 on day 1: see table 1, section 2.1). Or we could
spend one-fifth (0.20) of a penny on Allied to buy 0.20(1/1026.5) shares; this would leave
four-fifths of a penny, which would buy 0.80(1/181.75) of the Nippon shares. The value of our
portfolio would be:
PORTFOLIO [20% Allied, 80% Nippon] = (0.20(1/1026.5)Allied) + (0.80(1/181.75)Nippon)
The value of this portfolio would change from day to day, as the values of Allied and Nippon vary;
the return on the portfolio is a weighted average of returns on Allied & Nippon. The best way to
calculate the return of a portfolio is by computing the value of this portfolio as a new variable:
PORTFOL20 = ((0.20/1026.5)*ALLIED) + ((0.80/181.75)*NIPPON);
RETURNP20 = (PORTFOL20 – PORTFOL20(-1))/PORTFOL20(-1);
In Microfit, the above asterisk * means ‘multiplied by’. The “20” in Portfol20 refers to the fact
that 20% of the portfolio consists of Allied shares. The formula for the return on Portfol20 is
similar to that for a share, in section 8.1 above. Type the above two lines into Microfit to create
Portfol20 and returnP20; and then type the equivalent pair of lines for a portfolio containing
40% (rather than 20%) Allied shares: call these Portfol40 & returnP40. Do the same for a
portfolio containing 60% Allied shares, and for a portfolio containing 80% Allied shares.
Compare what you typed with answer {15} at the end of this booklet, and run them in Microfit by
clicking the Go button. There is a relationship between Nippon and Allied and the four portfolios
you have created: you can think of Nippon as a portfolio with 0% shares in Allied, and Allied as a
portfolio with 100% shares in Allied. So in the rest of this chapter, I use the phrase ‘six portfolios’
to include Allied and Nippon. Now use the COR command (as in section 8.1) to find the mean &
standard deviation of the six portfolios: returnN, returnP20, returnP40, returnP60, returnP80,
returnA (you should get the results shown in table 13).
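As a cross-check on the Portfol20 arithmetic, here is a Python sketch; the day-1 prices (1026.5 and 181.75) come from table 1, but the day-2 prices are invented for illustration:

```python
# A portfolio worth one penny on day 1: 20% in Allied, 80% in Nippon.
# Day-1 prices are from table 1; day-2 prices are hypothetical.
allied = [1026.5, 1030.0]
nippon = [181.75, 185.0]

portfol20 = [0.20 / 1026.5 * a + 0.80 / 181.75 * n
             for a, n in zip(allied, nippon)]

# On day 1 the portfolio is worth exactly one penny, by construction.
print(round(portfol20[0], 10))  # 1.0

# Return on the portfolio, using the same formula as for a share;
# it equals the weighted average of the two shares' returns.
returnp20 = (portfol20[1] - portfol20[0]) / portfol20[0]
print(round(returnp20, 4))      # about 0.015 with these made-up prices
```

The same pattern, with different weights, gives Portfol40, Portfol60, and Portfol80.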
Table 13: STATISTICS ON 6 PORTFOLIOS
Microfit then produces a correlation matrix, which you can ignore. Look at table 13, and focus on
the row labeled ‘Mean’: which of the six portfolios gives the best return? And which is the lowest
risk, based on the standard deviations? Check your answer with {16} at the end of this booklet. The
choice of which portfolio is “best” may seem arbitrary, because some investors are more
risk-averse than others. But we can say that returnP80 is better than returnA: returnP80 has a
higher return and a lower risk than returnA. To compare the risk and return for all these six
portfolios, look at chart 6.
Chart 6: SIX PORTFOLIOS, AND A “RISK-FREE” ASSET
Chart 6 above cannot be produced in Microfit; you do not need to replicate it. The chart presents
data from the mean & standard deviation rows of table 13 for the six portfolios. For example, the
risk & return for Nippon shares are shown in the returnN column of table 13. Chart 6 is in a form
similar to Brealey & Myers (1996: figure 8-5, p.177; but we would need to multiply all numbers
by 100, if we wished to use risk & return in percent as Brealey & Myers do). In chart 6, each of
the six portfolios is represented as a point; they are labeled according to the name of the share or
portfolio. Nippon (nearest to the top of the chart) has the highest return; the lowest risk is
portfol80, which is nearer the left-hand side of the chart than the other portfolios. I connected
these six points by a smooth curve; other portfolios of these two shares (such as one containing
50% Allied shares) would lie on this curve. The section at the top-left part of this curve (between
portfol80 and Nippon) is called the set of “efficient” portfolios; what does this mean? {17}
On chart 6, I added a point labeled rskFree (see section 8.1). Consider the continuous straight line
starting at rskFree, and touching the portfolios curve near portfol40: what is this line called? {18}
Of the six portfolios we studied in this section, portfol80 has the lowest risk. But chart 6 shows
that investors can do better than this: for example, the point on the security market line vertically
above portfol80 has a higher return than portfol80, but the same risk (investors can reach this
point by a portfolio of Allied & Nippon in the same proportions as portfol40, with some rskFree
stock). Portfolio portfol40 is a good combination of shares because it lies (approximately) on the
security market line – it has relatively high return but low risk. Many investors would prefer a
safer investment than portfol40, and would choose a point on the security market line closer to
rskFree.
8.3 ESTIMATING THE BETA OF AN ASSET
Consider the ‘beta’ of an asset: this compares the behaviour of one firm’s shares, with the
behaviour of the stock exchange as a whole. We need a measure of the rate of return for the entire
stock market. There are various possible data sources on groups of shares, such as the ‘FTSE-100’
index of the largest 100 firms on the London stock exchange (produced by the Financial Times
newspaper). But to save you typing in more data, we will construct our own (very limited) market
index from the four data series you typed earlier. Go to the Microfit Process window, and type the
following:
SET = ((ALLIED/1026.5) + (ZAMBIA/95) + (NIPPON/181.75) +(TREASUR/51.8))/4;
ENTER
RETURNS = (SET – SET(-1))/SET(-1); ENTER
We treat this new variable returnS as an approximation of the return for the whole stock market
(we should really use an efficient portfolio, and there is no reason to suppose Set is efficient).
Now look at the performance of Allied relative to Set. To calculate this ‘beta’, type (in the
Microfit Process window):
COR RETURNA RETURNS;
ENTER
When you click the Go button, Microfit should produce the results shown in table 14; and then
when you click the Close button, you should see the correlation matrix shown in table 15.
Table 14: MEAN & STANDARD DEVIATION
Table 15: CORRELATION
Pause for a moment: you now have enough information to calculate the beta of Allied (using a
calculator): can you do so? Microfit does not make it easy to work out beta, so I will explain the
method. Using formula (8.2) in the ‘Quantitative methods for financial management’ folder (unit 8:
p.5), the beta of stock X is
betaX = cov(X,S) / var (S)
[EQUATION 8]
where cov(X,S) is the covariance between returns on stock X and the average return for an
efficient portfolio, such as the entire stock market; and var (S) is the variance of the average rate
for the efficient portfolio. We are studying Allied shares, so we want betaA rather than betaX.
Microfit reports the correlation coefficient, which I will write as cor(A,S); but we need the
covariance cov(A,S), so a little more work is needed. We can calculate the covariance from the
correlation, using the formula from Brealey & Myers (1996: p.158), which I present using the
notation of the ‘Quantitative methods for financial management’ folder:
cov(A,S) = cor(A,S)·SD(A)·SD(S)
[EQUATION 9]
where SD() indicates the standard deviation of a variable. Noting that var(S) is equal to SD(S)
squared, we can then substitute this value of cov(A,S) into EQUATION 8, to give
betaA = cor(A,S)·SD(A)·SD(S)/SD(S)²
which simplifies to
betaA = cor(A,S)·SD(A)/SD(S)
[EQUATION 10]
Now substitute results from tables 14 & 15. The correlation coefficient cor(A,S) between Allied &
Set is 0.48224; and the standard deviations of returnA and returnS are 0.018175 and 0.015435
respectively. Hence betaA = (0.48224)(0.018175)/(0.015435) = 0.5678 approximately.
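EQUATION 10 is simple enough to check with a calculator, or with a few lines of Python (the numbers are copied from tables 14 and 15):

```python
# betaA = cor(A,S) * SD(A) / SD(S), using the reported values.
cor_as = 0.48224   # correlation between returnA and returnS (table 15)
sd_a = 0.018175    # standard deviation of returnA (table 14)
sd_s = 0.015435    # standard deviation of returnS (table 14)

beta_a = cor_as * sd_a / sd_s
print(round(beta_a, 4))  # 0.5678
```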
Now let’s try a different approach. Look at unit 8 of the ‘Quantitative methods for financial
management’ folder (p.5): we can calculate the beta of stock X using equation 8.3:
rx – rf = βx(rm – rf)
where r indicates the return on an asset (rm the return on the market portfolio, and rf the
“risk-free” return). In the above equation, βx is closely related to betaX (as explained below).
What is rf in the above equation? Earlier, we calculated the return on various assets: for example,
returnA is the return on share Allied. However, the CAPM requires us to
study the “risk premium”, defined as the return on a share minus the “risk-free” return. Using the
“risk-free” return discussed in section 8.1, you can calculate the risk premium for each share, by
typing the following into the Microfit Process window:
PREMIUMA=RETURNA-0.0013941; ENTER
PREMIUMS=RETURNS-0.0013941; ENTER
where 0.0013941 is the average return of the “risk-free” asset (see section 8.1). Now run an OLS
regression. Click the button labeled Single (near the top right of the screen), and type the
following regression specification:
PREMIUMA CONSTAN PREMIUMS; ENTER
And click the START button on the right of the screen. Your results should be the same as table
16 below.
Table 16: REGRESSION RESULTS
In this case, you can ignore the diagnostic statistics produced by Microfit. We are interested in the
coefficient of premiumS, which is 0.56786 (make sure you can find this coefficient in table 16).
This is very close to the value of beta we calculated earlier in this section, using the correlation
coefficient and standard deviations of returnA and returnS. Does this prove that the CAPM
theory is correct? No, it doesn’t prove anything (Berndt E.R., 1991, The practice of econometrics:
classic and contemporary, Addison-Wesley: Reading Mass., p.35). Perhaps by accident, the
economists who invented beta chose a formula which could be estimated by OLS regression, as
we have just done. But there is another way of calculating beta, using covariance and standard
deviations, which we used earlier in this section. To prove that these two are equivalent, start with
EQUATION 8 (above):
betaX = cov(X,S) / var(S)
Substitute the formulas for covariance and variance into this expression (writing X here for the
explanatory variable, i.e. the market return, and Y for the return on the stock):
betaX = Σ((Xi - Xbar)(Yi - Ybar)) / Σ((Xi - Xbar)(Xi - Xbar))
[EQUATION 11]
where Xbar represents the mean of X, and Ybar the mean of Y. Focus on the top line: multiplying
out the two brackets gives
Σ(Xi·Yi - Xbar·Yi - Xi·Ybar + Xbar·Ybar)
We can re-write this line as four separate summations:
Σ(Xi·Yi) –Σ(Xbar·Yi) – Σ(Xi·Ybar) + Σ(Xbar·Ybar)
The symbol Σ means “add this term for each value of i from 1 to n”, where n is the number of
observations; so any term which does not vary with i can be moved outside the summation, to
give
Σ(Xi·Yi) - Xbar·Σ(Yi) - Ybar·Σ(Xi) + Xbar·Ybar·Σ(1)
By definition, Xbar = (ΣXi)/n, so we can rewrite ΣXi as n·Xbar; similarly for Y. Note also that
Σ1 = n. So we can rewrite the above line as
Σ(Xi·Yi) - Xbar·n·Ybar - Ybar·n·Xbar + Xbar·Ybar·n
The third and fourth terms cancel each other, to give
Σ(Xi·Yi) - Xbar·n·Ybar
We can carry out the same process on the bottom line of EQUATION 11, to produce
Σ(Xi·Xi) - Xbar·n·Xbar
So, EQUATION 11 becomes
betaX = [Σ(Xi·Yi) - n·Xbar·Ybar] / [Σ(Xi·Xi) - n·Xbar·Xbar]
The formula for betaX is identical to the formula for β in OLS regression (equation 4.15a, p.8 in
unit 4 of the ‘Quantitative methods for financial management’ folder). The above mathematics
means that the beta (calculated from covariance and variance) must be the same as the β
(calculated by OLS regression); so the fact that both methods gave the same value (about 0.5678)
does not “prove” that the CAPM theory is true.
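The algebra above can also be verified numerically; in this Python sketch the two return series are made-up numbers, used only to show that cov/var and the OLS slope formula give identical answers:

```python
# Made-up return series: x = "market" return, y = "stock" return.
xs = [0.005, -0.01, 0.02, 0.002, 0.01]
ys = [0.01, -0.02, 0.03, 0.0, 0.015]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Beta as covariance over variance (the EQUATION 8 form).
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
beta_from_cov = cov_xy / var_x

# Beta as the OLS slope formula derived above:
# [Σ(Xi·Yi) - n·Xbar·Ybar] / [Σ(Xi·Xi) - n·Xbar·Xbar]
beta_from_ols = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
                (sum(x * x for x in xs) - n * x_bar * x_bar)

print(abs(beta_from_cov - beta_from_ols) < 1e-9)  # True
```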
Note one more complication. The latest regression used premiumA & premiumS to find βA ;
but to calculate betaA you used the covariance and variance of returnA & returnS (rather than
premiumA & premiumS). There is a difference of 0.0013941 between returnA and premiumA
(and the same difference between returnS and premiumS). This constant difference of 0.0013941
has no effect: subtracting any constant from returnA & returnS has no effect on the standard
deviations, or on the correlation between them.
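This invariance is easy to demonstrate in Python (the two return series below are invented; only the risk-free value 0.0013941 comes from the text):

```python
import statistics

# Made-up return series (not the booklet's data).
return_a = [0.01, -0.005, 0.02, 0.0, 0.008]
return_s = [0.004, -0.002, 0.015, 0.001, 0.006]
rf = 0.0013941   # the "risk-free" daily return from section 8.1

premium_a = [r - rf for r in return_a]
premium_s = [r - rf for r in return_s]

# Standard deviations are unchanged by subtracting a constant...
print(abs(statistics.stdev(return_a) - statistics.stdev(premium_a)) < 1e-9)

# ...and so is the correlation (computed here by hand).
def correlation(u, v):
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    var_u = sum((a - u_bar) ** 2 for a in u)
    var_v = sum((b - v_bar) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

print(abs(correlation(return_a, return_s)
          - correlation(premium_a, premium_s)) < 1e-9)  # True
```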
It is desirable to have a clear idea of what beta means. The aim of beta is to assess whether the
return on one stock varies less, or more, than the stock market as a whole. So what does the Allied
beta value of 0.5678 mean? The first point is that this beta is less than 1; this suggests that Allied
is less risky than most assets. Risk-averse investors would prefer Allied to riskier shares. Now
calculate the beta of the other three assets (Nippon, Zambia & Treasur), using the regression
method, and check your answers with those at the end of the booklet {19}. Also, use these beta
values to list the four assets in order from lowest to highest risk (check your answers with {20}).
Two assets (Zambia & Nippon) have higher-than-average risk, and the other two have
lower-than-average risk. Note that the average of these four beta values is 1. We should always
get an average of 1 if we look at a large number of assets, and compare each asset with a portfolio
like Set which is an average of all assets.
8.4 TESTING CAPM
In unit 8 of the ‘Quantitative methods for financial management’ folder (p.6), we read “the more
risky the stock is the higher are the returns required by investors”. The previous section found that
the four assets have different beta values, and hence different levels of risk. The CAPM theory
implies that there should be a strong link between risk & return: every share must lie on the
‘security market line’, and this security market line must slope upwards.
Chart 7: TESTING THE CAPM
Remember that this booklet is only a training exercise; we cannot carry out a serious test of
CAPM, because that would require long-term data on prices (and dividends paid) for a large
number of shares. Because we have no data on dividends, we were forced to assume that the only
reason to hold stocks & shares is because their prices are expected to rise (whereas in reality, the
main reason for buying such assets is the dividends which you expect the firm or the Treasury will
pay).
All we can do here is to illustrate an approach we could use to test CAPM. We can draw a graph
of risk against return, and place the four assets on it: this is shown as chart 7 above (there is no
need for you to replicate it). I display the real Treasur stock, rather than the imaginary rskFree
stock (discussed in section 8.1). Is the CAPM theory supported by our current data? Not really.
One obvious problem is that one share (Zambia) has a negative return. This is because of the short
time-span of this data (22 days), and because we assume no dividend was paid during the period;
if we looked at a longer time span (say a few years), we could be more confident in the results.
The other three assets do suggest a link between risk & return: they seem to lie approximately on
an upward-sloping line, which (CAPM suggests) we could call the ‘security market line’, and this
could be taken as support for the CAPM theory. I suspect it is just coincidence that three of these
four assets lie approximately on an upward-sloping line: we are unlikely to find clear support for
CAPM in such a small sample. Note also that the apparent ‘security market line’ would cut the
vertical axis (i.e. the part of chart 7 where risk is zero) at a zero or negative return, which is not
plausible.
8.5 FINAL COMMENTS
In section 8.3, we were able to estimate beta values of assets without using regression. But if you
test CAPM (or almost any other theory in finance), you should expect to use regression. So do not
forget the comments of previous chapters in this booklet: that if there are departures from the
assumptions associated with the Gauss-Markov theorem (see unit 5 of the ‘Quantitative methods
for financial management’ folder), then OLS regression is unreliable. In the case of CAPM, OLS
regression seems appropriate because of the definition of beta; but you might obtain very
unrealistic estimates in some datasets (it is not possible to explore such issues in this MSc course).
In general, do not restrict yourself to OLS regression: in particular, the issue of autocorrelation
(discussed in unit 7 of the ‘Quantitative methods for financial management’ folder, and chapter 7
of this booklet) is a very important issue for time-series data, which are typical of datasets used in
financial management. The problem of autocorrelation in share prices is partly, but not entirely,
solved by calculating the return on shares (as we did in section 8.1): this is because the formula for
return on a share, which you used in section 8.1 of this booklet, is fairly similar to the first
difference of the share price.
Good luck with your research.