# Lecture 5: GLMs and Event Studies (23.10.23)

Trinity Business School, Econometrics
Lecture 5: Generalised Linear Models and Event Studies
Lecturer: Niamh Wylie
Michaelmas Term 2023/24

## Non-Linear Dependent Variables: Logit, Probit and Tobit Models

### Discrete Choice Models
- Sometimes we need to work with models where the dependent variable is not continuous.
- Where the dependent variable is a choice of two or more options from a menu, the dependent variable is said to be discrete.
- In such an instance OLS will be inappropriate for estimation.
- Greene (2011) states that discrete choice models are appropriate when the economic outcome to be modelled is a discrete choice among a set of alternatives, rather than a continuous measure of some activity.
- The set of alternatives is called the choice set and must be mutually exclusive (only one alternative can be selected) and exhaustive.
- Discrete choice variables are also known as limited dependent variables.
### Binary Models

- Many of the choices that firms, individuals and governments make are of an 'either/or' nature.
- Such choices can be represented by a binary variable that takes the value 1 if one outcome is chosen and the value 0 otherwise.
- When the dependent variable is binary, OLS cannot be used.

Assume an individual has to make a choice between two alternatives. Let $U_{ji}$ be the utility that individual $i$ ($i = 1, 2, \dots, N$) gets if alternative $j$ ($j = 0$ or $1$) is selected. The individual makes choice 1 ($j = 1$) if $U_{1i} \geq U_{0i}$ and choice 0 otherwise. The choice depends on the difference in utilities across the two alternatives:

$$Y_i^* = U_{1i} - U_{0i}$$
If the choice is to use public or private transport, the individual will choose whichever yields the highest utility. $Y_i^*$ depends on the characteristics $X_i$ of each individual:

$$Y_i^* = \alpha + \beta X_i + \varepsilon_i$$

$Y_i^*$ is unobservable, but we can observe the choice made by the individual: $Y_i = 1$ if individual $i$ makes choice 1 and $Y_i = 0$ if choice 0 is made. The relationship between $Y_i^*$ and $Y_i$ is:

$$Y_i = 1 \text{ if } Y_i^* \geq 0, \qquad Y_i = 0 \text{ if } Y_i^* < 0.$$

This can be modelled using either a probit or logit model.
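The latent-variable mapping above is easy to sketch numerically; a minimal simulation with made-up parameter values (logistic errors, so the implied model is a logit):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
alpha, beta = -1.0, 2.0              # illustrative values, not from the lecture
x = rng.normal(size=n)               # characteristic X_i of each individual
eps = rng.logistic(size=n)           # logistic errors imply a logit model
y_star = alpha + beta * x + eps      # latent utility difference Y* = U_1i - U_0i
y = (y_star >= 0).astype(int)        # observed choice: 1 if Y* >= 0, else 0
print(y.mean())                      # share of individuals making choice 1
```

Only the 0/1 outcome `y` would be observed in practice; the latent `y_star` exists only inside the simulation.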
### Limited Dependent Variable Models

- Dummy variables can be used as explanatory variables to numerically capture qualitative effects (e.g. gender, day of the week).
- When the explained variable is a qualitative effect, the qualitative information is coded as a dummy.
- A discrete choice variable is one where the Y value may take certain discrete integers.
- A binary choice variable is one where the Y value may take only 0 or 1.

Examples:

- Where firms choose to list their shares (Nasdaq or NYSE)
- Stocks that pay dividends while others do not
- Factors affecting sovereign debt defaults
### Linear Probability Model

- The simplest way of dealing with binary dependent variables.
- The probability of the event occurring, $p$, is linearly related to a set of explanatory variables:

$$P_i = p(y_i = 1) = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_k x_{ki} + u_i, \qquad i = 1, 2, \dots, N$$

- The fitted values of the regression are the estimated probabilities $p(y_i = 1)$ for each observation $i$. The slope estimates give the change in the probability that $y_i = 1$ for a unit change in the given explanatory variable, holding all other $x$'s fixed.
Example: model the probability that firm $i$ pays a dividend as a function of its market capitalisation:

$$\hat{P}_i = -0.3 + 0.012 x_{2i}$$

- $\hat{P}_i$ denotes the estimated probability for firm $i$. Interpretation: for every \$1m increase in size, the probability that the firm pays a dividend increases by 1.2 percentage points.
- A firm whose stock is valued at \$50m will have a $-0.3 + 0.012 \times 50 = 0.30$, or 30%, probability of making a dividend payment.

[Figure: fitted line of dividend-payment probability against market cap.] The model is simple to estimate and intuitive to interpret, but what is the problem? For market caps below \$25m or above \$109m, the fitted probability will be below 0 or above 1!
Possible solution: truncation, setting fitted values below 0 to 0 and above 1 to 1. However:

- If truncation is used, too many observations take values of exactly 0 or 1. It is not plausible that a firm's probability of paying a dividend is exactly zero or one; can we be that certain?
- Econometrically, if the dependent variable can only take a binary value, then the disturbance term can only take one of two values, so it cannot plausibly be assumed to be normally distributed:
  - If $y_i = 1$, then $u_i = 1 - \beta_1 - \beta_2 x_{2i} - \beta_3 x_{3i} - \dots - \beta_k x_{ki}$
  - If $y_i = 0$, then $u_i = -\beta_1 - \beta_2 x_{2i} - \beta_3 x_{3i} - \dots - \beta_k x_{ki}$
- The disturbances are heteroscedastic, so robust standard errors should always be used.
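A quick check of the dividend example's arithmetic and its boundary problem, using the slide's fitted coefficients:

```python
# Fitted LPM from the slide: P_hat = -0.3 + 0.012 * market_cap (in $m)
def p_hat(market_cap_m):
    return -0.3 + 0.012 * market_cap_m

print(p_hat(50))    # 0.30: a $50m firm has a 30% fitted probability
print(p_hat(10))    # negative "probability" below roughly $25m
print(p_hat(150))   # fitted "probability" above 1 beyond roughly $109m
```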
### The Logit Model

- Overcomes the limitations of the linear probability model by transforming the regression model so that the fitted values are bounded within the (0, 1) interval.
- The logistic function $F$ of a random variable $z$:

$$F(z_i) = \frac{1}{1 + e^{-z_i}}$$

- As $z_i \to \infty$, $e^{-z_i} \to 0$, so $\frac{1}{1 + e^{-z_i}} \to 1$; as $z_i \to -\infty$, $e^{-z_i} \to \infty$, so $\frac{1}{1 + e^{-z_i}} \to 0$.
- Under the logit approach ($e$ is the exponential):

$$P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_k x_{ki} + u_i)}}$$

where $P_i$ is the probability that $y_i = 1$.

[Figure: S-shaped logistic curve of probability against market cap.] 0 and 1 are asymptotes of the function: probabilities will never be exactly 0 or 1, but can come infinitesimally close.

Clearly this model is not linear, so it is not estimable by OLS; instead we use maximum likelihood.
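The boundedness of the logistic transform is easy to verify numerically (a minimal sketch):

```python
import math

def logistic(z):
    """Logistic function F(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-20, -1, 0, 1, 20):
    print(z, logistic(z))   # values stay strictly inside (0, 1)
```

However large or small `z` becomes, the output never reaches exactly 0 or 1, matching the asymptotes on the slide.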
### Maximum Likelihood

Besides least squares, there are other broad approaches to parameter estimation. Maximum likelihood chooses the set of parameter values that are most likely to have produced the observed data.

First form the likelihood function (LF). This is a multiplicative function of the actual data (their joint density), which is difficult to maximise with respect to the parameters, so the log is taken to turn the LF into an additive function, the log-likelihood function (LLF):

$$LF = \prod_{i=1}^{n} f(x_i), \qquad LLF = \sum_{i=1}^{n} \ln f(x_i)$$

where $x_1, x_2, \dots, x_n$ is a set of i.i.d. observations with individual probability density function $f(x)$.
The value of the parameter $\theta$ that maximises the probability of observing the data is called the maximum likelihood estimate. Denote the maximised value of the LLF under unconstrained ML by $L(\hat{\theta})$; at the unrestricted MLE, the slope of the LLF curve is zero.

- All methods work by searching over the parameter space until the values of the parameters that maximise the LLF are found.
- Most software packages employ an iterative technique.
- Drawback: for non-linear models such as GARCH, the LLF can have many local maxima.
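A toy illustration of the search idea: the log-likelihood of a Bernoulli sample, maximised by a crude grid search over the parameter space (real software refines this with iterative methods):

```python
import math

def llf(p, k, n):
    """Log-likelihood of k successes in n Bernoulli trials with probability p."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 30, 100                              # hypothetical sample: 30 successes
grid = [i / 1000 for i in range(1, 1000)]   # candidate parameter values
p_hat = max(grid, key=lambda p: llf(p, k, n))
print(p_hat)                                # the sample proportion k/n maximises the LLF
```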
### Example: Pecking Order Hypothesis

- Theory suggests that corporations, when financing activities, should use the cheapest method of financing first and switch to more expensive methods only when the cheapest has been exhausted (Myers, 1984).
- The pecking order theory of capital structure: firms prefer to issue debt rather than equity if internal finance is insufficient. The ordering is internal funding first, then debt, then equity.
- Helwege & Liang (1996) used a logit model on new US firm listings in 1983, tracking their additional funding decisions from 1984-1992, to determine the factors affecting the probability of raising external financing.
- Dependent variable: 1 if the firm raises external capital, 0 if it does not.
- Independent variables reflect the degree of information asymmetry and the degree of riskiness of the firm.

Findings:

- The probability of obtaining external funding does not depend on the size of the firm's cash deficit.
- The larger the firm's surplus cash, the less likely it is to seek external financing, providing limited support for the pecking order hypothesis.
### Probit Model

Similar to the logit model, but instead of the cumulative logistic function, the cumulative distribution function of a standard normal random variable is used to transform the model:

$$F(z_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_i} e^{-z^2/2} \, dz$$

As with the logistic approach, this function provides a transformation ensuring that the fitted probabilities lie between 0 and 1. As with the logit model, the effect of a unit change in $x_{3i}$ is $\beta_3$ scaled by the slope of $F$ at $z_i$ (the standard normal density), where $\beta_3$ is the parameter attached to $x_{3i}$ and

$$z_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_k x_{ki} + u_i$$
### Logit vs. Probit Model

- Both give very similar characterisations of the data because the densities are similar; fitted regression plots will be virtually indistinguishable.
- The implied relationships between the explanatory variables and the probability that $y_i = 1$ will be very similar.
- Both models are preferred to the linear probability model.
- The only time they might give marginally different results is when the split between $y_i = 0$ and $y_i = 1$ observations is unbalanced.
- Traditionally the probit's integral made estimation more time-consuming than logistic regression, but this is now handled by statistical software (Stock & Watson, 2011).
### Parameter Interpretation: Logit Model

The model is non-linear and therefore cannot be estimated by OLS. It is incorrect to say that a 1-unit increase in $x_{2i}$ causes a $100 \times \beta_2$% increase in the probability of the $y_i = 1$ outcome; that holds only for the linear model $P_i = \beta_1 + \beta_2 x_{2i} + u_i$. In the logit model, $P_i = F(\beta_1 + \beta_2 x_{2i} + u_i)$, where $F$ is the non-linear logistic function.

To obtain the required relationship between changes in $x_{2i}$ and $P_i$, we differentiate $F$ with respect to $x_{2i}$; this derivative is $F(z_i)(1 - F(z_i))$. Therefore a 1-unit increase in $x_{2i}$ increases the probability by $\beta_2 F(z_i)(1 - F(z_i))$.
Take a numerical example:

$$P_i = \frac{1}{1 + e^{-(0.1 + 0.3 x_{2i} - 0.6 x_{3i} + 0.9 x_{4i})}}$$

with $\beta_1 = 0.1$, $\beta_2 = 0.3$, $\beta_3 = -0.6$, $\beta_4 = 0.9$. Calculate $F(z_i)$ at the means of the explanatory variables. Suppose $\bar{x}_2 = 1.6$, $\bar{x}_3 = 0.2$, $\bar{x}_4 = 0.1$; then the estimate of $F(z_i)$, i.e. $P(y_i = 1)$, is

$$P_i = \frac{1}{1 + e^{-(0.1 + 0.3(1.6) - 0.6(0.2) + 0.9(0.1))}} = \frac{1}{1 + e^{-0.55}} = 0.63.$$

Therefore a 1-unit increase in $x_2$ increases the probability that $y_i = 1$ by $0.3 \times 0.63 \times 0.37 = 0.07$; the corresponding effects for $x_3$ and $x_4$ are $-0.14$ and $0.21$. These estimates are called marginal effects. A more intuitive interpretation is odds ratios; see the Stata output!
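The numerical example's marginal effects can be reproduced in a few lines:

```python
import math

beta = {"const": 0.1, "x2": 0.3, "x3": -0.6, "x4": 0.9}   # coefficients from the slide
xbar = {"x2": 1.6, "x3": 0.2, "x4": 0.1}                  # means of the regressors

z = beta["const"] + sum(beta[k] * xbar[k] for k in xbar)  # z = 0.55
P = 1.0 / (1.0 + math.exp(-z))                            # fitted P(y_i = 1)
marginal = {k: beta[k] * P * (1 - P) for k in xbar}       # beta_k * F(z) * (1 - F(z))

print(round(P, 2))                                        # 0.63
print({k: round(v, 2) for k, v in marginal.items()})      # 0.07, -0.14, 0.21
```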
### Odds Ratios

How do we estimate $p_i = \frac{1}{1 + e^{-z_i}}$, given that it is non-linear not only in the $x$'s but also in the parameters $\beta$?

We can use a simple transformation to make the model linear by taking the odds $p_i / (1 - p_i)$: the probability that $y_i = 1$ relative to the probability that $y_i = 0$, i.e. the odds ratio. We obtain

$$\frac{p_i}{1 - p_i} = e^{z_i}$$

and taking logs,

$$L_i = \ln\!\left(\frac{p_i}{1 - p_i}\right) = z_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_k x_{ki} + u_i$$

Hence $L_i$, the log of the odds ratio, is the logit model. It is interesting that the linear probability model assumes $p_i$ is linearly related to $x_i$, whereas the logit model assumes the log of the odds ratio is linearly related to $x_i$. This makes estimation of the regression parameters much simpler!
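One consequence worth checking numerically: because the log-odds are linear in $x$, a 1-unit increase in $x_2$ multiplies the odds by $e^{\beta_2}$ regardless of the starting level (sketch using the illustrative coefficients from earlier):

```python
import math

beta1, beta2 = 0.1, 0.3      # illustrative intercept and slope

def odds(x2):
    z = beta1 + beta2 * x2
    p = 1.0 / (1.0 + math.exp(-z))
    return p / (1.0 - p)     # equals e^z

ratio = odds(2.0) / odds(1.0)    # odds ratio for a 1-unit increase in x2
print(ratio, math.exp(beta2))    # the two values coincide
```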
### Goodness-of-Fit Measures

- The standard RSS, $R^2$ or adjusted $R^2$ cease to have any real meaning: fitted values can take any value, but actual values can only be 0 or 1.
- Maximum likelihood maximises the value of the LLF rather than minimising a residual sum of squares, so two alternative goodness-of-fit measures are used instead.

1. The percentage of $y_i$ values correctly predicted: $100 \times$ the number of observations correctly predicted divided by the total number of observations. The higher the number, the better the fit of the model.
2. Pseudo-$R^2 = 1 - LLF/LLF_0$, where $LLF$ is the maximised value of the log-likelihood function and $LLF_0$ is its value for a restricted model with all slope parameters set to 0. As model fit improves, the LLF becomes less negative and the pseudo-$R^2$ rises.
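The pseudo-$R^2$ calculation is simple enough to sketch directly (the log-likelihood values below are made up for illustration):

```python
def pseudo_r2(llf, llf0):
    """Pseudo R-squared: llf is the full-model LLF, llf0 the slopes-zero LLF."""
    return 1.0 - llf / llf0

print(pseudo_r2(-45.2, -69.3))   # better fit: LLF less negative, higher pseudo R2
print(pseudo_r2(-60.0, -69.3))   # worse fit: value closer to zero
```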
### Tobit Model

- In certain applications the dependent variable is continuous, but its range may be constrained.
- The dependent variable might be zero for a substantial part of the population but positive for the rest; examples include consumption of goods and hours of work.
- OLS is not appropriate in this instance; instead an approach based on maximum likelihood must be used.
- The Tobit model is used in such situations (Tobin, 1958).
Example (right-censored, upper limit): model demand for IPO shares as a function of income ($x_{2i}$), age ($x_{3i}$), education ($x_{4i}$) and region of residence ($x_{5i}$), where the number of shares per individual is capped at 250.

$$y_i^* = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i$$

For the Tobit model, the relationship between $y_i^*$ and $y_i$ is:

$$y_i = \begin{cases} y_i^* & \text{for } y_i^* < 250 \\ 250 & \text{for } y_i^* \geq 250 \end{cases}$$

Example (left-censored, lower limit): model charitable donations made by individuals. Here the relationship is:

$$y_i = \begin{cases} y_i^* & \text{for } y_i^* > 0 \\ 0 & \text{for } y_i^* \leq 0 \end{cases}$$

In essence, the individual will either donate €$y_i^*$ or zero.
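The two censoring schemes can be sketched on simulated latent data (parameter values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
y_star = rng.normal(loc=200, scale=80, size=1000)   # latent demand y*

y_right = np.minimum(y_star, 250)   # right-censored: observed demand capped at 250
y_left = np.maximum(y_star, 0)      # left-censored: donations cannot be negative

print((y_right == 250).mean())      # share of observations piled up at the cap
```

Running OLS on `y_right` or `y_left` would treat the piled-up boundary values as genuine observations, which is why the Tobit likelihood is needed.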
### Multinomial Choice

- In probit and logit models the individual chooses between two alternatives, but we are often faced with choices involving more than two alternatives. These are called multinomial choice situations.
- With only two choices, we used one equation to capture the probability that either one would be chosen.
- With three choices, we estimate two equations, with the third choice acting as a reference point. In general, for $m$ possible choices we use $m - 1$ equations.
### Pecking Order Hypothesis Revisited

- The previous hypothesis considered the choice between external finance or not; but what about the type of financing: equity, public debt or private debt?
- Instead of a binary logit model, a multinomial model is more appropriate.
- Returning to Helwege and Liang's study: estimate equations for equity and for public debt, with private debt as the reference point. For example, a positive parameter in the equity equation means an increase in the probability that the firm would choose to issue equity over private debt.
- Results suggest firms in good financial health are more likely to issue equities and bonds rather than private debt.
- Riskier or venture-backed firms and non-financial firms are more likely to issue equities or bonds.
- Larger firms are more likely to issue bonds.
### Ordered Choice Models

- The choice options in the multinomial logit model have no natural ordering, but in some cases choices are ordered in a particular way: bond ratings (AAA, AA), student grades (A, B) and employee performance (good, poor) are ordered in a hierarchy.
- When modelling these types of outcomes, numerical values are assigned to the outcomes, but the values are ordinal and reflect only the ranking of the outcomes.
- As with all the earlier models examined, OLS is not suitable for such data, as it would treat the dependent variable as having numerical meaning when it does not.
The most common example in finance is credit ratings, where there is a monotonic increase in credit quality, denoted by ordinal numbers. The model is set up so that the boundary values between each rating are estimated along with the model parameters.

Poon (2003) investigated bias in unsolicited vs. solicited ratings using an ordered probit model, on a pooled sample of S&P's annual issuer list from 1998-2000: 295 firms across 15 countries, 595 observations.

Findings: half of the sample ratings were unsolicited, with lower ratings on average. The financial characteristics of firms with unsolicited ratings were significantly weaker than those of firms with requested ratings.
### Count Data Models

Often the dependent variable in a model is a count of a number of occurrences. Given the following count data model:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

where $Y_i$ is the number of visits to a GP by individual $i$ in a year and $X_i$ is a vector of explanatory variables, we may be interested in explaining a probability such as the probability that an individual makes two or more GP visits in the year. A Poisson regression model would be used to estimate this model.
### Stata: Logit Models

```stata
logit y x1 x2 x3
```

Example interpretation (auto data, foreign as the dependent variable):

- Heavier cars are less likely to be foreign, at the 1% significance level.
- Cars yielding better gas mileage are less likely to be foreign, at the 10% level.
### Stata: Logit Models with Odds Ratios

Odds ratios give an easier interpretation:

- Odds ratios > 1 correspond to positive effects, because they increase the odds.
- Odds ratios between 0 and 1 decrease the odds.
- An odds ratio equal to 1 corresponds to no association.

Example interpretation (auto data):

- The odds of the car being foreign are predicted to shrink by a factor of 0.004 for each unit increase in weight.
- The odds of the car being foreign are predicted to shrink by a factor of 0.155 for each unit increase in mileage.
### Stata: Probit Models

Use robust standard errors.

```stata
probit y x1 x2 x3
```

Here $\Pr(\text{foreign} = 1) = N(\beta_1 + \beta_2\,\text{weight} + \beta_3\,\text{MPG})$, where $N$ is the cumulative normal distribution function.
### Stata: Tobit Models

Create a censored variable: mileage ranges from 12-41; suppose we only observe mileage ratings above 17.

```stata
replace mpg = 17 if mpg <= 17
tobit mpg weight, ll
```

Here `ll` stands for lower limit (`ll(0)` sets the lower limit to 0) and `ul` stands for upper limit.
### Stata: Poisson Models

```stata
poisson deaths smokes i.agecat, exposure(pyears) irr
```

The `irr` option reports incidence rate ratios. Example interpretation: smokers have 1.43 times the mortality rate of non-smokers.
## Simulation Methods: Monte Carlo Simulations and Bootstrapping

### Monte Carlo Simulations

- Real data is messy: fat tails, structural breaks and bi-directional causality.
- A simulation enables an econometrician to determine the effect of changing one factor while leaving all others equal; it offers complete flexibility.
- Simulation can be useful to:
  1. Quantify the simultaneous equations bias from treating an endogenous variable as exogenous.
  2. Determine the critical values for a Dickey-Fuller test.
  3. Determine the effect of heteroscedasticity on the size and power of an autocorrelation test.
  4. Price exotic options, assess the effects of macro changes on financial markets, and stress-test risk models in finance.
The central idea of Monte Carlo simulation is random sampling from a given distribution, replicated N times:

1. Generate the data according to the desired data generating process (DGP), deciding the distribution of the errors.
2. Run the regression and calculate the test statistic.
3. Save the test statistic or parameter of interest.
4. Repeat N times.
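The four steps above can be sketched as follows; a minimal example whose DGP and saved statistic (the sample mean of i.i.d. normal draws) are chosen purely for illustration:

```python
import random, statistics

random.seed(0)
N, T = 2000, 50
stats = []
for _ in range(N):                                   # step 4: repeat N times
    y = [random.gauss(0, 1) for _ in range(T)]       # step 1: generate data from the DGP
    stats.append(statistics.fmean(y))                # steps 2-3: compute and save statistic

# The saved statistics approximate the sampling distribution:
# mean near 0, standard deviation near 1/sqrt(T) = 0.141
print(statistics.fmean(stats), statistics.stdev(stats))
```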
### Variance Reduction Techniques

- Say the average value of a parameter $x$ is calculated over 1000 replications. If another researcher conducts an almost identical study with a different set of random draws, a different average value for $x$ is sure to result.
- The sampling variation in a Monte Carlo study is measured by the standard error estimate

$$S_x = \sqrt{\frac{\operatorname{Var}(x)}{N}}$$

where $\operatorname{Var}(x)$ is the variance of the estimates of interest over the $N$ replications.

- To reduce the standard error by a factor of 10, the number of replications must be increased by a factor of 100.
- To achieve acceptable accuracy, N may need to be set to infeasibly high levels; an alternative is to use a variance reduction technique.
### Antithetic Variates

- It may take many repeated sets of sampling before the entire probability space is adequately covered. What is really required is for successive replications to cover different parts of the probability space, spanning the entire spectrum of possibilities.
- The antithetic variate technique involves taking the complement of a set of random numbers and running a parallel simulation on those, in order to reduce the sampling error.
- If the driving force of a set of T draws is $u_t$, for each replication an additional replication with errors $-u_t$ can be used.
- That is, we purposely induce negative correlation amongst the replications, which reduces the standard error and results in a smaller confidence interval.
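A standard textbook demonstration of the idea (the target quantity, $E[e^U]$ for $U \sim \text{Uniform}(0,1)$, is chosen for illustration; here the complement of a draw $u$ is $1-u$):

```python
import math, random, statistics

random.seed(0)
N = 20_000

# Plain Monte Carlo: N independent draws
plain = [math.exp(random.random()) for _ in range(N)]

# Antithetic variates: N/2 draws, each paired with its complement 1 - u
anti = []
for _ in range(N // 2):
    u = random.random()
    anti.append(0.5 * (math.exp(u) + math.exp(1 - u)))   # average the negatively correlated pair

print(statistics.fmean(plain), statistics.fmean(anti))   # both near e - 1
print(statistics.stdev(plain), statistics.stdev(anti))   # antithetic spread is much smaller
```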
### Control Variates

- The control variate technique employs a variable $y$, similar to the variable of interest $x$ used in the simulation, but whose properties are known analytically prior to the simulation.
- The simulation is conducted on $x$ and also on $y$, with the same sets of random draws employed in both cases.
- Denoting the simulation estimates by $\hat{x}$ and $\hat{y}$, a new estimate of $x$ can be derived as

$$x^* = y + (\hat{x} - \hat{y})$$

- It can be shown that the sampling error of $x^*$ is less than that of $\hat{x}$, provided a certain condition holds.
- Control variates help to reduce the Monte Carlo variation of particular sets of random draws by using the same draws on a problem where the solution is known.
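A sketch of the same formula in action; the target $x = E[e^U]$ and the control $y = U$ (whose mean $E[U] = 0.5$ is known exactly) are illustrative choices:

```python
import math, random, statistics

random.seed(0)
N = 20_000
draws = [random.random() for _ in range(N)]   # the SAME draws drive both x and y

x_i = [math.exp(u) for u in draws]            # simulated problem of interest
y_i = draws                                   # control problem, solution known: E[U] = 0.5
y_true = 0.5

# x* = y + (x_hat - y_hat), applied draw-by-draw so the spreads can be compared
x_star = [y_true + (x - y) for x, y in zip(x_i, y_i)]

print(statistics.fmean(x_star))                         # near e - 1
print(statistics.stdev(x_i), statistics.stdev(x_star))  # control variate: smaller spread
```

The reduction works because $e^U$ and $U$ move together, so the simulation noise largely cancels in the difference.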
### Bootstrapping

- Bootstrapping is related to simulation, but with a crucial difference: instead of constructing the data artificially, bootstrapping involves sampling repeatedly with replacement from the actual data.
- It is used to obtain a description of the estimator's properties using the sample points themselves, with the advantage that the researcher can make inferences without strong distributional assumptions.
- Applications in finance and econometrics have increased rapidly in recent years.

Re-sampling the data:

1. Generate a sample of size T from the original data by sampling with replacement.
2. Calculate $\hat{\beta}^*$, the coefficient estimate from that sample.
3. Repeat stage 1 with a different sample N times; a distribution of $\hat{\beta}^*$ will result.
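The resampling loop, sketched for the simplest possible estimator (a sample mean of hypothetical data rather than a regression coefficient):

```python
import random, statistics

random.seed(0)
data = [2.1, 3.4, 1.9, 4.8, 3.3, 2.7, 5.1, 3.9, 2.2, 4.0]   # hypothetical sample
T, N = len(data), 5000

boot_stats = []
for _ in range(N):
    resample = random.choices(data, k=T)           # step 1: T draws WITH replacement
    boot_stats.append(statistics.fmean(resample))  # step 2: statistic from this resample
# step 3: after N repeats, boot_stats approximates the estimator's distribution

print(statistics.fmean(boot_stats), statistics.stdev(boot_stats))
```

The standard deviation of `boot_stats` is a bootstrap standard error for the mean, obtained without assuming any distribution for the data.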
Drawbacks of simulation-based methods:

- Computationally expensive: the number of replications may be very large and the models complex.
- Results may not be precise if the assumptions are unrealistic.
- Results are hard to replicate and are usually specific to the investigation.
- Simulation results are experiment-specific: they apply only to the exact type of data studied. A partial solution is to run simulations using as many different and relevant data generating processes as possible.
### Stata Practice: Monte Carlo Simulations

Permutation tests determine the significance of the observed value of a test statistic by rearranging (permuting) the order of the observed values of a variable. Let's test the t-statistic for a regression on Stata's auto data:

```stata
webuse auto.dta
permute foreign t = _b[foreign]/_se[foreign], rep(1000): reg headroom foreign
```

Only 4 permutations had a lower t-statistic!
### Stata Practice: Bootstrapping with shufflevar

```stata
ssc install shufflevar
```

With headroom randomly shuffled, the regression shows no significance, as expected.
## Event Studies: Market Models and CAPM

### Event Study

- Very useful in finance research and extremely common in the literature.
- Gauges the effect of an identifiable event on a financial variable, usually stock returns (dividends, stock splits, listing/delisting on a stock market).
- Simple to understand and conduct, but the approach must be rigorous.
- There is a multitude of approaches, but the main groundwork was established by Ball and Brown (1968) and Fama et al. (1969).
### Basic Approach

- Define precisely the date(s) on which the event occurred, then gather the sample data.
- For N events, define an 'event window': the period of time over which we investigate the impact of the event, e.g. 10 trading days before vs. 10 trading days after. Long-run studies could use months or years.
- Data frequency is important: daily data carry greater power than weekly or monthly data (MacKinlay, 1997).
- Define the return $R_{it}$ for each firm $i$ on each day $t$ during the event window.
- To separate the impact of the event from other unrelated movements in prices, the abnormal return $AR_{it}$ is calculated by subtracting the expected return from the actual return:

$$AR_{it} = R_{it} - E(R_{it})$$
- The expected return $E(R_{it})$ can be calculated from a sample of data before the event: 100-300 days or 24-60 months (Armitage, 1995).
- Longer estimation windows can increase the precision of the parameter estimates, but this must be balanced against the possibility of structural breaks in the data.
- If the event window is short (one to a few days), the expected return is likely to be close to zero, and it could be acceptable to use the actual return only.
- Usually there is a gap between the estimation period and the event window, to take account of anticipation or leakage of the event.
- The simplest method for $E(R_{it})$ is the mean return for each stock over the estimation window. Brown and Warner (1980, 1985) found that historical return averages outperform many more complicated approaches.
### Market Model

The most common approach for computing expected returns is the market model: regress the return of stock $i$ on the market return,

$$R_{it} = \alpha_i + \beta_i R_{mt} + u_{it}$$

The expected return for firm $i$ on any day $t$ is then the beta estimate multiplied by the market return on day $t$.

- Should we include alpha? Yes, as per Fama et al. (1969), but caution is needed: a very high or very low alpha is problematic, and it may be preferable to assume alpha = 0.
- The FTSE All Share or S&P 500 is used as the market proxy; firm size or other characteristics could also be added.
- An alternative is to set up a control portfolio with characteristics close to the event firm (size, beta, industry, price-to-book ratio, etc.) and use its return as the expected return.
- Armitage (1995) used Monte Carlo simulations to compare various models for event studies.
### Event Study Hypothesis

$H_0$: the event has no effect on the stock price, i.e. the abnormal return is 0.

There will likely be variation in returns across the days within the event window, so we may calculate the cumulative abnormal return over a multi-period event window, say $T_1$ to $T_2$:

$$\widehat{CAR}_i(T_1, T_2) = \sum_{t=T_1}^{T_2} \widehat{AR}_{it}$$

The test statistic, where the variance in the denominator is the sum of the daily abnormal-return variances, is

$$SCAR_i(T_1, T_2) = \frac{\widehat{CAR}_i(T_1, T_2)}{\sqrt{\sigma^2\big(\widehat{CAR}_i(T_1, T_2)\big)}} \sim N(0, 1)$$

It is usual to examine a pre-event window (e.g. $t-10$ to $t-1$) and a post-event window (e.g. $t+1$ to $t+10$), with $t$ as the event date.
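The market-model/abnormal-return pipeline can be sketched end to end; all returns below are hypothetical daily figures invented for illustration:

```python
import statistics

# Estimation window: firm and market returns used to fit R_i = alpha + beta * R_m
est_firm = [0.012, -0.004, 0.008, 0.001, -0.006, 0.009, 0.003, -0.002, 0.005, 0.000]
est_mkt  = [0.010, -0.005, 0.007, 0.002, -0.004, 0.008, 0.002, -0.001, 0.004, 0.001]

mbar, fbar = statistics.fmean(est_mkt), statistics.fmean(est_firm)
beta = (sum((m - mbar) * (f - fbar) for m, f in zip(est_mkt, est_firm))
        / sum((m - mbar) ** 2 for m in est_mkt))      # OLS slope
alpha = fbar - beta * mbar                            # OLS intercept

# Event window: abnormal return = actual minus expected, then cumulate
ev_firm = [0.020, 0.015, -0.003]
ev_mkt  = [0.004, 0.002, -0.001]
ar = [f - (alpha + beta * m) for f, m in zip(ev_firm, ev_mkt)]
car = sum(ar)                                         # cumulative abnormal return
print(beta, car)
```

A real study would repeat this for each of the N event firms and then standardise the CARs as above.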
### Event Study Complications

- Cross-sectional dependence: aggregating across firms assumes returns are independent across firms. One solution is not to aggregate, and to construct test statistics on an event-by-event basis.
- Changing variances of returns: the variance (volatility) of returns will increase over the event window, but the measured variance is based on the estimation window, so the null is incorrectly rejected too often. The solution is to estimate the variance of abnormal returns over the event window itself.
- Weighting the stocks: the approach above will not give equal weight to the stocks; therefore standardise each individual firm's abnormal return, or take the unweighted average of the standardised cumulative abnormal returns:

$$\overline{SCAR}(T_1, T_2) = \frac{1}{N} \sum_{i=1}^{N} SCAR_i(T_1, T_2)$$

- Long event windows: can lead to large errors in the calculation of the abnormal return and the impact of the event.
### Event Studies in Stata

It is common to use Excel for an event study; however, Pacicco, Vena and Venegoni (2018) devised a method using Stata. The `estudy` command performs an event study, permitting the user to:

i) work with multiple varlists, computing the abnormal returns (ARs), average abnormal returns (AARs), cumulative abnormal returns (CARs), and cumulative average abnormal returns (CAARs);
ii) specify up to six event windows;
iii) customise the length of the estimation window;
iv) select the model for the calculation of normal or abnormal returns;
v) specify the diagnostic test among the parametric and non-parametric ones most commonly used in the literature;
vi) customise the output table and store the results in an Excel file and in a Stata data file.
Let's test the effects of the announcement of the Covid pandemic (15 March 2020):

```stata
estudy ln_SPX ln_Gold ln_Bitcoin (ln_Tesla ln_Apple ln_Microsoft), datevar(Date) ///
    evdate(03152020) dateformat(MDY) modtype(HMM) lb1(-3) ub1(3) lb2(-2) ub2(2) ///
    lb3(-1) ub3(3) eswlb(-30) eswub(-5)
```

Now the reaction to the announcement of the Covid vaccine (9 November 2020):

```stata
estudy ln_SPX ln_Gold ln_Bitcoin (ln_Tesla ln_Apple ln_Microsoft), datevar(Date) ///
    evdate(11092020) dateformat(MDY) modtype(HMM) lb1(-3) ub1(3) lb2(-3) ub2(7) ///
    lb3(-3) ub3(10) lb4(-2) ub4(3) lb5(0) ub5(7) lb6(1) ub6(10) eswlb(-30) eswub(-5)
```

And a third event (14 April 2022):

```stata
estudy ln_SPX ln_Gold ln_Bitcoin (ln_Tesla ln_Apple ln_Microsoft), datevar(Date) ///
    evdate(04142022) dateformat(MDY) modtype(HMM) lb1(-3) ub1(3) lb2(-3) ub2(7) ///
    lb3(-3) ub3(10) lb4(-2) ub4(3) lb5(0) ub5(7) lb6(1) ub6(10) eswlb(-30) eswub(-5)
```
### CAPM Testing

The CAPM states that the expected return on any stock $i$ equals the risk-free rate of interest $R_f$ plus a risk premium $\beta_i[E(R_m) - R_f]$, where $\beta_i$ measures the riskiness of the stock:

$$E(R_i) = R_f + \beta_i[E(R_m) - R_f]$$

Tests of the CAPM are done in two steps: (i) estimate the stock betas; (ii) test the model.

- The CAPM is an equilibrium model, so we would not expect it to hold in every time period, but if it is a good model it should hold on average.
- A stock market index proxies for the market return; yields on short-term treasury bills proxy for the risk-free rate.
Calculating the betas: run a simple time-series regression of the excess stock returns on the excess market returns; the slope estimate is beta.

$$R_{i,t} = \alpha_i + \beta_i R_{m,t} + u_{i,t}, \qquad i = 1, \dots, N; \; t = 1, \dots, T$$

where returns are measured in excess of the risk-free rate, N is the total number of stocks and T is the number of time-series observations. $\alpha_i$ is 'Jensen's alpha', which measures how much the stock under- or over-performed given its level of market risk.
Testing the CAPM with N = 100 stocks and T = 60 observations (5 years of monthly data):

1. Run 100 time-series regressions (one for each stock) to obtain the betas.
2. Run a single cross-sectional regression of the average stock returns on the estimated betas:

$$\bar{R}_i = \lambda_0 + \lambda_1 \hat{\beta}_i + v_i, \qquad i = 1, \dots, N$$

Stage 2 involves actual returns, not excess returns. Essentially the CAPM says that stocks with higher betas are riskier and should command higher average returns. If the CAPM is a valid model, $\lambda_0$ should be close to the risk-free rate and $\lambda_1$ close to the market risk premium.

Two other implications can be validated: (i) the relationship between a stock's return and its beta should be linear; (ii) beta alone should explain the cross-sectional variation in returns. These can be tested by running the augmented model

$$\bar{R}_i = \lambda_0 + \lambda_1 \hat{\beta}_i + \lambda_2 \hat{\beta}_i^2 + \lambda_3 \sigma_i^2 + v_i$$

where $\sigma_i^2$ is the idiosyncratic risk and $\hat{\beta}_i^2$ the squared beta for stock $i$. For the CAPM to hold, both $\lambda_2$ and $\lambda_3$ should be insignificant.
### CAPM Tests and the Fama-French Methodology

Research indicates that the CAPM is not a complete model of stock returns: returns have been found to be systematically higher for small-cap and value stocks. This can be tested using the augmented model (Fama & French, 1992)

$$\bar{R}_i = \lambda_0 + \lambda_1 \hat{\beta}_i + \lambda_2 MV_i + \lambda_3 BTM_i + v_i$$

where $MV_i$ is the market value and $BTM_i$ the book-to-market ratio of stock $i$. For the CAPM to be supported, both $\lambda_2 = 0$ and $\lambda_3 = 0$.

Three issues:

- Non-normality of returns in finite samples; normality is needed for valid hypothesis testing.
- Likely heteroscedasticity in the returns (recent CAPM testing has used GMM; Cochrane, 2005).
- Measurement error in beta (to minimise it, prefer tests based on portfolios rather than single stocks).
### Extreme Value Theory

- Much of classical statistics focuses on accurate estimation of the 'average' value of a series, or the average relationship between two or more series (OLS); the Central Limit Theorem concerns the sampling distribution of means.
- However, in many cases it is the extreme or rare events that are of interest.
- Extreme Value Theory (EVT) was adopted in finance in the 1990s with the realisation that asset returns deviate systematically from normality, and that assuming normality can lead to severe underestimates of the probability of large price movements, and consequently severe losses.
- Example: Levine (2009) looked at monthly returns of medium-maturity A-rated corporate bonds from Jan 1980 to Aug 2008. The lowest return was -10.84%, and under a fitted extreme value distribution the probability of a return less than or equal to this was 1.4%. Under the normal distribution, however, the probability would be $8 \times 10^{-7}$, more than 16,000 times smaller!
- Note: EVT should only be applied to the tails, not to the centre of the distribution.
### Block Maximum Approach

Take a series y of total length T observations and separate it into m blocks of data, each of length n, so that $m \times n = T$. Let $M_k$ be the maximum value of the series in block $k$. The distribution of the normalised block maxima converges asymptotically to a generalised extreme value (GEV) distribution as m and n tend to infinity (Fisher & Tippett, 1928; Gnedenko, 1943).

Drawback: how to fit the data into blocks?

- If blocks are too long, the number of maxima is small, leading to inaccurate parameter estimates with high variance (standard errors).
- If blocks are too short, some maxima may not be genuine extreme values, causing bias in the parameter estimates.
- Hence there is a trade-off between bias and efficiency: long blocks give less bias but more inefficiency; short blocks give more bias but less inefficiency.
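The blocking step itself is mechanical and can be sketched directly (standard normal data and a block length of 50 are arbitrary choices; fitting the GEV to the maxima would be the next step):

```python
import random

random.seed(0)
T, n = 1000, 50
m = T // n                                   # m * n = T gives 20 blocks
y = [random.gauss(0, 1) for _ in range(T)]   # the raw series

# Maximum of each block of n consecutive observations
block_maxima = [max(y[k * n:(k + 1) * n]) for k in range(m)]
print(len(block_maxima), min(block_maxima))
```

Shortening `n` produces more maxima (efficiency) but risks including non-extreme values (bias), exactly the trade-off described above.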
Peaks Over Threshold Approach
• The preferred approach in empirical finance.
• An arbitrary threshold U is specified, and any observed value exceeding it is defined as extreme.
• As the threshold U tends towards infinity, the distribution of the normalised exceedances tends towards the generalised Pareto distribution (GPD).
• The generalised extreme value (GEV) and generalised Pareto (GPD) distributions are closely related: in the limit, the GEV is the distribution of the standardised maxima and the GPD is the distribution of the standardised data over a threshold, and the parameters of one approach can be estimated from the other and vice versa.
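A peaks-over-threshold fit can be sketched as below. The 95th-percentile threshold rule, the simulated loss series and the seed are illustrative assumptions; in practice the threshold choice is itself a modelling decision.

```python
# Peaks over threshold: fix a threshold U, keep the exceedances (y - U for
# y > U), and fit a generalised Pareto distribution to them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
losses = rng.standard_t(df=4, size=5000)   # simulated heavy-tailed losses

u = np.quantile(losses, 0.95)              # threshold U (arbitrary rule)
exceedances = losses[losses > u] - u       # amounts by which losses exceed U

# Fit the GPD; floc=0 pins the location at the threshold so only the
# shape (xi) and scale (beta) parameters are estimated.
xi, _, beta = stats.genpareto.fit(exceedances, floc=0)
print(f"threshold: {u:.3f}, shape xi: {xi:.3f}, scale beta: {beta:.3f}")
```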
Value at Risk
 Popular method for measuring financial risk.
 Estimates expected losses from a change in market prices.
 More formally, the loss in monetary terms that is expected to occur over a pre-determined time horizon with a pre-determined degree of confidence.
 e.g. a company states its one-day 99% VaR is $10,000,000, meaning the company is 99% confident the loss on its portfolio of assets over a one-day period will not exceed $10,000,000.
Advantages:
 Simplicity of calculation.
 Ease of interpretation.
 Can be suitably aggregated across a firm to produce a single figure encompassing the risk positions of the firm as a whole.
Value at Risk
 The calculated VaR is used to select the most appropriate minimum capital risk requirement (MCRR), ensuring firms hold enough liquid assets to cover the expected losses should they arise.
Calculating VaR
1. Delta-Normal Method: assume losses are normally distributed.
 VaR = σZ_α
 Take the appropriate critical value Z_α at the α significance level, e.g. α = 0.05 gives Z_α ≈ 1.645.
 σ = the standard deviation of the data.
 Multiply by the value of the portfolio.
2. Historical Simulation
 Sort the portfolio returns and select the appropriate percentile (the 5th or 1st percentile for 95% and 99% VaR respectively).
 Multiply by the value of the portfolio.
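The two calculations above can be sketched side by side. The portfolio value, return series and confidence level are illustrative assumptions.

```python
# Compare delta-normal and historical-simulation VaR for one portfolio.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, size=1000)   # simulated daily portfolio returns
portfolio_value = 10_000_000
alpha = 0.05                                  # 95% VaR

# 1. Delta-normal: VaR = sigma * Z_alpha, scaled by the portfolio value.
sigma = returns.std(ddof=1)
z = stats.norm.ppf(1 - alpha)                 # ~1.645 for alpha = 0.05
var_delta_normal = sigma * z * portfolio_value

# 2. Historical simulation: the alpha-quantile of the sorted returns.
var_historical = -np.quantile(returns, alpha) * portfolio_value

print(f"delta-normal 95% VaR: {var_delta_normal:,.0f}")
print(f"historical   95% VaR: {var_historical:,.0f}")
```

On normally distributed data the two agree closely; on heavy-tailed data the delta-normal figure understates the risk, which is the motivation for the EVT-based method that follows.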
Value at Risk
Calculating VaR
3. Extreme Value Theory
 Uses all the data points defined as being in the tail.
 Use the peaks-over-threshold approach to calculate VaR as a function of the shape and scale parameters estimated from the sample data, and the ratio of total observations to observations exceeding the threshold.
• It is possible to construct confidence intervals for VaR estimates arising from extreme value distributions (McNeil, 1998).
• Assumes the data are i.i.d. If not, one can estimate an ARMA-GARCH model, take the i.i.d. residuals, and estimate the parameters of the extreme value distribution on those.
• Can be extended to the multivariate case using copula functions.
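The POT-based VaR calculation can be sketched using the standard quantile formula VaR_q = u + (β/ξ)[((n/Nᵤ)(1 − q))^(−ξ) − 1], where n is the total number of observations and Nᵤ the number of exceedances. The simulated data, threshold rule and seed are assumptions for illustration.

```python
# EVT (peaks-over-threshold) VaR: fit a GPD to the tail, then invert it
# to obtain the q-quantile of the loss distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
losses = rng.standard_t(df=4, size=5000)     # simulated heavy-tailed losses

u = np.quantile(losses, 0.95)                # threshold
exceed = losses[losses > u] - u
xi, _, beta = stats.genpareto.fit(exceed, floc=0)

n, n_u, q = len(losses), len(exceed), 0.99
# VaR_q = u + (beta/xi) * (((n / N_u) * (1 - q)) ** (-xi) - 1)
var_evt = u + (beta / xi) * (((n / n_u) * (1 - q)) ** (-xi) - 1)
empirical = np.quantile(losses, q)           # empirical quantile for comparison
print(f"EVT 99% VaR: {var_evt:.3f}  (empirical 99% quantile: {empirical:.3f})")
```

Multiplying `var_evt` by the portfolio value, as in the earlier methods, gives the monetary VaR figure.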
Generalised Method of Moments
 A generalisation of the conventional method of moments estimator, with widespread use in finance for asset pricing, interest rate models and market microstructure (Jagannathan, Skoulakis & Wang, 2002).
 GMM can be applied to time series, cross-section and panel data.
 OLS, GLS, instrumental variables, two-stage least squares and ML are all special cases of the GMM estimator.
 The method of moments dates back to Pearson (1895) and works by computing the moments of the sample data and setting them equal to the population values implied by an assumed probability distribution.
 For a normal distribution, we calculate the mean and variance.
 By the law of large numbers, the sample moments converge to their population counterparts asymptotically.
Method of Moments
For observed data yₜ and population mean μ₀:

E[yₜ] = μ₀ (the 1st moment), i.e. (1/T) Σₜ yₜ − μ₀ → 0 as T → ∞, so μ₀ is estimated by the sample mean ȳ.

E[(yₜ − μ₀)²] = σ² (the 2nd central moment). An estimator of σ² can be obtained as (1/T) Σₜ (yₜ − ȳ)².

E[(yₜ − μ₀)⁴] = 3σ⁴ (the 4th central moment of the normal). A second estimator of σ² can therefore be obtained as the square root of (1/T) Σₜ (yₜ − ȳ)⁴ / 3.
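The two moment-based estimators of σ² can be computed side by side; on any finite sample they will generally disagree, which is exactly the overidentification problem taken up next. The simulated data and seed are assumptions for illustration.

```python
# Two method-of-moments estimators of sigma^2 on simulated normal data:
# one from the 2nd central moment, one from the 4th (divide by 3, take sqrt).
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=0.0, scale=2.0, size=100_000)   # true sigma^2 = 4
ybar = y.mean()
d = y - ybar

sigma2_from_m2 = np.mean(d ** 2)                   # (1/T) sum (y_t - ybar)^2
sigma2_from_m4 = np.sqrt(np.mean(d ** 4) / 3)      # sqrt of m4 / 3

print(f"from 2nd moment: {sigma2_from_m2:.3f}")
print(f"from 4th moment: {sigma2_from_m4:.3f}")
```

Both estimates are consistent for σ² = 4, but they differ in sample: two moment conditions, one unknown.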
Generalised Method of Moments
But how do we determine the best estimator of σ²?
We have more moment equations than unknowns (here just σ²), so the system is overidentified: there are multiple solutions for σ², and we must decide which is best.
A natural approach is to choose the parameter estimates that minimise the variance of the moment conditions.
Effectively, a weighting matrix gives higher weight to moment conditions with lower variance.
The need to choose a weighting matrix is a disadvantage of GMM.
The method of moments relies on the assumption that the explanatory variables are orthogonal to the disturbances in the model:
E[uₜxₜ] = 0.
The method of moments estimator is consistent but not always efficient.
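A minimal GMM sketch for the overidentified σ² problem: stack the two moment conditions into a vector g(σ²) and minimise the quadratic form g′Wg. An identity weighting matrix is used here for simplicity, so this illustrates the mechanics rather than efficient GMM, which would weight the conditions by the inverse of their covariance.

```python
# GMM on two moment conditions with one unknown (sigma^2), identity weights.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
y = rng.normal(0.0, 2.0, size=50_000)              # true sigma^2 = 4
d = y - y.mean()
m2, m4 = np.mean(d ** 2), np.mean(d ** 4)

def gmm_objective(sigma2):
    # Moment conditions: E[d^2] - sigma^2 = 0 and E[d^4] - 3*sigma^4 = 0.
    g = np.array([m2 - sigma2, m4 - 3.0 * sigma2 ** 2])
    return g @ g                                   # g' W g with W = identity

res = minimize_scalar(gmm_objective, bounds=(0.1, 20.0), method="bounded")
print(f"GMM estimate of sigma^2: {res.x:.3f}")
```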
Summary of Learnings
• Discrete Choice Models
• Logit and Probit Models
• Multinomial choice, Ordered choice and Poisson Models
• Monte Carlo Simulations
• Bootstrapping
• Event Studies
• CAPM Testing
• Extreme Value Theory
• Maximum Likelihood
• Generalised Method of Moments