by Emmanuel Farhi Three Essays in Macroeconomics

Three Essays in Macroeconomics
by
Emmanuel Farhi
Submitted to the Department of Economics
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Economics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2006
© 2006 Emmanuel Farhi. All rights reserved.
The author hereby grants to Massachusetts Institute of Technology permission to
reproduce and
to distribute copies of this thesis document in whole or in part.
Signature of Author
Department of Economics
20 August 2006
Certified by
Ricardo Caballero
Ford International Professor of Economics
Thesis Supervisor
Certified by
Ivan Werning
Assistant Professor of Economics
Thesis Supervisor
Accepted by
Peter Temin
Chairman, Departmental Committee on Graduate Studies
Three Essays in Macroeconomics
by
Emmanuel Farhi
Submitted to the Department of Economics
on 20 August 2006, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy in Economics
Abstract
Chapter 1 analyzes the theoretical and quantitative implications of optimal fiscal policy in a
business cycle model with incomplete markets. I first consider the problem of a government
facing expenditure shocks in an economy where the only asset is a real risk-free bond. The
model features a representative-agent economy with proportional taxes on labor and capital.
Taxes on capital must be set one period in advance, reflecting inertia in tax codes. This rules
out replication of the complete markets allocation. In the model, capital taxation and capital
ownership provide a state contingent source of revenues - creating a new potential role for capital
taxation and ownership as risk sharing instruments between the government and private agents.
For a baseline case, I show that the optimal policy features a zero tax on capital. Numerical
simulations show that the baseline case provides an excellent benchmark: the average tax on
capital, while not theoretically zero, turns out to be small. The volatility of capital taxes
decreases sharply as the period length is increased. I then allow the government to hold a
nontrivial position in capital. Capital ownership allows the government to realize about 90% of
the welfare gains from moving to complete markets. Large positions are typically required for
optimality. But smaller positions achieve substantial benefits. In a business-cycle simulation, I
show that a 15% short equity position achieves over 40% of the welfare gains from completing
markets.
Chapter 2 is the product of joint work with Ivan Werning and analyzes how estate taxes
should optimally be set. For an economy with altruistic parents facing productivity shocks, the
optimal estate taxation is progressive: fortunate parents should face lower net returns on their
inheritances. This progressivity reflects optimal mean reversion in consumption, which ensures
that a long-run steady state exists with bounded inequality, avoiding immiseration.
Chapter 3 is the product of joint work with Stavros Panageas. We study optimal consumption and portfolio choice in a framework where investors save for early retirement and assume
that agents can adjust their labor supply only through an irreversible choice of their retirement time. We obtain closed form solutions and analyze the joint behavior of retirement time,
portfolio choice, and consumption. Investing for early retirement tends to increase savings and
stock market exposure, and reduce the marginal propensity to consume out of accumulated personal wealth. Contrary to common intuition, prior to retirement an investor might find it optimal to increase
the proportion of financial wealth held in stocks as she ages, even when she receives a constant
income stream and the investment opportunity set is also constant. This is particularly true
when the wealth of the investor increases rapidly due to strong stock market performance, as
was the case in the late 1990s. We also show that the model can potentially provide a rational
explanation for the paradoxical fact that some investors saving for retirement chose to increase
their allocation to stocks as the market was booming and reduce it thereafter.
Thesis Supervisor: Ricardo Caballero
Title: Ford International Professor of Economics
Thesis Supervisor: Ivan Werning
Title: Assistant Professor of Economics
Acknowledgments
I am deeply indebted to Ricardo Caballero for his constant guidance, support and advice. His
enthusiastic passion for research, his desire for understanding the world, his intellectual openness
and his willingness to challenge existing theories have been an invaluable example throughout
my Ph.D. experience. Collaborating with him has been, and I hope, will continue to be, a
fascinating experience. I will never forget that on my first day back to MIT after a one year
leave in France, he was the first one to tell me, in his inimitable style: "welcome back, come to
my office".
I cannot overstate my debt to Ivan Werning. Ivan is one of the deepest minds I know
in economics. Above all, I think I learnt from him some great principles about economic
research which will stay with me forever: always listen to the model, never settle for half-baked
intuitions, always reach for parsimony, and most importantly, always keep pushing. I learnt
an immense deal from him, without ever taking a class from him, through discussions and
arguments at a coffee place or at the blackboard. I cannot thank him enough for his generosity
with his time. This learning experience has evolved into a fruitful collaboration which has been
one of the great intellectual experiences of my life, and also, a warm friendship.
I owe a great deal to Marios Angeletos. Marios was the first one to introduce me to Chicago-Minnesota macroeconomics. His engaging teaching got me started on research. His ability to
draw connections, his curiosity, his tenacity, his patience, his intellectual honesty and his
passion for economics were great examples. Many times, he also took the time to sit and read
what I had written line by line. I cannot thank him enough for that.
I also want to thank Jean Tirole, who was the first to take my interest in Industrial
Organisation seriously. Jean's generosity is famous: he invited me many times to Toulouse.
Our common work will, I hope, continue to grow in the future. His dedication to research is
forever an example.
I owe special thanks to Daron Acemoglu, Abhijit Banerjee, Olivier Blanchard, Mike Golosov
and Guido Lorenzoni for animated discussions, from which sprang many ideas. One of the great
things about MIT is this collegial atmosphere that makes you feel every professor is your advisor.
I will always remember this fondly.
I am also very grateful to Daniel Cohen who advised me during my undergraduate studies,
who first spurred my interest in economics, and who first encouraged me to pursue graduate
studies. His constant encouragement and support were invaluable.
Many of my other friends have contributed to this work, in particular my roommates David
Abrams and Jeremy Tobacman. I also want to thank my officemates, Thomas Chaney, Shawn Cole,
Francesco Franceo, Andrei Levchenko and Veronica Guerrieri. Many thanks to my friends in
Cambridge, who made these years such a pleasant experience: Raphael Auer, Herman Benett,
Marius Hentea, Florent Massou, Jean-Jacques Palombo, Andre Puong, Leonide Saad.
This work would not have been possible without the support and love of my mother. Marion
stood by me through all these years. I will always remember it. I dedicate this thesis to my
father.
À mon père,
Chapter 1
Capital Taxation and Ownership
When Markets are Incomplete 1
1.1 Introduction
In this paper, I explore the issue of capital taxation and government ownership when fiscal
expenditures and aggregate income are random, and the government can only trade a limited
number of assets. Markets are incomplete, and hedging of the government budget is limited
in scope. This creates a new potential role for capital taxation and ownership as risk sharing
instruments between the government and private agents. In the model, capital taxation and
capital ownership provide a state contingent source of revenues. For example, if the marginal
product of capital is positively correlated with adverse shocks to the government budget, positive
capital taxes, or a long position in the capital stock, might provide the government with a good
hedging instrument.
I approach these issues by taking a minimal step away from the complete markets case: I
consider the stochastic neoclassical growth model with homogenous consumers and a benevolent
government facing fiscal expenditure shocks. I allow the government to trade a risk free bond,
to levy fully state contingent linear taxes on labor, and to levy linear taxes on capital that
are not fully state contingent. Specifically, I assume that taxes on capital are set one period
in advance, at the time investment decisions are made. This assumption is meant to capture
inertia in fiscal policy. With fully state contingent taxes on capital, the government
would be able to replicate the complete markets allocation, as shown by Chari, Christiano and
Kehoe (1994).2 With a lag in capital taxes, the complete markets allocation is not achievable.
1 My debt to my advisors Ricardo Caballero, George-Marios Angeletos and Ivan Werning cannot be overstated. For helpful discussions and insightful comments, I thank Daron Acemoglu,
Abhijit Banerjee, Olivier Blanchard, V.V. Chari, Xavier Gabaix, Mike Golosov, Patrick Kehoe,
Narayana Kocherlakota, Guido Lorenzoni, Veronica Guerrieri, and seminar participants at the
Minneapolis Fed and MIT.
An important insight in this paper is that when contemplating the hedging consequences of
a marginal increase in capital taxes, the government needs to take two effects into consideration.
First is the direct effect in the form of increased revenues in proportion to the marginal product
of capital. Second is an opposing indirect effect through the adjustment of capital: Lower
capital accumulation reduces the revenues from labor and capital taxes. The hedging benefits
of capital taxation depend only on the covariance of these two effects with the government's
need for funds across states of the world on a given date. An important benchmark is the
baseline case where the production function is Cobb Douglas and preferences are quasi-linear. I
show in this case that taxes on capital should be set to zero from the first period on. With these
specifications, the covariances of the direct and indirect effects with the government's need for
funds exactly cancel out, leaving no role for capital taxes.
For general preferences, I show that optimal taxes on capital can be decomposed into two
terms. The first "hedging" term - the only one present when preferences are quasi-linear - reflects the hedging role of capital taxes, and would be zero if markets were complete. The
second "intertemporal" term corresponds to the motive for capital taxation when markets are
complete: It might be optimal for the government to distort capital accumulation to smooth
available resources in the economy across time. Echoing the result in the previous paragraph,
I show that the hedging term is zero when the production function is Cobb Douglas. Regarding the intertemporal term, the optimal policy prescribes a one time capital tax (respectively
subsidy) following a high (respectively low) government expenditure shock in order to reduce
the variability of the net present value of labor tax surpluses across states.
2 An intuition for this result can be given along the following lines: Since investment only reacts to average
taxes on capital, the government can vary capital taxes across states while keeping the average constant, leaving
investment in physical capital unaffected. This endows the government with enough degrees of freedom to
perfectly shift the tax burden across states and to replicate the complete markets outcome as long as it
can also trade a risk free bond.
Numerical simulations show that the baseline case provides a good benchmark: the hedging
term, while not theoretically zero, turns out to be negligible. Incomplete markets do not seem
to make a case for using capital taxation as a hedging instrument. The intertemporal term has
approximately zero mean. Its volatility decreases sharply with the period length.
In contrast to capital taxes, capital ownership may provide the government with a powerful
hedging instrument. The reason is that unlike taxing, trading does not introduce additional
distortions: The indirect effect on labor and capital tax bases arising with capital taxation is
absent for capital trading. Indeed, when preferences are quasi-linear and the only disturbance
in the economy takes the form of government expenditures shocks, I show that the government
can perfectly approximate the complete markets allocation by taking a very large position in
capital, counterbalanced by an equally large opposite position in the risk free bond.
Outside of this benchmark case, numerical simulations show that capital ownership allows
the government to realize about 90% of the welfare gains from moving to complete markets. Government
expenditure shocks tend to call for a long position, while productivity shocks typically require a
short position. In business cycle simulations, productivity shocks dwarf government expenditure
shocks as a source of variation in the government's budget, calling for a short position.
Optimality typically requires a large position. The magnitude of the position decreases sharply
with the period length. Moreover, smaller positions allow the government to reap substantial benefits: In a
business cycle simulation with a five-year period length, I show that a short position of 15%
of the capital stock achieves more than 40% of the gains from completing markets.
I then characterize the optimal holding of capital by the government in a more general
portfolio problem with additional assets. I derive the government's optimal liability structure
in a unified framework, that I term the GCAPM - Government Capital Asset Pricing Model
- and that resembles the CCAPM: Assets that covary with the government's need for funds
command a lower expected return. I am able to explicitly derive the pricing kernel of the
government. This can be used to price non-traded assets, and provides theoretical foundations
for capital budgeting rules in public, non-traded companies.3
3 This is also potentially useful to compute the value of non-traded government liabilities, such as
disposing of future nuclear waste, honoring implicit social contracts, etc.
Related Literature.
An extensive literature on capital taxation with complete markets has emerged from the
celebrated zero long run capital tax result established by Chamley (1986) and Judd (1985).
This paper adds to this literature by studying the case of incomplete markets.
Chamley (1986) and Judd (1985) showed that in all steady states of the economy, taxes on
capital are optimally set to zero. Zhu (1992) proved a stochastic analog of the Chamley-Judd
result: In every stochastic steady state of the neoclassical growth model, depending on the
underlying parameters of the model, taxes on capital are either zero or take both signs with
positive probability. I prove that in my model, an analogous result holds for the intertemporal
term. Moreover, as Zhu (1992) and Chari, Christiano and Kehoe (1994) pointed out, if preferences are homogenous and separable between consumption and leisure, a stronger version of
the zero capital tax result applies: Taxes on capital should be zero from the second period on.
Echoing this result, I show that if preferences are quasi-linear, the intertemporal term is zero.
However, outside of this case, homogenous preferences do not imply that the intertemporal term
should be zero in my model.
The paper also contributes to the literature on fiscal policy in incomplete markets pioneered
by Barro (1979). Barro considered a deterministic, partial equilibrium environment and associated an exogenous convex deadweight cost to taxes. Variations in the deadweight costs are
costly; a benevolent government should therefore seek to smooth taxes across time. Barro
showed that, as in consumption smoothing problems, tax smoothing by the government imparts a random walk component to taxes and public debt. Most closely related to this paper are
Aiyagari, Marcet, Sargent and Seppala (2002) and, more recently, Werning (2005). They studied fiscal policy in general equilibrium under incomplete markets. They analyzed a version of
the no capital economy in Lucas and Stokey (1983) with only risk free debt. They showed that
Ramsey outcomes display features of Barro's model. In particular, labor taxes inherit a near
unit root component. I extend this line of research by introducing capital along with a more
general asset structure, and studying capital taxation in addition to labor taxation. I demonstrate that the results of Aiyagari, Marcet, Sargent and Seppala (2002) carry through when
capital is introduced: Labor taxes and government debt inherit a random walk component.
Finally, this paper is related to the literature studying the optimal liability structure of the government under incomplete markets. The foundational paper is Bohn (1990). Bohn considered
a stochastic version of Barro's model with risk neutral consumers and an ad hoc convex cost
for distortionary taxes. The literature on the optimal portfolio of the government under incomplete markets has entirely focused on Bohn's model, maintaining the assumption of risk neutrality
and adopting an ad hoc deadweight cost for taxes. My model provides microfoundations for
Bohn's findings, and shows that important caveats to his theory need to be introduced. I also
analyze explicitly the situation where consumers are risk averse. This goes beyond existing
contributions.
Angeletos (2002) and Buera and Nicolini (2004) assume that markets are incomplete: the
government can only trade risk free debt with multiple maturities. They show that generically,
the government can replicate the complete market allocation when enough maturities are traded,
and characterize the optimal maturity structure of government debt in such instances.
The central idea in this paper, that non-state contingent taxes have important risk sharing
implications, can be traced back to Stiglitz (1969).4 Stiglitz considered the partial equilibrium
problem of a consumer that must decide how to allocate a given amount of wealth between
a riskless asset and a risky asset. He examined how the government can affect the allocation
of wealth and welfare using a variety of linear taxes. My model can be seen as a general
equilibrium risk sharing model between the government and the agents. As in Stiglitz (1969),
linear taxes on assets - here capital - have hedging consequences. However, I show that taking
into account the distortionary consequences of linear taxes by analyzing the savings margin - taken as inelastic in Stiglitz's analysis - is crucial.
On the methodological side, this paper builds on Werning (2005). Existing models either
adopt a Lagrangian approach - as for example Aiyagari, Marcet, Sargent and Seppala (2002)
- or develop a recursive representation by incorporating some multipliers in the state space
- an approach initiated by Marcet and Marimon (1998). Werning (2005) revisits Aiyagari,
Marcet, Sargent and Seppala's model and develops a recursive representation with three state
variables: debt, expected marginal utility one period ahead, and the state of the Markov shock
process. I develop a parsimonious recursive representation of the Ramsey problem with only four
state variables, which are directly related to the allocation: the present value of government
liabilities, capital, past marginal utility and the state of the Markov shock process. In the
particular case where preferences are quasi-linear, I am able to reduce the state space to two
state variables: the present value of government liabilities and the state of the Markov shock
process. Developing a recursive approach using only variables directly linked to the allocation
allows a better understanding of the properties of Ramsey outcomes, is useful for developing
intuition, simplifies calculations, and in some cases, permits easier numerical simulations.
4 I thank Abhijit Banerjee for pointing out this analogy to me.
The rest of the paper is organized as follows. Section 2 introduces the economic environment,
sets up the Ramsey problem, and develops a recursive representation. Section 3 presents the
properties of debt and taxes in the quasi-linear preferences case. I analyze the general case
in Section 4. In Section 5, I study capital ownership by the government and characterize the
optimal liability structure of the government. All proofs omitted in the main text are contained
in the Appendix.
1.2 The Economy
The model is a neoclassical, stochastic production economy. The economy is populated by a
continuum of identical, infinite-lived individuals and a government.
Time is discrete, indexed by $t \in \{0, 1, \ldots\}$. The exogenous stochastic disturbances in period
$t$ are summarized by a discrete random variable $s_t \in \mathcal{S} = \{1, 2, \ldots, S\}$: the state at date $t$. I let
$s^t = \{s_0, s_1, \ldots, s_t\} \in \mathcal{S}^t$ denote the history of events at date $t$. I assume that $s_t$ follows a Markov
process with transition density $P(s'|s)$ and initial distribution $\pi_0 = P(\cdot|s_{-1})$.
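As an illustration of the shock process just described, the following is a minimal sketch of how a history of states could be drawn from a finite-state Markov chain. The two-state transition matrix, the function name `simulate_markov_chain`, and the zero-based state labels (the text labels states 1 to S) are assumptions made for the example, not part of the model's calibration.

```python
import random

def simulate_markov_chain(P, s_init, T, seed=0):
    """Draw s_0, ..., s_{T-1} given s_{-1} = s_init.

    P[i][j] is the probability of moving from state i to state j.
    States are labeled 0..S-1 here (hypothetical labeling).
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    path = []
    s_prev = s_init
    for _ in range(T):
        # Next state is drawn from the row of P indexed by the previous state.
        s_prev = rng.choices(states, weights=P[s_prev], k=1)[0]
        path.append(s_prev)
    return path

# Hypothetical example: state 0 = low expenditure, state 1 = high expenditure.
P = [[0.9, 0.1],
     [0.2, 0.8]]
history = simulate_markov_chain(P, s_init=0, T=10)
```

The list `history` plays the role of the history of events up to date 9, conditional on the initial state.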
In each period $t$, the economy has two goods: a consumption-capital good and labor. Households have access to an identical constant returns to scale technology to transform capital $k_{t-1}$
and labor $l_t$ into output via the production function $k_{t-1} + F(k_{t-1}, l_t, s_t)$.5 The production
function is smooth in $(k_{t-1}, l_t)$ and satisfies the standard Inada conditions. Notice that this
formulation incorporates a stochastic productivity shock. The output can be used for private
consumption $c_t$, government consumption $g_t$, and new capital $k_t$. Throughout, I will take government consumption $g_t = g(s_t)$ to be exogenously specified. Therefore, the resource constraints
in the economy are
$$c_t + g_t + k_t \le F(k_{t-1}, l_t, s_t) + k_{t-1}, \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.1}$$
5 This formulation allows for capital depreciation. The depreciation of capital is subsumed in the production
function $F(k_{t-1}, l_t, s_t)$.
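The resource constraint (1.1), with depreciation folded into the production function, can be checked numerically. The sketch below assumes a Cobb-Douglas technology net of depreciation; the functional form, the parameter values, and the function names `production` and `is_feasible` are hypothetical choices for illustration only.

```python
def production(k_prev, l, A, alpha=0.5, delta=0.1):
    """Net output F(k_{t-1}, l_t, s_t): Cobb-Douglas gross output minus depreciation.

    A is the productivity level in the current state (assumed parametrization).
    """
    return A * k_prev**alpha * l**(1.0 - alpha) - delta * k_prev

def is_feasible(c, g, k_next, k_prev, l, A, alpha=0.5, delta=0.1):
    """Check the resource constraint c_t + g_t + k_t <= F(k_{t-1}, l_t, s_t) + k_{t-1}."""
    return c + g + k_next <= production(k_prev, l, A, alpha, delta) + k_prev

# With k_{t-1} = 4, l_t = 1 and A = 1, net output is 2 - 0.4 = 1.6,
# so total resources are 5.6.
ok = is_feasible(c=1.0, g=0.5, k_next=4.0, k_prev=4.0, l=1.0, A=1.0)
too_much = is_feasible(c=1.2, g=0.5, k_next=4.0, k_prev=4.0, l=1.0, A=1.0)
```

Here the first plan uses 5.5 units of resources and is feasible, while the second uses 5.7 and is not.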
Households rank consumption streams according to
$$E_{-1} \sum_{t=0}^{\infty} \beta^t u(c_t, l_t, s_t) \tag{1.2}$$
where $\beta \in (0, 1)$ and $u$ is smooth and concave in $(c_t, l_t)$, increasing in consumption, decreasing
in labor, and satisfies the standard Inada conditions. Note that this formulation incorporates
a stochastic preference shock.
The government raises all revenues through a tax on labor income $\tau^l_t$ and a tax on capital
income $\tau^k_t$. Except for taxes on capital $\tau^k_t$, households and the government make decisions whose
time $t$ components are functions of the history of shocks $s^t$ up to $t$. By contrast, I assume
that taxes on capital are predetermined: the government makes decisions on $\tau^k_t$ one period in
advance. Hence $\tau^k_t$ is a function of the history of shocks up to $t-1$, $s^{t-1}$. The capital stock $k_0$
is inelastic, hence providing a non distortionary source of revenues to the government. In order
to limit the amount of revenues the government can extract at no cost, I assume that the date
0 tax rate on capital $\tau^k_0$ is exogenously fixed.
The assumption that taxes on capital are set one period in advance deserves some discussion,
which I postpone until some definitions have been introduced.
Incomplete markets and debt limits. Households and the government borrow and lend
only in the form of risk-free one period bonds paying interest $r_t(s^{t-1})$ in every state at date $t$.
The government budget and debt limit constraints are:6
$$g_t + (1 + r_t) b_{t-1} \le \tau^l_t l_t F_l(k_{t-1}, l_t, s_t) + \tau^k_t k_{t-1} F_k(k_{t-1}, l_t, s_t) + b_t, \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.3}$$
$$\underline{M}(k_t, u_c(c_t, l_t, s_t), s_t) \le u_c(c_t, l_t, s_t) b_t \le \overline{M}(k_t, u_c(c_t, l_t, s_t), s_t), \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.4}$$
Here $b_t$ is the amount of government debt outstanding at date $t$. When (1.3) holds with
strict inequality, I let the difference between the right hand side and the left hand side be
a nonnegative level of lump sum transfers $T_t$ to the households. The upper and lower debt
limits $\overline{M}(k_t, u_c(c_t, l_t, s_t), s_t)$, $\underline{M}(k_t, u_c(c_t, l_t, s_t), s_t)$ in (1.4) influence the optimal government
plan. In full generality, I allow the debt limits to depend on the capital stock of the economy
and the current marginal utility of consumption. I discuss alternative possible settings for
$\overline{M}(k_t, u_c(c_t, l_t, s_t), s_t)$, $\underline{M}(k_t, u_c(c_t, l_t, s_t), s_t)$ below. Note that I define debt and asset limits on
$u_c(c_t, l_t, s_t) b_t$ instead of $b_t$. This is natural given my definition of debt: $b_t$ is the amount of debt
issued at the end of period $t$. The quantity $u_c(c_t, l_t, s_t) b_t$ is therefore just debt weighed by the
state price density.
6 Note that in a specification with capital depreciation, this formulation supposes that capital depreciation is
deductible.
The representative household operates a firm and supplies and hires labor at wage $w_t$ on a
competitive market. The household's problem is to choose stochastic processes $\{c_t, l_t, l^d_t, k_t\}$ to
maximize (1.2) subject to the sequence of budget constraints
$$c_t + b_t \le (1 - \tau^l_t) w_t l_t + (1 - \tau^k_t)\left(F(k_{t-1}, l^d_t, s_t) - w_t l^d_t\right) + k_{t-1} - k_t + (1 + r_t) b_{t-1} + T_t \tag{1.5}$$
taking wages, interest rates and taxes $\{w_t, r_t, \tau^l_t, \tau^k_t\}_{t \ge 0}$ as given. Here $b_t$ represents the
household's holding of government debt, $l^d_t$ the household's labor demand and $l_t$ the household's
labor supply. The stochastic processes $\{c_t, l_t, l^d_t, k_t\}_{t \ge 0}$ must be measurable with respect to $s^t$.
The labor market clears if $l_t = l^d_t$.
The household also faces debt limits analogous to (1.4), which I assume are less stringent
than those faced by the government. Therefore, in equilibrium, the household's problem always
has an interior solution. The household's first order conditions
require that two Euler equations hold, one for the risk free rate and the other for the net return
on capital, in addition to a labor-leisure arbitrage condition and the condition that labor is
paid its marginal product:
$$1 = (1 + r_{t+1}) E_t\left\{\beta \frac{u_{c,t+1}}{u_{c,t}}\right\}, \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.6}$$
$$1 = E_t\left\{\beta \frac{u_{c,t+1}}{u_{c,t}} \left[1 + (1 - \tau^k_{t+1}) F_{k,t+1}\right]\right\}, \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.7}$$
$$\tau^l_t = 1 + \frac{u_{l,t}}{u_{c,t} w_t}, \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.8}$$
$$w_t = F_{l,t}, \quad \forall t \ge 0 \text{ and } s^t \in \mathcal{S}^t \tag{1.9}$$
Definition 1 Given $b_{-1}$, $r_0$, $k_0$, $\tau^k_0$ and a stochastic process $\{s_t\}_{t \ge 0}$, a feasible allocation
is a stochastic process $\{c_t, l_t, k_t\}_{t \ge 0}$ satisfying (1.1) whose time $t$ elements are measurable with
respect to $s^t$. A risk free rate process $\{r_t\}_{t \ge 0}$, a wage process $\{w_t\}_{t \ge 0}$ and a government
policy $\{\tau^l_t, \tau^k_t, b_t\}_{t \ge 0}$ are stochastic processes such that $w_t$, $\tau^l_t$ and $b_t$ are measurable with
respect to $s^t$, and $\tau^k_t$ and $r_t$ are measurable with respect to $s^{t-1}$.
Definition 2 Given $b_{-1}$, $r_0$, $k_0$, $\tau^k_0$ and a stochastic process $\{s_t\}$, a competitive equilibrium
is a feasible allocation, a risk free rate process, a wage process and a government policy that
solve the household's optimization problem, clear the labor market and satisfy the government
budget constraints (1.3) and (1.4).
Definition 3 The Ramsey problem is to maximize (1.2) over competitive equilibria. A
Ramsey outcome is a competitive equilibrium that attains the maximum.
Discussion of debt limits. By analogy with Aiyagari, Marcet, Sargent and Seppala
(2002), henceforth AMSS, I shall study two kinds of debt limits, called natural and ad hoc.
Natural debt limits amount to imposing that debt be less than the maximum debt that could
be repaid almost surely under an optimal tax policy. Following AMSS, I call a debt or asset
limit ad hoc if it is more stringent than the natural one. In this model, natural debt limits,
which depend on the capital stock kt in the economy, are in general difficult to compute.
But as mentioned above, it is easy to see that they are of the form $\underline{M}^n(k_t, u_c(c_t, l_t, s_t), s_t) \le
u_c(c_t, l_t, s_t) b_t \le \overline{M}^n(k_t, u_c(c_t, l_t, s_t), s_t)$. Imposing that debt limits are weakly tighter than the
natural ones rules out Ponzi schemes.
Discussion of the measurability assumption for $\tau^k_t$. It is well known since the work
of Chari, Christiano and Kehoe (1994) that with complete markets, taxes on capital are indeterminate: the government faces an embarrassment of riches, with too many instruments to
implement the Ramsey outcome. In fact, the complete markets optimum can be implemented
with complete markets and taxes on capital set one period in advance, or under incomplete
markets with only a risk free bond but fully adjustable taxes on capital. The reason for this
is that investment depends only on the average tax rate on capital, and not on the particular
way it is spread between states. When only a risk free bond is traded, the government can
take advantage of this and adjust taxes on capital to hedge its burden across states - thereby
replicating the complete markets outcome.
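The argument that investment depends only on an average of capital tax rates can be illustrated with a small two-state sketch. It assumes, purely for illustration, that the discounted marginal-utility ratios and the marginal products of capital in each state are held fixed at made-up values; in the full model these respond to policy. The names `euler_lhs` and `spread_tax` and all numbers are hypothetical.

```python
def euler_lhs(p, m, R, tau):
    """Sum_s p_s * m_s * [1 + (1 - tau_s) * R_s].

    m_s stands for the discounted marginal-utility ratio in state s and
    R_s for the marginal product of capital (both taken as given here).
    """
    return sum(ps * ms * (1.0 + (1.0 - ts) * Rs)
               for ps, ms, Rs, ts in zip(p, m, R, tau))

def spread_tax(p, m, R, tau_flat, delta_0):
    """Two-state case: shift the tax in state 0 by delta_0 and choose the shift
    in state 1 so that sum_s p_s * m_s * R_s * delta_s = 0, which leaves the
    Euler sum unchanged."""
    delta_1 = -p[0] * m[0] * R[0] * delta_0 / (p[1] * m[1] * R[1])
    return [tau_flat + delta_0, tau_flat + delta_1]

p = [0.5, 0.5]    # state probabilities (assumed)
m = [0.95, 1.0]   # discounted marginal-utility ratios (assumed)
R = [0.12, 0.08]  # marginal products of capital (assumed)

flat = [0.2, 0.2]
contingent = spread_tax(p, m, R, tau_flat=0.2, delta_0=0.1)

# Revenue per unit of capital in each state under the two tax schemes.
revenue_flat = [t * r for t, r in zip(flat, R)]
revenue_contingent = [t * r for t, r in zip(contingent, R)]
```

Both tax schemes give the same value of the Euler sum, so the incentive to invest is unchanged, yet the state-contingent scheme collects more revenue in state 0 and less in state 1. This is the margin that the one-period-ahead restriction on capital taxes shuts down.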
Inertia in fiscal policy - captured here by the assumption that taxes have to be set one
period in advance - restricts the state-contingency of capital taxes, prevents replication of
the complete markets allocation, and requires analyzing optimal taxes on capital in a truly
incomplete markets environment.
Because they are not the focus of this paper, taxes on labor, on the other hand, are left
fully adjustable. One may wonder whether this asymmetric treatment of labor and capital
taxes biases the results in favor of labor taxation. This intuition is wrong. In fact,
most of the insights would remain valid if additional restrictions were put on labor taxes. Of course,
the exact results depend on the particular form of these restrictions. But for example, if the
production function is Cobb Douglas (with or without depreciation), the formulas for taxes on
capital (1.22) and (1.28) are still valid if taxes on labor are also restricted to be set one period
in advance. The reason is that in this case, capital can be factored out from the additional
restrictions imposed on the planning problem.
1.3 A Recursive Representation for the Ramsey Problem
The following lemma characterizes the restrictions that the government budget and behavior
of households place on competitive equilibrium feasible allocations, risk free rate processes and
government policies.
Lemma 4 A feasible allocation $\{c_t, l_t, k_t\}_{t \ge 0}$, a risk free rate process $\{r_t\}_{t \ge 0}$, a wage process
$\{w_t\}_{t \ge 0}$, and a government policy $\{\tau^l_t, \tau^k_t, b_t\}_{t \ge 0}$ constitute a competitive equilibrium if and only
if (1.1), (1.3), (1.4), (1.6), (1.7), (1.8) and (1.9) hold, with $l_t = l^d_t$.
I develop a recursive representation of the Ramsey problem from $t = 1$ on. This recursive
representation uses four state variables: the value of the capital stock $k$ inherited from the
previous period, the value of government debt from the previous period $b$, the marginal utility
of consumption in the previous period $\theta \equiv u_c(c_-, l_-, s_-)$ and the shock that hit the economy
in the previous period, $s_-$. It is easy to see that the planning problem is recursive in these four
state variables, and that the value function of the government satisfies the following Bellman
equation.
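Bellman equation 1
$$V(k, b, \theta, s_-) = \max E\left\{u(c_s, l_s, s) + \beta V(k'_s, b'_s, u_c(c_s, l_s, s), s) \mid s_-\right\}$$
subject to
$$(1 + r) E\left\{\beta u_c(c_s, l_s, s) \mid s_-\right\} = \theta$$
$$E\left\{\beta u_c(c_s, l_s, s) \left[1 + (1 - \tau^k) F_k(k, l_s, s)\right] \mid s_-\right\} = \theta$$
$$\tau^l_s = 1 + \frac{u_l(c_s, l_s, s)}{u_c(c_s, l_s, s) F_l(k, l_s, s)}, \quad \forall s \in \mathcal{S}$$
$$(1 + r) b + g_s \le \tau^l_s l_s F_l(k, l_s, s) + \tau^k k F_k(k, l_s, s) + b'_s, \quad \forall s \in \mathcal{S}$$
$$c_s + g_s + k'_s \le F(k, l_s, s) + k, \quad \forall s \in \mathcal{S}$$
$$\underline{M}(k'_s, u_c(c_s, l_s, s), s) \le u_c(c_s, l_s, s) b'_s \le \overline{M}(k'_s, u_c(c_s, l_s, s), s), \quad \forall s \in \mathcal{S}$$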
The constraints on the problem are, in order of appearance: (i) that the risk free rate
satisfies the usual Euler equation; (ii) that the net return on capital satisfies the usual Euler
equation; (iii) that agents equalize their marginal rate of substitution between leisure and
consumption to the net real wage; (iv) that the budget constraint of the government is satisfied
in each state $s \in \mathcal{S}$; (v) that the resource constraint holds in each state $s \in \mathcal{S}$; and (vi) that the
amount of government debt issued in each state $s \in \mathcal{S}$ satisfies the debt and asset limits.
The initial period must be treated in isolation since there the marginal utility of consumption
in the previous period is not defined. Equivalently, one can think of the problem at date $t = 0$
as solving $V(k_0, b_0, \theta_0, s_{-1})$ with the additional constraint that the date 0 tax on capital is
given by $\tau^k_0$ and that $\theta_0$ is such that the implied date 0 risk free rate is equal to $r_0$. Hence it is
straightforward to obtain the entire solution for the Ramsey problem once the solution to the
Bellman equation above has been found.
It will prove convenient to replace $b$ by a new state variable $\tilde{b} = b\theta$ representing debt weighted
by the state price density. I can then define the corresponding value function $\tilde{V}(k, \tilde{b}, \theta, s_-) =
V(k, \tilde{b}/\theta, \theta, s_-)$. In order to write the Bellman equation satisfied by $\tilde{V}$, I first rearrange the
constraints. I use the first constraint to substitute $r$, the third to substitute $\tau^l_s$, and I multiply the
fourth constraint by $u_c(c_s, l_s, s)$. To save on notation I write $X_s$ for any function $X(k, k', b, l, s)$
in state $s$.
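The substitution behind this change of variable can be checked numerically. The following Python sketch uses made-up values for $\beta$, the conditional probabilities, $u_{c,s}$, $\theta$ and $b$ (none of them comes from the model's solution or calibration). It verifies that the debt service term $(1 + r) b \, u_{c,s}$, obtained after multiplying the budget constraint by marginal utility, coincides with $\tilde{b} \, u_{c,s} / (\beta E\{u_{c,s} \mid s_-\})$ once $r$ is substituted from the first constraint.

```python
# Hypothetical two-state example; every number below is an illustrative assumption.
beta = 0.95
probs = [0.4, 0.6]   # conditional probabilities of s given s_-
u_c = [1.2, 0.8]     # marginal utility of consumption u_c in each state s
theta = 1.0          # marginal utility of consumption in the previous period
b = 0.5              # debt inherited from the previous period

expected_uc = sum(p * m for p, m in zip(probs, u_c))

# First constraint: (1 + r) * beta * E{u_c | s_-} = theta, solved for the gross rate 1 + r.
gross_rate = theta / (beta * expected_uc)

# New state variable: debt weighted by the state price density.
b_tilde = b * theta


def debt_service_original(s):
    """(1 + r) * b * u_c(s): debt service in the budget constraint, times u_c(s)."""
    return gross_rate * b * u_c[s]


def debt_service_transformed(s):
    """b_tilde * u_c(s) / (beta * E{u_c | s_-}): the same term in the new variables."""
    return b_tilde * u_c[s] / (beta * expected_uc)


for s in range(len(probs)):
    print(s, debt_service_original(s), debt_service_transformed(s))
```

The two expressions agree state by state, which is why $r$ no longer appears once the problem is written in terms of $\tilde{b}$.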
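Bellman equation 2
$$\tilde{V}(k, \tilde{b}, \theta, s_-) = \max E\{ u_s + \beta \tilde{V}(k'_s, \tilde{b}'_s, u_{c,s}, s) \mid s_- \} \tag{1.10}$$
subject to
$$E\{ \beta u_{c,s} [1 + (1 - \tau^k) F_{k,s}] \mid s_- \} = \theta \tag{1.11}$$
$$\tilde{b} \frac{u_{c,s}}{\beta E\{u_{c,s} \mid s_-\}} + g_s u_{c,s} \leq l_s F_{l,s} u_{c,s} + l_s u_{l,s} + \tau^k k F_{k,s} u_{c,s} + \tilde{b}'_s, \quad \forall s \in S \tag{1.12}$$
$$c_s + g_s + k'_s \leq F_s + k, \quad \forall s \in S \tag{1.13}$$
$$\underline{M}(k'_s, u_{c,s}, s) \leq \tilde{b}'_s \leq \overline{M}(k'_s, u_{c,s}, s), \quad \forall s \in S \tag{1.14}$$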
The presence of capital, capital taxes, and marginal utility makes the constraint set in (1.10)
non-convex. This poses two kinds of problems. First, first order conditions are necessary
but not sufficient for characterizing the solution. Second, it considerably complicates the task
of establishing the differentiability of the value function V - which is required to partially
characterize the solution by a set of necessary first order conditions.
All the properties of Ramsey outcomes I derive can be established using either a Lagrangian
approach, or expanding the Bellman equation (1.10) over two periods - hence bypassing this
technical difficulty, but at the cost of heavier notation and poorer intuition. I therefore proceed
assuming the value function V is differentiable in $(k, \tilde{b}, \theta)$, and refer the reader to the appendix
for an approach that does not rely on this assumption.
1.4 The Quasi-Linear Case
In the Ramsey problem, the government simultaneously chooses taxes and manipulates intertemporal prices. Manipulating prices substantially complicates the problem, especially with
incomplete markets. Here I simplify by adopting a specification of preferences that eliminates
the government's ability to manipulate prices. As in AMSS, this brings the model into the
form of a consumption smoothing model and allows me to adapt results for that model to the
Ramsey problem.
I assume that $u(c, l, s) = c + H(l, s)$, where $H$ is a smooth, decreasing and concave function.
I assume $H'(0) = \infty$ in order for labor supply to be interior. Making preferences linear pins
down intertemporal prices. This allows a drastic simplification of (1.10), as two state variables
$\tilde{b}$ (which is equal to $b$ in this case) and $s_-$ are now sufficient to describe the state of the
economy. Intuitively, the reasons for this simplification are twofold. The first reason is that $\theta$
is now fixed and equal to 1 - hence $\theta$ can be dropped as a state variable. The other reason is
that, intertemporal prices being entirely pinned down, I can perform a change of timing in the
recursive approach: the optimal investment in capital $k$ can now be thought of as being chosen
simultaneously with the tax rate on capital $\tau^k$ - which allows me to turn the state variable $k$
into a control variable.
Under this specification, natural debt limits are independent of the capital stock, and marginal
utility is constant: $\overline{M}^n(k, u_c, s) = \overline{M}^n_s$ and $\underline{M}^n(k, u_c, s) = \underline{M}^n_s$. Consistent with this
property, in this setup, I consider only fixed debt limits $\underline{M}_s$ and $\overline{M}_s$. After further simplifying
the constraints by using the resource constraint to substitute $c_s$, I derive a new Bellman
equation.
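Bellman equation 3
$$V(b, s_-) = \max E\left\{ F_s + k - \frac{k}{\beta} + H_s - g_s + \beta V(b'_s, s) \mid s_- \right\} \tag{1.15}$$
subject to
$$E\{ \beta [1 + (1 - \tau^k) F_{k,s}] \mid s_- \} = 1 \tag{1.16}$$
$$\frac{b}{\beta} + g_s \leq l_s F_{l,s} + l_s H_{l,s} + \tau^k k F_{k,s} + b'_s, \quad \forall s \in S \tag{1.17}$$
$$\underline{M}_s \leq b'_s \leq \overline{M}_s, \quad \forall s \in S \tag{1.18}$$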
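1.4.1 Stochastic properties of Ramsey outcomes
I attach a multiplier $\mu$ to (1.16), $\nu_s$ to (1.17) and $\nu_{2,s}$ and $\nu_{1,s}$ to the two constraints in (1.18).
I can then form a Lagrangian associated with the right hand side of (1.15)
$$L(b, s_-) = E\Big\{ F_s + k - \frac{k}{\beta} + H_s - g_s + \beta V(b'_s, s) + \mu \big[ 1 - \beta [1 + (1 - \tau^k) F_{k,s}] \big] + \nu_s \Big[ l_s F_{l,s} + l_s H_{l,s} + \tau^k k F_{k,s} + b'_s - \frac{b}{\beta} - g_s \Big] + \nu_{2,s} [b'_s - \underline{M}_s] + \nu_{1,s} [\overline{M}_s - b'_s] \;\Big|\; s_- \Big\}$$
where $\mu$, $\nu_s$, $\nu_{1,s}$ and $\nu_{2,s}$ are functions of $b$ and $s_-$.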
The Envelope condition delivers
$$V_b(b, s_-) = -\frac{1}{\beta} E\{\nu_s \mid s_-\} \tag{1.19}$$
and the first order condition for $b'_s$ gives
$$\beta V_b(b'_s, s) = -\nu_s + \nu_{1,s} - \nu_{2,s} \tag{1.20}$$
Combining these two equations, I find that, denoting by $\nu_{s_-}$, $\nu_{1,s_-}$ and $\nu_{2,s_-}$ the corresponding
multipliers in the previous period, the following martingale equation holds
$$\nu_{s_-} = E\{\nu_s \mid s_-\} + \nu_{1,s_-} - \nu_{2,s_-} \tag{1.21}$$
In the rest of this section, I will often, with some abuse of notation, switch from recursive
notation to sequential notation.
Therefore, off debt-limits, $\nu_{s_-} = E\{\nu_s \mid s_-\}$. Using sequential notations, the process $\{\nu_t\}$ is a
positive martingale. Equation (1.20) then shows that debt $b_t$ is a non-linear invariant function
of $-\nu_t + \nu_{1,t} - \nu_{2,t}$ and $s_t$, and hence inherits a near random walk component. The policy
functions $\{l_s, k, \tau^k\}$ associated with (1.15) show that for a Ramsey outcome, $c_t$, $l_t$ and $\tau^l_t$ are
invariant functions of $b_{t-1}$, $s_{t-1}$ and $s_t$, while $k_{t-1}$ and $\tau^k_t$ are invariant functions of $b_{t-1}$ and
$s_{t-1}$.
Debt and taxes - on both labor and capital - therefore inherit a random walk component,
reflecting the desire to smooth distortionary taxes across states and time. This tax smoothing
intuition is familiar in incomplete markets environments since the work of Barro (1979) and
AMSS. Note that under complete markets, a similar Bellman equation would hold, but $\nu_t$ would
be constant across time and states, and not a mere martingale. Therefore debt and taxes would
depend only on the current shock $s_t$ affecting the economy as well as on $s_{t-1}$, and would hence
inherit the serial correlation properties of $\{s_t\}$, as in Lucas and Stokey (1983).
The fact that capital taxes, when set optimally, have a random walk component - and
hence are persistent - is new, and might come as a surprise when confronted with the results of
Chamley (1986) and Judd (1985). This reflects the fact that hedging needs of the government
depend on the level of government debt, which has a random walk component. When public
debt is low, the government is free to raise debt when confronted with an adverse shock: debt is
then a good shock absorber. By contrast, when public debt is close to the debt limit, the ability
of the government to shift the tax burden to the future is restricted. Hedging through capital
taxes is then more attractive. As the simulations below will show, $\tau^k_t$ is an extremely non-linear
function of $b_{t-1}$ - in all simulations increasing and convex. This reflects the non-linearity in
government hedging needs, and is a contrast with labor taxes.
1.4.2 Taxes on capital
Manipulating the first order conditions, it is possible to derive a formula to characterize taxes
on capital
$$\tau^k = \frac{ \dfrac{E\{-(1 - \tau^k) k F_{kk,s} \mid s_-\}}{E\{F_{k,s} \mid s_-\}} }{ 1 + \dfrac{E\{F_{k,s} \nu_s \mid s_-\}}{E\{F_{k,s} \mid s_-\}} } \left[ \frac{Cov\{k F_{k,s}, \nu_s \mid s_-\}}{E\{k F_{k,s} \mid s_-\}} - \frac{Cov\{k F_{kk,s}, \nu_s \mid s_-\}}{E\{k F_{kk,s} \mid s_-\}} \right] \tag{1.22}$$
The left hand side of (1.22) is increasing in $\tau^k$. The right hand side comprises three terms.
The first term
$$\frac{ \dfrac{E\{-(1 - \tau^k) k F_{kk,s} \mid s_-\}}{E\{F_{k,s} \mid s_-\}} }{ 1 + \dfrac{E\{F_{k,s} \nu_s \mid s_-\}}{E\{F_{k,s} \mid s_-\}} } \tag{1.23}$$
has as its numerator the inverse of the elasticity of capital $k$ to taxes on capital $\tau^k$. This inverse
elasticity factor is standard in the taxation literature. The higher the elasticity, the lower the
absolute value of the tax rate.
The second term
$$\frac{Cov\{k F_{k,s}, \nu_s \mid s_-\}}{E\{k F_{k,s} \mid s_-\}} \tag{1.24}$$
represents the direct effect of an increase in $\tau^k$: it relaxes the budget constraint of the
government (1.17) in state $s$ in proportion to the tax base of $\tau^k$, $k F_{k,s}$. The more $k F_{k,s}$ is correlated
with $\nu_s$, the higher the optimal $\tau^k$, as taxes on capital pay better in states where the budget
constraint of the government is more binding, i.e. where the need for funds is higher.
The third term
$$\frac{Cov\{k F_{kk,s}, \nu_s \mid s_-\}}{E\{k F_{kk,s} \mid s_-\}} \tag{1.25}$$
reflects the indirect effect of an increase in $\tau^k$. Increasing $\tau^k$ affects investment $k$ and hence
the capital tax base $k F_{k,s}$ and the revenues from labor taxation $l_s F_{l,s} + l_s H_{l,s}$ in each state
$s \in S$. The formula makes use of the constant returns to scale assumption to replace $l_s F_{kl,s}$ by
$-k F_{kk,s}$. How adverse these effects are depends on the correlation between $k F_{kk,s}$ and $\nu_s$. The
higher the correlation, the bigger the effects in states where the need for funds is high, and
hence the lower $\tau^k$.
Hence, (1.22) brings together a standard inverse elasticity factor with two terms that have
an asset pricing feel, and reflect the government's desire to use taxes on capital set one period
in advance to hedge its need for funds across states. This illustrates that the government uses
capital taxes not to levy funds on average, but only to smooth its need for funds across states
- a stark difference with labor taxes.
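To make the structure of the formula concrete, the following Python sketch evaluates it on a two-state example. It reads (1.22) as the inverse elasticity factor (1.23) multiplied by the difference between the direct term (1.24) and the indirect term (1.25). The values of $k$, $F_{k,s}$, $F_{kk,s}$ and $\nu_s$ are hypothetical inputs chosen for illustration; in the model they are endogenous objects of the Ramsey problem, which this sketch does not solve. Because $(1 - \tau^k)$ appears on the right hand side, the code solves the resulting linear equation for $\tau^k$.

```python
def expectation(probs, values):
    """Conditional expectation E{values | s_-} over a finite set of states."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(probs, x, y):
    """Conditional covariance Cov{x, y | s_-}."""
    mean_x, mean_y = expectation(probs, x), expectation(probs, y)
    return sum(p * (a - mean_x) * (b - mean_y) for p, a, b in zip(probs, x, y))


def capital_tax(probs, k, F_k, F_kk, nu):
    """Evaluate the capital tax formula for given (hypothetical) state-by-state inputs.

    Returns (tau_k, scale, direct, indirect), where `scale` is the first term
    without its (1 - tau_k) factor, `direct` is the second term and `indirect`
    is the third term.
    """
    kFk = [k * v for v in F_k]
    kFkk = [k * v for v in F_kk]
    direct = covariance(probs, kFk, nu) / expectation(probs, kFk)
    indirect = covariance(probs, kFkk, nu) / expectation(probs, kFkk)
    mean_Fk = expectation(probs, F_k)
    mean_Fk_nu = expectation(probs, [a * b for a, b in zip(F_k, nu)])
    scale = (-expectation(probs, kFkk) / mean_Fk) / (1.0 + mean_Fk_nu / mean_Fk)
    # tau_k = (1 - tau_k) * x  implies  tau_k = x / (1 + x)
    x = scale * (direct - indirect)
    tau_k = x / (1.0 + x)
    return tau_k, scale, direct, indirect


# Hypothetical inputs: the need for funds (nu) is high in the state where F_k is low.
probs = [0.5, 0.5]
tau_k, scale, direct, indirect = capital_tax(
    probs, k=1.0, F_k=[0.10, 0.14], F_kk=[-0.05, -0.06], nu=[0.3, 0.1]
)
print("direct term:", direct)
print("indirect term:", indirect)
print("tau_k:", tau_k)
```

With these particular inputs the capital tax base pays poorly in the state where funds are scarce, so the direct term is negative and the implied $\tau^k$ is a small subsidy. Other inputs can flip the sign.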
In a complete market environment, (1.22) still holds, but $\nu_s$ is constant across states, so that
$\tau^k$ is equal to 0. This outcome is then a particular case of the classical uniform taxation result
by Atkinson and Stiglitz (1972), transposed to this Ramsey setup by Zhu (1992) and Chari,
Christiano and Kehoe (1994), which holds more generally for preferences which are CRRA and
separable between consumption and leisure.
As the following proposition shows, this zero tax result carries through in a particular case.
Proposition 5 If $F$ is Cobb Douglas, $\tau^k_t = 0$ for all $t \geq 1$.
Therefore, for the strong Cobb Douglas benchmark, taxes on capital are 0 from period 1
on. In this case where the elasticity of substitution between capital and labor $\sigma$ is equal to 1,
the hedging benefits from the direct effect of a marginal increase in $\tau^k$ at $\tau^k = 0$ are exactly
offset by the marginal hedging cost from the indirect effect through the capital tax base and
labor revenues.
Remark 6 Equation (1.22) and Proposition 2 are valid irrespective of the asset structure of
the economy. They hold for example if the government is required to balance its budget in every
period, if it is restricted to trade only a perpetuity etc.
In the general non Cobb Douglas case, the sign of $\tau^k$ is ambiguous - it may be optimal to tax
or subsidize capital. The sign of $\tau^k$ will in general depend on the way productivity shocks and
preference shocks interact with government consumption shocks and on the particular functional
form for the production function. Under technical conditions, it is possible to characterize the
sign of $\tau^k$ in special cases.
Proposition 7 1) Assume that $F$ is CES with elasticity of substitution $\sigma$. Consider the case
where productivity shocks are Hicks neutral. The following holds: (i) if $\sigma > 1$, and if $l_s > l_{s'}$
if and only if $\nu_s < \nu_{s'}$, then $\tau^k > 0$; (ii) if $\sigma > 1$, and if $l_s < l_{s'}$ if and only if $\nu_s < \nu_{s'}$, then
$\tau^k < 0$; (iii) if $\sigma < 1$, and if $l_s > l_{s'}$ if and only if $\nu_s < \nu_{s'}$, then $\tau^k < 0$; (iv) if $\sigma < 1$, and if
$l_s < l_{s'}$ if and only if $\nu_s < \nu_{s'}$, then $\tau^k > 0$. 2) Assume that $F(k, l) = A(s) k^\alpha l^{1-\alpha} - \delta k$: then
$\tau^k$ has the same sign as $Cov\{F_{k,s}, \nu_s \mid s_-\}$.
This proposition is partly unsatisfactory as it relies on an assumption on endogenous objects
$l_s$ and $\nu_s$. It is natural, for example, to expect case (i) in the absence of productivity or preference
shocks: it is reasonable to expect that in this environment with no wealth effects, a higher need
for funds calls for higher taxes on labor resulting in lower labor supply. Non-convexities in
(1.15) considerably complicate the task of establishing how $l_s$ co-varies with $\nu_s$.7
Remark 8 Note that taxes on capital are zero when $\sigma = 1$ and when $\sigma = \infty$: absent productivity
shocks, the marginal product of capital is fixed if capital and labor are perfect substitutes,
and hence capital taxation provides no hedging benefits.
7 In the case where $F$ is CES with only government expenditure shocks or only technological shocks, for
all the numerical simulations that I have performed, only cases (i) and (iii) ever occurred. In the case where
$F(k, l) = A(s) k^\alpha l^{1-\alpha} - \delta k$, with only technological shocks or only government expenditure shocks, for the
numerical simulations that I have performed, $Cov\{F_{k,s}, \nu_s \mid s_-\}$ was always negative.
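The Cobb Douglas results can be illustrated numerically. The Python sketch below computes the difference between the direct and indirect hedging terms of the capital tax formula for $F(k, l) = A(s) k^\alpha l^{1-\alpha} - \delta k$. The labor supplies, productivity levels and multipliers $\nu_s$ are arbitrary hypothetical numbers rather than a solution of the Ramsey problem. Without depreciation, $k F_{kk,s}$ is proportional to $F_{k,s}$ state by state and the two terms cancel, as in Proposition 5. With $\delta > 0$ the difference takes the sign of $Cov\{F_{k,s}, \nu_s \mid s_-\}$, as in part 2) of Proposition 7, provided $E\{F_{k,s} \mid s_-\} > 0$.

```python
def expectation(probs, values):
    """Conditional expectation over a finite set of states."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(probs, x, y):
    """Conditional covariance over a finite set of states."""
    mean_x, mean_y = expectation(probs, x), expectation(probs, y)
    return sum(p * (a - mean_x) * (b - mean_y) for p, a, b in zip(probs, x, y))


def hedging_terms(probs, k, labor, productivity, nu, alpha, delta):
    """Return (direct - indirect, Cov{F_k, nu}) for F = A k^alpha l^(1-alpha) - delta k."""
    F_k = [alpha * A * k ** (alpha - 1) * l ** (1 - alpha) - delta
           for A, l in zip(productivity, labor)]
    F_kk = [alpha * (alpha - 1) * A * k ** (alpha - 2) * l ** (1 - alpha)
            for A, l in zip(productivity, labor)]
    kFk = [k * v for v in F_k]
    kFkk = [k * v for v in F_kk]
    direct = covariance(probs, kFk, nu) / expectation(probs, kFk)
    indirect = covariance(probs, kFkk, nu) / expectation(probs, kFkk)
    return direct - indirect, covariance(probs, F_k, nu)


# Hypothetical inputs: labor supply is lower in the state where nu is higher.
inputs = dict(probs=[0.3, 0.7], k=2.0, labor=[0.9, 1.1],
              productivity=[1.0, 1.0], nu=[0.4, 0.1], alpha=0.3)

bracket_no_dep, _ = hedging_terms(delta=0.0, **inputs)
bracket_dep, cov_dep = hedging_terms(delta=0.05, **inputs)

print("no depreciation:", bracket_no_dep)          # zero up to rounding error
print("with depreciation:", bracket_dep, cov_dep)  # same sign
```

Since the inverse elasticity factor multiplying this difference is positive, the sign of the difference is also the sign of $\tau^k$ in this sketch.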
1.4.3 Long run behavior
The long run behavior of Ramsey outcomes is similar to AMSS. I refer the reader to this paper
for an extensive discussion, and sketch the principal properties. Here the difference between
natural and ad hoc debt limits is marked.
Under natural asset limits, the multiplier $\nu_{2,t}$ is zero throughout. The natural asset limit
$\underline{M}^n$ is the amount of assets that allows the government to withstand any sequence of shocks
with zero taxes. It makes no sense for the government to accumulate more assets than $\underline{M}^n$.
When favorable shocks cause government assets to grow beyond $\underline{M}^n$, it is optimal for the
government to rebate consumers the difference via a lump sum rebate. Therefore (1.21) becomes
$$\nu_{s_-} = E\{\nu_s \mid s_-\} + \nu_{1,s_-}$$
so that the stochastic process $\{\nu_t\}$ is a nonnegative supermartingale. Therefore, the
supermartingale convergence theorem (see Loève (1977)) asserts that $\nu_t$ converges almost surely to
a nonnegative random variable. As in AMSS, there are two possibilities:
(i) If the Markov process $\{s_t\}_{t \geq 0}$ is ergodic - so that in particular $\overline{M}^n_s = \overline{M}^n$ and $\underline{M}^n_s = \underline{M}^n$
do not depend on $s$ for every $s$ in the ergodic class of $\{s_t\}_{t \geq 0}$ - then the lemma below shows
that under the condition that V is concave in $b$ and that the policy functions in (1.15) are
continuous, $\nu_t$ converges almost surely to zero. In that case, taxes $\tau^l_t$ and $\tau^k_t$ converge to the
first best levels $\tau^k = 0$ and $\tau^l = 0$. The level of government assets converges to $\underline{M}^n$, sufficient
to finance the worst possible sequence of shocks forever from interest earnings.
(ii) If the Markov process $\{s_t\}_{t \geq 0}$ has an absorbing state, then $\nu_t$ can converge to a strictly
positive value; $\nu_t$ converges when $s_t$ enters the absorbing state. From then on, taxes and all
other variables in the model are constant. Taxes on capital are zero.
Lemma 9 Consider the case of natural debt and asset limits. Assume that the Markov process
$\{s_t\}_{t \geq 0}$ is ergodic, that the value function V is continuously differentiable and concave in $b$, and
that the policy functions in (1.15) are continuous. Then $\nu_t$ converges to zero almost surely.
When the asset limit is more stringent than the natural one, convergence to the first best
can be ruled out. In this case, the lower debt limit occasionally binds. This puts a nonnegative
multiplier $\nu_{2,t}$ in (1.21), and $\{\nu_t\}$ ceases to be a supermartingale. This fundamentally alters
the limiting behavior of the model in the case where the Markov process $\{s_t\}_{t \geq 0}$ has a unique
invariant distribution. In particular, rather than converging almost surely, $\nu_t$ continues to
fluctuate randomly. Off debt limits, $\nu_t$ behaves like a martingale, and capital taxes do not
converge to 0.
In addition, if the range of the policy functions b' can be restricted to a compact set, one
can show that an invariant distribution for government debt exists.
1.5 The general case
The insights from the simple example examined in the previous section largely carry through to
the case where preferences are not risk neutral and separable. But the possibility to manipulate
intertemporal prices brings about the traditional motive for taxes on capital, which interacts
with the motive uncovered in the previous section.
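1.5.1 Stochastic properties of Ramsey outcomes
Let us attach a multiplier $\mu$ to (1.11), $\nu_s$ to (1.12), $\nu_{2,s}$ and $\nu_{1,s}$ to the two constraints in (1.14),
and $\psi_s$ to (1.13). I can then form a Lagrangian associated with the right hand side of (1.10)
$$L(k, \tilde{b}, \theta, s_-) = E\Big\{ u_s + \beta \tilde{V}(k'_s, \tilde{b}'_s, u_{c,s}, s) + \mu \big[ \theta - \beta u_{c,s} [1 + (1 - \tau^k) F_{k,s}] \big] \;\Big|\; s_- \Big\}$$
$$+ E\Big\{ \nu_s \Big[ l_s F_{l,s} u_{c,s} + l_s u_{l,s} + \tau^k k F_{k,s} u_{c,s} + \tilde{b}'_s - \tilde{b} \frac{u_{c,s}}{\beta E\{u_{c,s} \mid s_-\}} - g_s u_{c,s} \Big] \;\Big|\; s_- \Big\}$$
$$+ E\Big\{ \psi_s [F_s + k - c_s - g_s - k'_s] + \nu_{2,s} [\tilde{b}'_s - \underline{M}(k'_s, u_{c,s}, s)] + \nu_{1,s} [\overline{M}(k'_s, u_{c,s}, s) - \tilde{b}'_s] \;\Big|\; s_- \Big\}$$
where $\mu$, $\nu_s$, $\nu_{1,s}$, $\nu_{2,s}$ and $\psi_s$ are functions of $k$, $\tilde{b}$, $\theta$ and $s_-$.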
Using the Envelope condition for $\tilde{b}$, a martingale equation similar to (1.21) can be derived
$$\nu_{s_-} = \frac{E\{\nu_s u_{c,s} \mid s_-\}}{E\{u_{c,s} \mid s_-\}} + \nu_{1,s_-} - \nu_{2,s_-} \tag{1.26}$$
Off debt limits, the multiplier $\nu_s$ is now a risk adjusted martingale, imparting, as in the
simple example in the previous section, a unit root component to the solution of (1.10). The
condition
$$\beta \tilde{V}_b(k'_s, \tilde{b}'_s, u_{c,s}, s) = -\nu_s + \nu_{1,s} - \nu_{2,s} \tag{1.27}$$
shows that the vector of endogenous state variables $\{k_{t-1}, b_t, u_{c,t-1}\}$ inherits a near random
walk component. The policy functions $\{c_s, l_s, k'_s, \tau^k\}$ associated with (1.10) imply that for a
Ramsey outcome, $k_t$, $b_t$, $c_t$, $l_t$ and $\tau^l_t$ are invariant functions of $k_{t-1}$, $b_{t-1}$, $u_{c,t-1}$, $s_{t-1}$ and $s_t$,
while $\tau^k_t$ is an invariant function of $k_{t-1}$, $b_t$, $u_{c,t-1}$ and $s_{t-1}$. Therefore, $k_t$, $b_t$, $c_t$, $l_t$, $\tau^l_t$ and $\tau^k_t$ inherit
a unit root component.
As in the risk neutral case, debt and taxes display a random walk component, reflecting the
desire to smooth distortionary taxes across time, a sharp contrast with the complete markets
Ramsey outcome, where a similar Bellman equation holds, but with $\nu_t$ constant across time
and not a mere martingale.
Hence, the stochastic properties of Ramsey outcomes are similar to those discussed in the
quasi-linear example. The analysis is only made more difficult by the need to keep track of
additional state variables $k_{t-1}$ and $u_{c,t-1}$.
1.5.2 Taxes on capital
A formula similar to (1.22) can be derived
$$\tau^k = T^h(k, \tilde{b}, \theta, s_-) + T^i(k, \tilde{b}, \theta, s_-) + T^b(k, \tilde{b}, \theta, s_-) \tag{1.28}$$
To save on space, I report the formulas for $T^h$, $T^i$ and $T^b$ in the first part of the Appendix. It
should be noted that these are valid from $t = 1$ on. There are three motives for taxing capital,
corresponding to the three terms $T^h$, $T^i$ and $T^b$ on the right hand side of (1.28).8
The first term, $T^h$, is the "hedging" term and reflects the hedging motive discussed in
the previous section. Two differences with (1.22) should be emphasized. First, the formula
is adjusted for risk through $u_{c,s}$. Second, the multiplier $\psi_s$ on the resource constraint (1.13)
appears. In the quasi-linear case, $\psi_s$ is equal to one. When risk aversion is introduced,
the stochastic process $\{\beta^t \psi_t\}_{t \geq 0}$ represents the intertemporal prices the government would be
willing to pay for additional resources at different dates. The process $\{\psi_t / u_{c,t}\}_{t \geq 0}$ converts these
prices in consumption equivalent units. The presence of $\psi_s$ is natural since taxes on capital affect
capital accumulation and hence resources available.
8 The superscripts "h", "i" and "b" stand respectively for "hedging", "intertemporal" and "bounds".
The second term $T^i$ is the "intertemporal" term and corresponds to the traditional motive
for capital taxation: the government can induce intertemporal resource transfers by affecting
capital accumulation through capital taxation to reduce the burden of taxation. The formula calls
for subsidizing capital between $t$ and $t + 1$ when resources are expected to be scarcer at $t + 1$
than at $t$ - i.e. when $\psi_{t+1} / u_{c,t+1}$ is expected to be larger on average than $\psi_t / u_{c,t}$ - especially if the net
marginal product of capital $1 + (1 - \tau^k_{t+1}) F_{k,t+1}$ or marginal utility $u_{c,t+1}$ is positively correlated with
$\psi_{t+1} / u_{c,t+1}$.
This formula (1.28) for $\tau^k$ is valid under complete markets, with the only difference that
$\nu_s$ is constant. Under complete markets, $T^h$ is equal to zero, but not $T^i$ in general. The
well known case where taxes on capital are zero under complete markets is the case of CRRA
preferences, separable between consumption and leisure. Indeed, under complete markets,
$$\frac{\psi_s}{u_{c,s}} = 1 + \nu \left[ 1 + \frac{c_s u_{cc,s}}{u_{c,s}} + \frac{l_s u_{cl,s}}{u_{c,s}} \right]$$
so that if $u(c, l, s)$ is the sum of a CRRA function of $c$ and a term $H(l, s)$, $\psi_s / u_{c,s}$
is constant along the optimal path, and $T^i$ is equal to zero from period 2 on.
When markets are incomplete, $\psi_t / u_{c,t}$ is not constant anymore at a Ramsey outcome when
preferences are CRRA and separable, so that $T^i$ is not equal to zero even in this particular
case - at the Ramsey outcome, the government would be willing to pay a different price - in
consumption equivalent units - for additional resources at different dates and in different states,
so that the traditional motive for capital taxation remains. In fact, as already mentioned, the
zero capital taxation result under CRRA and separable preferences is an application of the
uniform commodity taxation result of Atkinson and Stiglitz (1972). This result relies crucially
on the assumption of complete markets.
The last term $T^b$ imparts a role for relaxing debt limits to capital taxes. For example, if the
maximum debt limit is increasing in the capital stock in the economy, it is optimal to subsidize
capital when the limit is binding to relax this constraint and allow for more debt accumulation.
This term is zero if the imposed debt limits do not depend on capital. Note that this term
would have been present in (1.22) if I had allowed debt limits to depend on capital.
The case where the production function is Cobb Douglas still provides a useful benchmark.
Proposition 10 If $F$ is Cobb Douglas, then $T^h_t = 0$ for $t \geq 1$.
As in the quasi-linear case, $T^h$ is zero as soon as the need for hedging disappears.
Remark 11 If the Markov process $\{s_t\}_{t \geq 0}$ enters an absorbing state at $t_0$, then $T^h_t = 0$ for
$t \geq t_0 + 1$.
Remark 12 As in the quasi-linear case, equation (1.28) and Proposition 3 are valid irrespective
of the asset structure of the economy.
It is also possible to give a characterization of $T^i$ along the lines of Zhu (1992).
Assumption 1 The stochastic process $x_t = \{k_t, b_t, u_{c,t}, s_t\}$ is a stationary, ergodic, first order Markov
process: there exists a probability measure $P^\infty$ such that for all $t$, and measurable set $A$,
$$P^\infty\{x_t \in A\} = P^\infty\{A\}$$
$$P^\infty\Big\{ \lim_{j \to \infty} \Pr\{x_{t+j} \in A \mid x_t\} = P^\infty\{A\} \Big\} = 1$$
Assumption 2 The policy functions in (1.10) are continuous.
Assumption 3 $P^\infty\{1 + (1 - \tau^k_t) F_{k,t} > 0\} = 1$.
Proposition 13 If assumptions 1, 2 and 3 hold, then one of the following holds: (i) $P^\infty\{T^i_t = 0\} = 1$;
(ii) $P^\infty\{T^i_t > 0\} > 0$ and $P^\infty\{T^i_t < 0\} > 0$.
This proposition shows that the insights of Zhu (1992) are still valid for the traditional
motive for capital taxation embodied in $T^i$. At a stochastic steady state, $T^i$ cannot be always
positive or always negative.
1.6 Capital Ownership and the Structure of Government Liabilities
So far, I have restricted the government to trade only a risk free bond with consumers, whereas
consumers faced a non-trivial portfolio decision in allocating their savings between government
debt and capital. Preventing the government from trading capital is without loss of generality
under complete markets. But in environments with incomplete markets, this arbitrary
restriction regains bite. Allowing the government to trade capital gives the government
more ways to smooth taxes across time and states, and to hedge government expenditure shocks.
1.6.1 The Government Capital Asset Pricing Model
I now remove this restriction. To illustrate that physical capital is an asset among others that
the government can trade with consumers, I also introduce exogenous assets indexed by $i \in I_{s_-}$.
The subscript $s_-$ allows the investment opportunity set formed by the
exogenous assets to vary with the state variable $s_-$.
I now allow the government and consumers to trade three kinds of assets: (i) a risk free
bond; (ii) capital - an asset whose return in state $s$ is $1 + (1 - \tau^k) F_{k,s}$; (iii) and $\#I_{s_-}$ assets
in zero net supply indexed by $i \in I_{s_-}$ whose return in state $s$ when the shock in the previous
period was $s_-$ is $R^i_{s_-,s}$.
Generically, if the number of traded assets is less than the number of shocks, markets are
truly incomplete and the complete markets Ramsey outcome is not attainable. I will maintain
this assumption throughout.
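It is easy to see that the planning problem is still recursive with the same state variables
$k$, $\tilde{b}$, $\theta$, $s_-$, where $\tilde{b}$ is now the value of the government's net financial position vis-a-vis the
private sector. Denoting by $x_i$ government's holdings of asset $i \in I_{s_-}$ and by $k_g$ government's
holdings of capital, the government's value function satisfies a modified version of (1.10):
Bellman equation 2'
$$\tilde{V}(k, \tilde{b}, \theta, s_-) = \max E\{ u_s + \beta \tilde{V}(k'_s, \tilde{b}'_s, u_{c,s}, s) \mid s_- \} \tag{1.29}$$
subject to
$$E\{ \beta u_{c,s} [1 + (1 - \tau^k) F_{k,s}] \mid s_- \} = \theta \tag{1.30}$$
$$E\{ \beta u_{c,s} R^i_{s_-,s} \mid s_- \} = \theta, \quad \forall i \in I_{s_-} \tag{1.31}$$
$$\sum_{i \in I_{s_-}} x_i \Big( R^i_{s_-,s} - \frac{\theta}{\beta E\{u_{c,s} \mid s_-\}} \Big) u_{c,s} + k_g \Big( 1 + (1 - \tau^k) F_{k,s} - \frac{\theta}{\beta E\{u_{c,s} \mid s_-\}} \Big) u_{c,s} + \tilde{b} \frac{u_{c,s}}{\beta E\{u_{c,s} \mid s_-\}} + g_s u_{c,s}$$
$$\leq l_s F_{l,s} u_{c,s} + l_s u_{l,s} + \tau^k k F_{k,s} u_{c,s} + \tilde{b}'_s, \quad \forall s \in S \tag{1.32}$$
$$c_s + g_s + k'_s \leq F_s + k, \quad \forall s \in S \tag{1.33}$$
$$\underline{M}(k'_s, u_{c,s}, s) \leq \tilde{b}'_s \leq \overline{M}(k'_s, u_{c,s}, s), \quad \forall s \in S \tag{1.34}$$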
There are two differences between (1.10) and (1.29). First, there is now one Euler equation
for each exogenous asset (1.31). The second difference is in the budget constraint of the
government (1.32), where the total liability the government has to repay or refinance in state $s$ is
now
$$\left[ \sum_{i \in I_{s_-}} x_i \Big( R^i_{s_-,s} - \frac{\theta}{\beta E\{u_{c,s} \mid s_-\}} \Big) + k_g \Big( 1 + (1 - \tau^k) F_{k,s} - \frac{\theta}{\beta E\{u_{c,s} \mid s_-\}} \Big) + \frac{\tilde{b}}{\beta E\{u_{c,s} \mid s_-\}} \right] u_{c,s}$$
The government therefore faces a non-trivial portfolio decision. It must decide not only the
level but also the composition of its liabilities.
As is clear from (1.29), introducing more assets only relaxes the constraints in the planning
problem. Therefore, as in the quasi-linear case, introducing more traded assets, or allowing the
government to trade capital, improves the Ramsey outcome.
Remark 14 Expanding the set of assets the government can trade with consumers improves
the Ramsey outcome.
Here, more traded assets only give the government more flexibility to smooth distortionary
taxes. This result hinges crucially on the absence of consumer heterogeneity. With heterogeneous
consumers, the introduction of additional traded assets would impose new constraints since
heterogeneous agents will generally engage in trades between themselves - which could
potentially render the task of the planner more difficult.
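Assuming an interior solution exists and that debt limits do not bind in state $s_-$, the
following set of first order conditions characterize the optimal asset and liability structure of
the government. For this, it is convenient to label the risk free rate
$R_{s_-} \equiv \theta / (\beta E\{u_{c,s} \mid s_-\})$.
$$E\{ \beta R_{s_-} u_{c,s} \nu_s \mid s_- \} = \theta \nu_{s_-} \tag{1.35}$$
$$E\{ \beta [1 + (1 - \tau^k) F_{k,s}] u_{c,s} \nu_s \mid s_- \} = \theta \nu_{s_-} \tag{1.36}$$
$$E\{ \beta R^i_{s_-,s} u_{c,s} \nu_s \mid s_- \} = \theta \nu_{s_-}, \quad \forall i \in I_{s_-} \tag{1.37}$$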
These equations form the government counterpart of the standard CCAPM Euler equations:
I refer to them as the Government Capital Asset Pricing Model (GCAPM). Assets that pay
well in states of the world where government funds are scarce require a lower expected rate of
return, as can be seen for example from rewriting (1.37) in the following way
$$E\{R^i_{s_-,s} \mid s_-\} = R_{s_-} - \frac{Cov\{R^i_{s_-,s}, u_{c,s} \nu_s \mid s_-\}}{E\{u_{c,s} \nu_s \mid s_-\}}$$
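The covariance form of the government's pricing condition can be illustrated with a short Python sketch. All numbers are hypothetical: `kernel` stands in for $u_{c,s} \nu_s$ in three states, `risk_free` for $R_{s_-}$, and `payoff` for an arbitrary asset payoff. The asset is priced so that its return satisfies the condition that its kernel-weighted expected return equals $R_{s_-}$ times the expected kernel, which is what (1.35) and (1.37) jointly imply. The sketch checks only this algebraic rewriting, not the consumers' own Euler equation.

```python
def expectation(probs, values):
    """Conditional expectation over a finite set of states."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(probs, x, y):
    """Conditional covariance over a finite set of states."""
    mean_x, mean_y = expectation(probs, x), expectation(probs, y)
    return sum(p * (a - mean_x) * (b - mean_y) for p, a, b in zip(probs, x, y))


# Hypothetical three-state example.
probs = [0.25, 0.5, 0.25]
kernel = [1.5, 1.0, 0.6]   # stands in for u_c * nu: government funds are scarce in state 0
risk_free = 1.04           # stands in for the gross risk free rate
payoff = [0.8, 1.0, 1.3]   # an asset that pays poorly when funds are scarce

# Price the asset so that E{return * kernel} = risk_free * E{kernel}.
weighted_payoff = [x * m for x, m in zip(payoff, kernel)]
price = expectation(probs, weighted_payoff) / (risk_free * expectation(probs, kernel))
returns = [x / price for x in payoff]

expected_return = expectation(probs, returns)
covariance_form = risk_free - covariance(probs, returns, kernel) / expectation(probs, kernel)

print("expected return:", expected_return)
print("risk free rate minus covariance term:", covariance_form)
```

Because this asset pays poorly in the state where government funds are scarce, its covariance with the kernel is negative and its expected return exceeds the risk free rate. An asset with the opposite payoff pattern would carry a lower expected return.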
Equations (1.35), (1.36) and (1.37) hold together with the standard CCAPM Euler equations
$$E\{ \beta R_{s_-} u_{c,s} \mid s_- \} = \theta$$
$$E\{ \beta [1 + (1 - \tau^k) F_{k,s}] u_{c,s} \mid s_- \} = \theta$$
$$E\{ \beta R^i_{s_-,s} u_{c,s} \mid s_- \} = \theta$$
Equations (1.35), (1.36) and (1.37) show that $\{\beta^t u_{c,t} \nu_t\}_{t \geq 0}$ can be thought of as a pricing
kernel for the government. This pricing kernel applies to assets traded with consumers - which
is the case for all assets considered so far. It characterizes the price the government is willing to
pay for an asset trade with consumers. Note that the result that $\nu_s$ is a risk adjusted martingale
is simply a restatement of (1.35).
If consumers are not on the other side of the trade, the pricing kernel $\{\beta^t u_{c,t} (\nu_t + \psi_t / u_{c,t})\}_{t \geq 0}$
is different and involves the shadow costs of resources $\psi_t$. Indeed, the price the government
would be willing to pay for a non traded investment project paying out $X_s$ in state $s$ is
$$\frac{\beta E\{ X_s u_{c,s} (\nu_s + \psi_s / u_{c,s}) \mid s_- \}}{\theta (\nu_{s_-} + \psi_{s_-} / \theta)}$$
It should also be emphasized that the results on capital taxation still hold when more traded
assets are introduced. In particular, taxes are still given by (1.28). The only difference is in the
hedging term: the elasticity of capital to the tax rate has to be replaced by the elasticity of the
part of the capital stock held by the private sector to the tax rate
$$\frac{E\{ -(k - k_g)(1 - \tau^k) F_{kk,s} u_{c,s} \mid s_- \}}{E\{ F_{k,s} u_{c,s} \mid s_- \}}$$
The GCAPM, together with the CCAPM, provides a simple and powerful framework for
addressing the optimal liability structure of the government in various contexts. In ongoing
work, Farhi and Werning (2005) study the optimal maturity structure of government expenditures
when the government is restricted to issue only risk-free debt of a limited number of
maturities.
1.6.2 The GCAPM in The Quasi-Linear Case
I consider here the case where preferences are quasi-linear. This stripped down setup will allow
me to derive stark results on the optimal structure of government liabilities.
As in the previous subsection, I allow the government and consumers to trade three kinds
of assets: (i) a risk free bond; (ii) capital - an asset whose return in state $s$ is $1 + (1 - \tau^k) F_{k,s}$;
(iii) and $\#I_{s_-}$ assets in zero net supply indexed by $i \in I_{s_-}$ whose return in state $s$ when the
shock in the previous period was $s_-$ is $R^i_{s_-,s}$. The definitions of feasible allocations, competitive
equilibria and Ramsey outcomes can be straightforwardly extended to this new setup.
It is easy to see that the planning problem is still recursive with the same state variables $b$
and $s_-$, where $b$ is now the value of the government's net financial position vis-a-vis the private
sector. Denoting by $x_i$ government's holdings of asset $i \in I_{s_-}$ and by $k_g$ government's holdings
of capital, the government's value function satisfies a modified version of (1.15) analogous to
(1.32).
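Bellman equation 3'
$$V(b, s_-) = \max E\left\{ F_s + k - \frac{k}{\beta} + H_s - g_s + \beta V(b'_s, s) \mid s_- \right\} \tag{1.38}$$
subject to
$$E\{ \beta [1 + (1 - \tau^k) F_{k,s}] \mid s_- \} = 1 \tag{1.39}$$
$$E\{ \beta R^i_{s_-,s} \mid s_- \} = 1, \quad \forall i \in I_{s_-} \tag{1.40}$$
$$\sum_{i \in I_{s_-}} x_i \Big( R^i_{s_-,s} - \frac{1}{\beta} \Big) + k_g \Big( 1 + (1 - \tau^k) F_{k,s} - \frac{1}{\beta} \Big) + \frac{b}{\beta} + g_s \leq l_s F_{l,s} + l_s H_{l,s} + \tau^k k F_{k,s} + b'_s, \quad \forall s \in S \tag{1.41}$$
$$\underline{M}_s \leq b'_s \leq \overline{M}_s, \quad \forall s \in S \tag{1.42}$$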
In a particular case, allowing the government to trade physical capital actually allows it to
reach the complete markets Ramsey outcome.
Proposition 15 Absent productivity and preference shocks, the government can perfectly approximate the complete markets Ramsey outcome with a larger and larger long or short position
in capital.
The intuition for this proposition is that absent productivity and preference shocks, the
complete markets Ramsey outcome features constant labor supply across states. Hence the
return on physical capital is risk free: physical capital and the risk free bond are colinear
assets. By commanding small deviations from the constant level of labor supply of the complete
markets allocation, the government can align the variations of the returns on capital with its
need for funds. By taking extreme positions in capital, compensated by opposite positions on
the risk free bond, the government can then leverage these variations and smooth perfectly its
need for funds across states.
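The leverage argument can be sketched numerically. In the two-state Python example below, all numbers are hypothetical: the excess return of capital over the risk free bond is `spread` in one state and `-spread` in the other (mean zero under equal probabilities, as the consumers' Euler equations would require with risk neutral agents), and `target` is an assumed mean-zero state-contingent transfer the government would like to engineer. A position $k_g$ in capital financed by an opposite bond position pays $k_g$ times the excess return, so the required position is the ratio of the target to the excess return.

```python
def capital_position(excess_return, target):
    """Position in capital, offset by an opposite bond position, whose payoff
    k_g * excess_return[s] equals target[s] in a two-state, mean-zero setting."""
    if excess_return[0] == 0:
        raise ValueError("capital and the bond are collinear: no finite position works")
    return target[0] / excess_return[0]


# Hypothetical mean-zero transfer across two equally likely states.
target = [0.02, -0.02]

positions = []
for spread in [1e-1, 1e-2, 1e-3]:
    # Excess return of capital over the bond shrinks as labor supply
    # gets closer to being constant across states.
    excess_return = [spread, -spread]
    k_g = capital_position(excess_return, target)
    payoff = [k_g * e for e in excess_return]
    positions.append(k_g)
    print("spread:", spread, "position:", k_g, "payoff:", payoff)
```

The position needed to deliver the same state-contingent transfer grows without bound as the variation in the return on capital shrinks, which is the sense in which the complete markets outcome is only approached with larger and larger positions.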
The extreme positions required for replicating the complete markets allocation are reminiscent
of the findings of Angeletos (2002) and Buera and Nicolini (2004). Both contributions
analyze how the government can use different maturities of risk free debt to implement the
complete markets Ramsey outcome. They find that generically, if the number of maturities is larger
than the number of shocks, then the complete markets allocation can be implemented, but that
typically, very large positions in the different maturities are required. It is also worth noting
that in this example with quasi-linear preferences, different maturities of risk free debt would
not permit the government to replicate the complete markets Ramsey outcome since the
term structure is entirely pinned down by preferences.
With productivity and preference shocks, the government can typically no longer replicate
the complete markets Ramsey outcome if fewer assets than states of the world are traded. The
GCAPM takes the particularly simple form
$$E\{ [1 + (1 - \tau^k) F_{k,s} - R] \nu_s \mid s_- \} = 0 \tag{1.43}$$
$$E\{ [R^i_{s_-,s} - R] \nu_s \mid s_- \} = 0 \tag{1.44}$$
where $R \equiv 1/\beta$ is the risk free rate.
Equations (1.43) and (1.44) can be compared to the standard CCAPM Euler equations
$$\mathbb{E}\{1 + (1 - \tau^k) F_{k,s} - R \mid s_-\} = 0$$
$$\mathbb{E}\{R^i_s - R \mid s_-\} = 0$$
which characterize the optimal portfolio for consumers. The only difference between the two
sets of equations is that the marginal utility of consumption, 1, is replaced by $\nu_s$ - the multiplier
measuring the severity of the government budget constraint in state $s$. The stochastic process
$\{\beta^t (1 + \nu_t)\}_{t \geq 0}$ can be interpreted as the pricing kernel of the government. A way to see
this is to introduce, instead of assuming an exogenous process for government expenditures,
a standard utility for government funds $v(g_t, s_t)$. In this case, the first order condition for
government expenditures is $v_g(g_t, s_t) = 1 + \nu_t$.
Note that the pricing kernel of the government is not trivial - i.e. has a positive variance - despite the fact that agents are risk neutral. The variation in the pricing kernel comes only from
the imperfect ability to smooth taxes across time under incomplete markets. Indeed, if markets
were complete, the pricing kernel of the government and the pricing kernel of the agents would
be colinear - in this instance, constant.
These equations can be compared to the results in Bohn (1990). Following Barro (1979),
Bohn considers an environment with incomplete markets, no capital and risk neutral consumers,
where the government must finance an exogenous stream of expenditures using distortionary
taxes. Taxes $\tau$ are assumed to impose an ad hoc increasing convex deadweight cost $h(\tau)$. Bohn
derives the following formula for the return $R_t$ of any traded asset
$$\mathbb{E}\{[R_t - R]\,h'(\tau_t)\} = 0 \qquad (1.45)$$
Hence (1.44) can be seen as a microfounded version of (1.45). Some important differences
are worth noting. In particular it is not generally true in my model that $\nu_t$ is a function of
$\tau^l_t$ and $\tau^k_t$ or even of tax revenues, as a perfect analogy with (1.45) would require. In fact, $\nu_t$ is
a function of the $\#S^2 + 1$ variables $s_t$, $s_{t-1}$ and $b_{t-1}$. Generically, $\nu_t$ is therefore not a function
of the $\#S + 1$ functions $\tau^l$ and $\tau^k$. In the specification with a utility $v(g_t, s_t)$ from government
expenditures, $\nu_t = v_g(g_t, s_t) - 1$, so that $\nu_t$ is a function of $g_t$ and $s_t$ - which can be expressed
as a function of $s_{t-1}$, $s_t$ and $b_{t-1}$. Importantly, the state $s_t$ appears in this formula along with
government expenditures. Even if I were to assume that $v(g_t, s_t)$ is independent of $s_t$, this
discussion would suggest that a non linear function of government expenditures $v_g(g_t) - 1$ is
better suited for approximating the marginal cost of public funds than an increasing function of
taxes or tax revenues $h'(\tau_t)$ as in (1.45). This discussion is important, as tests and implications
of this theory rely crucially on sorting out correctly the time series properties of the pricing
kernel of the government.
I have focused so far on traded assets. Uncovering the pricing kernel of the government
allows me to determine the price the government would be willing to pay for non traded assets.
In particular, consider a marginal public investment project requiring an outlay $I$, and whose
payoff $X_s$ is not spanned by traded assets. Then the government should follow the following
capital budgeting rule: invest in the project if and only if
$$\frac{\mathbb{E}\{X_s \mid s_-\}}{1 + r - \dfrac{\mathrm{Cov}\{X_s/I,\ 1 + \nu_s \mid s_-\}}{1 + \mathbb{E}\{\nu_s \mid s_-\}}} > I \qquad (1.46)$$
where $1 + r = 1/\beta$. This capital budgeting rule (1.46) shows that the government should
discount cash flows using the beta associated with its own pricing kernel, and not with that of
consumers
$$\frac{\mathrm{Cov}\{X_s/I,\ 1 + \nu_s \mid s_-\}}{1 + \mathbb{E}\{\nu_s \mid s_-\}}$$
Hence the government should attach a public risk premium to returns that covary negatively
with shocks affecting adversely its budget.
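A small numerical sketch may help fix ideas about this rule. It assumes the rule takes the form "expected payoff divided by the risk free rate minus the government's beta exceeds the outlay", with $1 + r = 1/\beta$. The two states, the multipliers `nu`, the payoffs and the outlay are all hypothetical numbers chosen for illustration.

```python
import numpy as np

def government_present_value(payoffs, nu, probs, outlay, beta):
    """Discount expected payoffs at 1/beta minus the beta of the project
    with respect to the government's pricing kernel 1 + nu."""
    payoffs, nu, probs = (np.asarray(v, dtype=float) for v in (payoffs, nu, probs))
    mean_payoff = probs @ payoffs
    mean_nu = probs @ nu
    cov = probs @ ((payoffs - mean_payoff) / outlay * (nu - mean_nu))
    discount_rate = 1.0 / beta - cov / (1.0 + mean_nu)
    return mean_payoff / discount_rate

beta, outlay = 0.98, 1.03
probs = [0.5, 0.5]
nu = [0.10, 0.30]          # state 2 is the state where public funds are scarce

hedge_pv = government_present_value([1.0, 1.1], nu, probs, outlay, beta)   # pays more when funds are scarce
risky_pv = government_present_value([1.1, 1.0], nu, probs, outlay, beta)   # pays less when funds are scarce
consumer_pv = beta * 1.05  # risk neutral consumers value both projects identically

print(hedge_pv > outlay, risky_pv > outlay, consumer_pv > outlay)
```

Both projects have the same expected payoff, and both would be rejected at the consumers' discount rate, but the one that pays off when the budget constraint is tight passes the government's test.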
It is interesting to note that the asset holdings in the optimal portfolio of government
liabilities $\{x^i_t, k_{g,t}\}$ have a unit root component. Hence not only do the government's total
liabilities $b_t$ have a random walk component, but the composition of its liabilities displays a
similar kind of persistence.
1.7 Numerical Simulations
1.7.1 Numerical method and parameter values
Numerical method. I approach the problem by solving the dynamic programming problems
(1.10), (1.29), (1.15) and (1.38), and then back out the optimal policies. In my calculations,
I restrict the state space to be rectangular and bounded. I check numerically that enlarging
the rectangle doesn't alter the results. The dynamic programming problem is then solved by
a value iteration algorithm with Howard acceleration. I approximate the value function with
cubic splines.
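To illustrate what value iteration with Howard acceleration involves, here is a minimal sketch on a much simpler problem than the ones solved in this chapter: a deterministic growth model with log utility and full depreciation, chosen because its exact policy $k' = \alpha\beta k^{\alpha}$ is known. The grid, the number of Howard steps and the function name `solve_growth_model` are assumptions of this sketch, and the value function is stored on a grid rather than approximated with cubic splines.

```python
import numpy as np

def solve_growth_model(alpha=0.34, beta=0.98, n_grid=200, howard_steps=20,
                       tol=1e-8, max_iter=1000):
    """Value iteration with Howard (policy evaluation) steps on a capital grid."""
    k_star = (alpha * beta) ** (1.0 / (1.0 - alpha))        # steady state capital
    grid = np.linspace(0.2 * k_star, 2.0 * k_star, n_grid)  # bounded state space
    consumption = (grid ** alpha)[:, None] - grid[None, :]  # rows: k, columns: k'
    utility = np.where(consumption > 0,
                       np.log(np.maximum(consumption, 1e-300)), -np.inf)
    rows = np.arange(n_grid)
    # Start from a constant lower bound so that the iterates increase monotonically.
    value = np.full(n_grid, utility[:, 0].min() / (1.0 - beta))
    for _ in range(max_iter):
        # Improvement step: one application of the Bellman operator.
        policy = np.argmax(utility + beta * value[None, :], axis=1)
        flow = utility[rows, policy]
        new_value = flow + beta * value[policy]
        # Howard acceleration: iterate on the value of the fixed current policy.
        for _ in range(howard_steps):
            new_value = flow + beta * new_value[policy]
        if np.max(np.abs(new_value - value)) < tol:
            return grid, new_value, grid[policy]
        value = new_value
    raise RuntimeError("value iteration did not converge")

grid, value, policy = solve_growth_model()
exact_policy = 0.34 * 0.98 * grid ** 0.34
print("largest policy error:", np.max(np.abs(policy - exact_policy)))
```

The Howard steps reuse the current policy several times before re-optimizing, which saves most of the costly maximization steps when the discount factor is close to one.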
Calibration of the risk averse case. To permit comparability of my results to those in
Chari, Christiano and Kehoe (1994), I consider the same parameters and functional forms. I
assume that preferences are of the form
$$u(c, l) = (1 - \gamma) \log(c) + \gamma \log(1 - l)$$
Technology is described by a production function
$$F(k, l, z, t) = k^{\alpha} \left(\exp(\rho t + \tilde{z})\, l\right)^{1-\alpha} - \delta k$$
This incorporates two kinds of labor augmenting technological change in the production function. The variable $\rho$ captures deterministic growth in this technical change. The variable $\tilde{z}$ is a
zero mean technological shock that follows a two-state Markov chain with mean $\bar{z}$ and autocorrelation $\rho_z$. Let government expenditure be given by $g_t = G \exp(\rho t + \tilde{g})$ where $G$ is a constant
and $\tilde{g}$ follows a two-state Markov chain with mean $\bar{g}$ and autocorrelation $\rho_g$. I take $\gamma = 0.75$,
$\beta = 0.98$, $\alpha = 0.34$, $\rho = 0.016$, $G = 0.07$, $\rho_g = 0.89$, $\sigma_g = 0.07$, $\rho_z = 0.81$ and $\sigma_z = 0.04$.
I impose fixed debt limits $\underline{M} = -0.2\, GDP^{fb}$ and $\overline{M} = GDP^{fb}$, where $GDP^{fb}$ is the mean
across states of the first best level of GDP that would occur if the state were absorbing.
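For concreteness, one way to build a two-state chain with a given standard deviation and autocorrelation is sketched below. The symmetric, mean-zero construction is an assumption of this sketch (the text does not spell out the transition matrices); the function name `symmetric_two_state_chain` is hypothetical.

```python
import numpy as np

def symmetric_two_state_chain(std, autocorr):
    """States -std and +std; staying probability chosen to match the autocorrelation."""
    stay = (1.0 + autocorr) / 2.0
    transition = np.array([[stay, 1.0 - stay],
                           [1.0 - stay, stay]])
    states = np.array([-std, std])
    return states, transition

states_g, P_g = symmetric_two_state_chain(std=0.07, autocorr=0.89)
stationary = np.array([0.5, 0.5])          # invariant distribution of a symmetric chain

mean_g = stationary @ states_g
var_g = stationary @ states_g**2 - mean_g**2
autocov_g = states_g @ (np.diag(stationary) @ P_g) @ states_g
print("std:", np.sqrt(var_g), "autocorrelation:", autocov_g / var_g)
```

With this construction the probability of remaining in the current state is 0.945 for the government expenditure shock.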
Notice that without technological shocks, the economy has a balanced growth path along
which consumption, capital and government spending grow at rate p and labor is constant. As
in Chari, Christiano and Kehoe (1994), it is straightforward to modify the model to allow for
exogenous growth.
The length of the accounting period over which capital taxes are held fixed by assumption is an important parameter. I analyze two series of simulations, one where the period length is one year,
and one where the period length is five years.
Calibration of the quasi-linear case. I also calibrate a model with quasi-linear preferences given by
$$u(c, l) = c + \gamma \log(1 - l)$$
To ensure that labor is stationary, I impose that growth is zero in this case. I also adjust $\gamma$ to
0.5.
1.7.2 Results
Results in the quasi-linear case. Figure 1 displays several variables along a typical path,
in an economy with only government expenditure shocks and no capital ownership. The top
left panel displays the path of shocks, which take only two possible values, one and two, corresponding to low and high expenditures respectively. Note that debt, labor taxes, labor and
capital all appear to be more persistent than the shock process, reflecting the fact that they
incorporate a random walk component. The bottom left panel represents taxes on capital.
Taxes on capital are negative - as predicted by Proposition 2 - but are never larger than $10^{-5}$ in absolute value. I
experimented with other specifications of the production function (CES) and never got a result
larger than $10^{-4}$. Note that capital taxes also seem to incorporate some persistence, although
less than other variables in the economy. Capital taxes are relatively larger in absolute value
when debt approaches its upper limit.
Figure 2 (respectively 3) displays the policy functions when the previous government
expenditure shock was low (respectively high). On the horizontal axis of every graph is inherited
debt. On the vertical axis is the variable whose name indexes the graph. The solid blue
(respectively dashed green) lines correspond to the policy functions when the contemporaneous
shock is low (respectively high). The second graph on the top row is a zoom on the top left
graph representing the policy functions for debt. The thick red line is the 45 degree line. As is
apparent from this graph, a high shock is partially absorbed through an increased debt, and a
low shock through a decrease in debt. The bottom left graph displays the policy functions for
the multipliers. Both are increasing in the level of inherited debt. The multiplier associated with
the high shock is always larger than the multiplier associated with the low shock. Moreover,
the bottom right panel shows that the variance of $\nu_s$ across states for a given $s_-$ is increasing in
the level of inherited debt: as inherited debt gets close to the upper debt limit, the government
is more limited in its ability to use debt as a shock absorber if the high shock hits. Note that
this variance is always minuscule here, reflecting the fact that with reasonable debt limits, risk
free debt is a very good shock absorber. Capital taxes are small for two reasons: first because
of the Cobb Douglas benchmark, second because the variance of $\nu_s$ is small. As government
debt approaches the upper limit, the variance of $\nu_s$ increases and with it the absolute value of
the tax rate.
Table 1 displays the optimal capital ownership level in the invariant distribution, depending
on the environment. I first discuss the results when the period length is set to one year.
With only government expenditure shocks, Proposition 5 shows that the government takes an
infinite long or short position in capital and replicates the complete markets allocation. With
only two possible productivity shocks, the government can also replicate the complete markets
allocation, with a finite - but large - short position of -295% of $k^{fb}$, where $k^{fb}$ is the mean across
states of the first best level of capital that would prevail if the state were absorbing. That a short
position is required is easily understood: the marginal product of capital correlates positively
with productivity shocks, and hence with government revenues. The magnitude of the position
results from the fact that capital is very colinear with risk free debt. Hence a big leveraged
position, short in capital, long in the risk free bond, is required to provide the government with
a state contingent source of revenues that matches the desired variations in the net present value
of government surpluses. With government expenditure shocks and productivity shocks, the
government cannot replicate the complete markets allocation anymore. The optimal government
capital ownership level is almost identical to the one that prevails with only technological shocks.
This reflects the fact that in this business cycle calibration, technological shocks are a bigger
source of variation in the government needs for funds than government expenditure shocks.
Increasing the period length to five years does not alter the result of Proposition 5. By
contrast, optimal government capital ownership in the case of productivity shocks, or both
productivity shocks and government expenditure shocks, drops from -295% to -59%. The
reason is that as the period length is increased, shocks per period become less persistent, getting closer and closer to i.i.d. Hence the variation of the net present value of government
surpluses that the government seeks to hedge becomes smaller - approximately 5 times smaller.
Hence a smaller position in capital is required.
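The loss of persistence can be checked with a short computation. The sketch assumes a symmetric two-state annual chain and simply samples it every fifth year (it does not time-average within the period), so it is only indicative of how per-period autocorrelation falls with the period length.

```python
import numpy as np

def autocorrelation_over_horizon(annual_autocorr, years):
    """Autocorrelation of a symmetric two-state chain observed every `years` years."""
    stay = (1.0 + annual_autocorr) / 2.0
    annual = np.array([[stay, 1.0 - stay],
                       [1.0 - stay, stay]])
    multi_year = np.linalg.matrix_power(annual, years)
    return 2.0 * multi_year[0, 0] - 1.0

rho_g_5 = autocorrelation_over_horizon(0.89, 5)   # government expenditure shock
rho_z_5 = autocorrelation_over_horizon(0.81, 5)   # productivity shock
position_ratio = -295 / -59                       # reported positions at 1 and 5 years

print(f"five-year autocorrelations: g {rho_g_5:.3f}, z {rho_z_5:.3f}")
print("ratio of optimal positions:", position_ratio)
```

Under these assumptions the autocorrelations fall to roughly 0.56 and 0.35, and the reported positions differ by exactly a factor of five, in line with the "approximately 5 times smaller" variation to be hedged.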
In this model, the welfare gains from completing markets are small. In all simulations, I
compute them to be less than 0.01% of lifetime consumption. This confirms the finding of
AMSS for business cycle type calibrations. The size of welfare gains is well understood from
AMSS: they depend on the size and persistence of the shocks, the curvature in the utility
function, and the debt limits. The welfare gains are much larger in a war and peace exercise I
report below. Nevertheless, an interesting question is: How much of the gap between welfare
in the incomplete markets, no government ownership allocation and welfare in the complete
markets allocation can government ownership cover? In all simulations, I find this number to
be above 85%. Moreover, in the simulation with a 5 year period length and both government
and productivity shocks, I find that a short position of 15% of $k^{fb}$ - a
large, but more reasonable position - realizes 47% of these welfare gains.
I also calibrate a war and peace example, with a one year period length and only government
expenditure shocks. The parameters are the same as in the business cycle simulation, except
for $\sigma_g$ which I take to be equal to 0.7 instead of 0.07. I compute the mean welfare gains to be
0.6% of lifetime consumption. The welfare gains reach 1.5% of lifetime consumption if debt is
close to the debt limit and the economy experienced a high government expenditure shock in
the previous period. With such large shocks, taxes on capital are larger in absolute value than
in the business-cycle simulations, but small - between 0 and -1% with probability 75%. Taxes
on capital become large - up to -20% - when debt is very close to the debt limit.
Results in the risk averse case. I first start with a limited experiment: I start the
economy in period 0 with a given value of $(b_0, k_0, \theta_0, s_-)$ with $s_- = 1$, corresponding to the low
government expenditure shock. In period 1, a permanent government expenditure shock hits
the economy. This shock can be high - corresponding to s = 2 - or low - corresponding to
s = 1 - with probabilities 5% and 95% respectively. Capital ownership by the government is
disallowed. All the uncertainty in the economy is resolved in period 1. Figures 4 and 5 display
the impulse responses of several variables in the economy corresponding to the high and low
shock respectively.
First note that the hedging term is non-zero only in the first period, and is smaller than
$10^{-3}$. Hence taxes on capital are dominated by the intertemporal term. Following a high shock,
taxes on capital spike at 150% and then fall back to 0. The interest rate drops to -2% and then
reverts almost immediately to 2%. This effect would be present even in an economy without
capital as in AMSS. Werning (2005) coins the term "interest rate manipulation".
Debt initially goes up to absorb the high government expenditure shock in period 1, and
then drops permanently to a level lower than $b_0$. From the government's budget constraint, it is
apparent that this is the mechanical result of the spike in capital taxation revenues and the low
interest rate that prevails between period 1 and period 2. The lower post period 2 debt level
helps reduce the burden of interest payments on the government's budget constraint. Capital
taxes therefore play an important role in absorbing the shock: they help reduce the debt burden
after a high shock both by lowering the interest rate and by directly collecting revenues. Note
that in the case where the low shock hits, capital is subsidized between period 1 and period 2.
Capital taxes are therefore not used to raise revenues on average, but rather to help absorb the
variations in the net present value of government expenditures.
Consumption initially increases. This is just an intertemporal substitution effect as consumers face a low interest rate between period 1 and period 2. From period 2 on, consumption
is permanently lower. Labor and labor taxes also move up permanently after period 2, hinting
at the random walk properties of these variables. The fact that labor and labor taxes increase
is just the result of a wealth effect, as consumption drops after period 2. Capital drops after the
shock. From the resource constraint, one can see that this drop in investment is the mirror image of the increase in consumption stemming from interest rate manipulation and the increase
in government expenditure.
Figures 6 and 7 display impulse responses for a similar exercise with respectively a low and
high productivity shock. The technological shock in period 1 can be low - corresponding to
s = 2 - or high - corresponding to s = 1 - with probabilities 5% and 95% respectively.
Figure 8 displays a typical path for several variables in an economy with only government
expenditure shocks and no capital ownership. After a transition from a low shock to a high
shock, the tax on capital spikes at about 200% and then reverts to a level of about 5%. That
taxes on capital are positive until a new low shock hits reflects the fact that resources are scarcer
today than tomorrow on average, given the possibility that a low shock occurs. Symmetrically,
after a transition from a high shock to a low shock, the tax on capital spikes at about -200%
and then reverts to a level of about -5%.
The volatility of capital taxes depends crucially on the period length. The reason can be
explained as follows. Ultimately, the welfare costs associated with capital taxes come through
the distortion of the path of consumption that they impose. The distortion associated with a
one time capital tax becomes larger as the period length is increased, because consumption is
distorted for a longer period. By contrast, the benefits in terms of reduced debt following a
high shock decrease as shocks become less persistent per period. In the continuous time limit,
the costs are zero, and the government is able to replicate the complete markets allocation with
infinite taxes or subsidies during an infinitely small period following the shock.
Figure 9 displays a typical path for several variables in the economy when the period length
is set to five years. The positive and negative spikes in capital taxes are now 20% and -20%.
Table 2 summarizes the statistical properties of capital and labor taxes depending on the period
length. Standard deviations and autocorrelations are reported per period and not per year. The
standard deviation of labor taxes - 5.1% - is close to the number reported by Chari, Christiano
and Kehoe (1994) - 6% - for this case. Labor taxes are quite persistent, and more persistent
when the period length is 5 years: this is the consequence of less abrupt movements in capital
taxes and the capital stock when the period length is longer. Importantly, note that capital
taxes hardly display any persistence. This is due to the fact that the magnitude of capital taxes
is essentially determined by a difference term
- --
, which tends to remove the unit root component
in
For an economy with only two government expenditure shocks and a one year period length,
the government can replicate the complete markets allocation with a capital ownership of about
2300% of $k^{fb}$. This position drops to 680% of $k^{fb}$ when the period length is extended to five
years. For an economy with only two productivity shocks and a one year period length, the
government can replicate the complete markets allocation with a capital ownership of about
-400% of $k^{fb}$. This position drops to -157% of $k^{fb}$ when the period length is extended to five
years. The welfare gains from completing markets are small - 0.09% of lifetime consumption - albeit larger than in the quasi-linear case.
1.8 Conclusion
I have characterized optimal capital taxation and government ownership when markets are
incomplete. In this context, capital taxation and ownership are two ways for the government to
collect state dependent revenues and hedge its burden from distortionary taxation across states.
Although I have focused on capital taxation, the insight that with incomplete markets and a
policy lag, taxes acquire a direct hedging role, and that the indirect hedging consequences of
the distortions they generate should be taken into account, is more general.
I have found that this hedging motive for capital taxation is always negligible: Capital
taxes come with distortions through the adjustment of capital. This affects the capital tax base
and labor tax revenues across states in a way that almost perfectly undoes the direct hedging
benefits. In a baseline case, capital taxes are exactly zero. Away from this benchmark, the
hedging component of capital taxes is always computed to be minuscule.
By contrast, capital ownership provides the government with a powerful hedging instrument.
The reason is that trading, unlike taxing, does not involve distortions. In a baseline case, I
show that the government can perfectly approximate the complete markets allocation by taking
an infinitely long or short position in capital. Away from this benchmark, optimal positions
are large and decrease sharply when the period length is increased. Substantial benefits can be
reaped from smaller positions. Government expenditure shocks call for a long position while
productivity shocks push in the direction of a short position. In a business cycle calibration, I
show that productivity shocks are the leading force - resulting in an optimal short position.
Refining these propositions would require developing a more realistic model for investment
- incorporating adjustment costs and time to build - and asset valuation. It would also be interesting to move away from the representative agent framework I have analyzed. Unobservable
agent heterogeneity together with the government's concern for redistribution would provide an
endogenous reason for the use of distortionary taxes. Finally, the large capital positions called for
by the model put strain on the assumption of a benevolent government with full commitment.
In this light, incorporating relevant political economy constraints into Ramsey type models of
optimal taxation appears as a promising research avenue. I leave these issues for future work.
1.9 Appendix
The three terms in (1.28)
The three terms in (1.28) are
$$\tau^k = T^h(k, b, \theta, s_-) + T^i(k, b, \theta, s_-) + T^b(k, b, \theta, s_-)$$
where
E{-k(1-rk )Fkk,aUc,aI&-}
T (k,
Cov {kFk,auc,s,Vs1s-}
E{kFk,8 uc,ss
E[Fkaa
0,
,s-)
Fs-
E{Fk[ss
E{Fksucs U
Is-}
E{FkEs
TF(k,b, ,s ) =-
Cov{kFkk,sUc,8,
8-I
E{FkusI8-}
E{Fk,,u,.s-}
pE{[Fka
IV2'8- M
MMk,
E{FksVsucs+
,.
-
-
l
k,s_
_
T b(k ,bj,0,s_ = re1F+ +u +..¢_=S
E.( F k f"l E a
}C,+
where
?s_
f k,1I3.}
is the multiplier on the resource constraint in the previous period.
Proof of Lemma 1
EkFkksUcs
,-
The consumer's problem is a convex program, and u and F are not satiated. Then the first
order conditions (1.6), (1.7) and (1.8) together with (1.5) holding with equality are necessary
and sufficient for an optimum in the consumer's problem. It is straightforward to see that (1.1)
and (1.3) imply that (1.5) holds with equality - a version of Walras law.
A derivation of (1.26) and (1.28) using a Lagrangian approach
The approach in the text relies on the assumption that the value function is differentiable.
An alternative to making that assumption is to approach the task of characterizing the Ramsey
allocation by composing a Lagrangian for the Ramsey problem. I keep notations as close as
possible to those in the text.
I attach stochastic processes {It Pr(st)pt,/t Pr(st)vt,1 t Pr(st)vl,t, ft Pr(st)V,2 ,t, fit Pr(st)Ot}t>o
to the constraints
}
Vt > 0 and st E St,
uC,t = Et {fIuC,t+ [ + (1 - Tt+) Fk,t+1] ,
bt-1
u c,t
ý,Et-i1{uc,t)
+ 9
tuC,t 5 ltFl,tuc,t + ltui,t + Tk kt-1Fk,tuc,t + bt,
M_(kt, uct, st) _ bt _ (k, uc,t st),
Vt > 0 and s t E St
Vt > 0 and st E St and
Vt > 0 and s t E St.
ct +9t <+ kt < Ft + kt-1,
Then the Lagrangian for the Ramsey problem can be represented as
Lb',ro,0ko,
k
00
E
Oftut
E- 1
t=O
E1
_tuet
A
+ At-1Uc,t
t=O
00
+I&-,fE Otitu
+IE- 1 >Efit [V2,t (6t -
1
+,t
~~~~~UC,t
gtuc,t + ltFi,tuc,t + ltui,l
0 ,t}
Tktktct+b
t-F~,t
+
-
~t
c~7s)
+ vl,8q (7J(kt uc
0 , ~st)
-
b)
t=0
OO
+Ex-1 E ft t (Ft + kt-1 - ct t=0
- kt)
where I have introduced for notational convenience the multiplier $\mu_{-1}$, which I impose to
be zero: $\mu_{-1} = 0$.
I am now in position to derive the first order conditions associated with this Lagrangian.
The first order condition for bt is
Et{ue,tut+l
}+
IEt {UctVt
V2,t - V1,t = 0
Et {IuC }
which proves equation (1.26).
The formula for the optimal tax on capital (1.28) can be derived by combining the first
order conditions for $\tau^k_t$ and $k_{t-1}$. The first order condition for $\tau^k_t$ can be written as
(1.47)
-- tl1Et-1 {uc,tFk,t} + 1Et- 1 {uc,tkt-lFk,tvt} = 0
For $t \geq 1$, the first order condition for $k_{t-1}$ is
0
=
~Lt_JEt-1 {Uc,t (1- rT) Fkk,t} + Et-i {3Vt (ltFkl,tUc,t +
-V2,t-1Mk,t-1 + Vl,t-lMk,t-1
+ Et-1 {/3)t(1 +
Fk,t)} -
Fktc,t + -kt-1Fkk,ttd1t)4
t-1
I use (1.47) to replace $\mu_{t-1}$. I then use the constant returns to scale assumption to replace
$l_t F_{kl,t}$ by $-k_{t-1} F_{kk,t}$, to rearrange (1.47) as follows
0-
1 -Ik±
t-
{uE,tk-,Fk,t}j
1
k&_,
I
t-1
{-etFktl
+
0
{Uc,tFkk,t} + TEt-13VtFk,tUc,t}
(1 - Tk)Etl{Vtkt-1Fkk,tUc,t}
Fkttcj
+r •t-1 {14tFk,t} + Et 1 {-I3( 1 + (1 - rk)Fk,t)Uc,t
Uct
-V2,t-1M.k,tl-1+ V1,t-lM1k,t-1
which in turn can be rewritten as
Tt
-
= Th + Ti + Tb
-1
- Et_ 1 {I(1 + (1 - rk )Fk,t)Uc,t}
Uc,t-1
where
Et-1{f-kt-1
Tth -
Et-i{
r)FkktOct0
I I I
Et-1{Fk,tuc,t .±L }
t-
I
,tuet)
{Fk,tuc,t}
SEt-1{Fk,tVtuc,t}
t1E
[ Et-l
Et- 1{kt- Fkk,tUc,t, 1t}
Ut}
Et-1{kt-1Fktuct,
lEt-l{kt-1Fkk,tUc,t}
{kt-lFk,tuc,t}
I
{Fk,tUc,t}
O
Et•i I [1+(1-rk)Fk,t]Uc,(-t•-- t
t-t
{uC',t ,t1}
flEt-{uT,tFk,t
Et- {Fk,tuc,t Ot
Et-{Fk,tuc,t}
and
}
Et
-tl{Fk,tVtUc,t}
Et-l{fk,tu-uc,t
a d,t--1
v 1 ,t-3 Mk,t -1 - v 11'
~
-
OEt-1 {u,,t)
T bb
Et-{Fk,tuc,
Et-l
t
c
t
{fk,tue,t}
+
Et- 1{Fk,tVtUc,t}
Et-j {k,tUc,t}
This in turn shows that (1.28) holds.
Proof of Proposition 1
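If $F(k, l, s) = A(s) k^{\alpha} l^{1-\alpha}$, then $k F_{kk,s} = (\alpha - 1) F_{k,s}$ so that
$$\frac{\mathrm{Cov}\{k F_{kk,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{kk,s} \mid s_-\}} = \frac{(\alpha - 1)\,\mathrm{Cov}\{F_{k,s}, \nu_s \mid s_-\}}{(\alpha - 1)\,\mathbb{E}\{F_{k,s} \mid s_-\}} = \frac{\mathrm{Cov}\{k F_{k,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{k,s} \mid s_-\}}$$
By (1.28), this implies that $\tau^k = 0$.
Since (1.28) applies from t = 1 on, this shows that
$\tau^k_t = 0$ for all $t \geq 1$.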
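The Cobb-Douglas identity behind this proof can be checked numerically. The sketch below uses hypothetical values for the states (`A`, `l`), the multipliers `nu` and the capital stock `k`, and computes $F_{kk}$ by a central finite difference rather than analytically, so the equality of the two covariance ratios is verified rather than assumed.

```python
import numpy as np

alpha, k = 0.34, 3.0
A = np.array([0.96, 1.04])       # productivity in each state (illustrative)
l = np.array([0.30, 0.33])       # labor in each state (illustrative)
nu = np.array([0.10, 0.30])      # budget multipliers in each state (illustrative)
probs = np.array([0.5, 0.5])

def f_k(k, l, A):
    """Marginal product of capital for F = A k^alpha l^(1-alpha)."""
    return alpha * A * k ** (alpha - 1.0) * l ** (1.0 - alpha)

h = 1e-5
f_kk = (f_k(k + h, l, A) - f_k(k - h, l, A)) / (2.0 * h)   # finite-difference F_kk

def cov_over_mean(x, nu, probs):
    """Cov{x, nu} / E{x} under the distribution `probs`."""
    mean_x, mean_nu = probs @ x, probs @ nu
    return probs @ ((x - mean_x) * (nu - mean_nu)) / mean_x

lhs = cov_over_mean(k * f_kk, nu, probs)
rhs = cov_over_mean(k * f_k(k, l, A), nu, probs)
print(lhs, rhs)
```

The two ratios coincide up to finite-difference error because $k F_{kk}$ is proportional to $F_k$ state by state, with a factor $\alpha - 1$ that cancels.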
Proof of Proposition 2
I first prove the following lemma
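Lemma 16 Consider $\{x_n, y_n, z_n\}_{1 \leq n \leq N} \in \mathbb{R}_{++}^{3N}$, and a probability distribution $\{p_n\}_{1 \leq n \leq N}$ with
$p_n > 0$ for all $n$. Assume that there exist $n$ and $n'$ such that $z_n \neq z_{n'}$. The following holds:
(i) assume that $x_n < x_{n'}$ if and only if $z_n > z_{n'}$, then
$$\frac{\mathbb{E}\{x_n y_n z_n\}}{\mathbb{E}\{y_n z_n\}} < \frac{\mathbb{E}\{x_n y_n\}}{\mathbb{E}\{y_n\}}$$
(ii) assume that $x_n < x_{n'}$ if and only if $z_n < z_{n'}$, then
$$\frac{\mathbb{E}\{x_n y_n z_n\}}{\mathbb{E}\{y_n z_n\}} > \frac{\mathbb{E}\{x_n y_n\}}{\mathbb{E}\{y_n\}}$$
Proof.
$$\frac{\mathbb{E}\{x_n y_n z_n\}}{\mathbb{E}\{y_n z_n\}} < \frac{\mathbb{E}\{x_n y_n\}}{\mathbb{E}\{y_n\}}$$
if and only if
$$\mathbb{E}\{x_n y_n z_n\}\,\mathbb{E}\{y_n\} < \mathbb{E}\{y_n z_n\}\,\mathbb{E}\{x_n y_n\}$$
if and only if
$$\sum_{n,n'} p_n p_{n'} x_n y_n z_n y_{n'} < \sum_{n,n'} p_n p_{n'} y_n z_n x_{n'} y_{n'}$$
if and only if
$$0 < \sum_{n,n'} p_n p_{n'} y_n z_n y_{n'} [x_{n'} - x_n]$$
if and only if
$$0 < \frac{1}{2} \sum_{n,n'} p_n p_{n'} y_n y_{n'} [z_n - z_{n'}] [x_{n'} - x_n]$$
which trivially proves claims (i) and (ii) in the lemma.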
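The lemma is easy to probe numerically. The sketch below draws random positive vectors, orders $x$ against $z$ as in cases (i) and (ii), and records the sign of the gap between the two weighted averages. The sample size, the seed and the helper name `weighted_ratio_gap` are choices of this sketch, and positivity of $y_n$ is assumed, as the last step of the proof requires.

```python
import numpy as np

def weighted_ratio_gap(x, y, z, p):
    """E{x y z}/E{y z} - E{x y}/E{y} under the probability vector p."""
    return (p @ (x * y * z)) / (p @ (y * z)) - (p @ (x * y)) / (p @ y)

rng = np.random.default_rng(0)
n_states = 5
spacing = 0.1 * np.arange(n_states)      # keeps the ordered values well separated
gaps_case_i, gaps_case_ii = [], []
for _ in range(1000):
    p = rng.dirichlet(np.ones(n_states))             # strictly positive probabilities
    y = rng.uniform(0.1, 2.0, n_states)              # positive weights
    z = np.sort(rng.uniform(0.1, 2.0, n_states)) + spacing
    x_increasing = np.sort(rng.uniform(0.1, 2.0, n_states)) + spacing
    x_decreasing = x_increasing[::-1]
    gaps_case_i.append(weighted_ratio_gap(x_decreasing, y, z, p))   # x falls when z rises
    gaps_case_ii.append(weighted_ratio_gap(x_increasing, y, z, p))  # x rises when z rises

max_gap_case_i = max(gaps_case_i)
min_gap_case_ii = min(gaps_case_ii)
print(max_gap_case_i, min_gap_case_ii)
```

In every draw the gap is negative in case (i) and positive in case (ii), as the symmetrized sum in the proof implies.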
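Assume $F$ is CES with elasticity of substitution $\sigma$ with Hicks neutral technology shocks:
$$F(k, l) = A(s) \left[ a k^{\frac{\sigma-1}{\sigma}} + (1 - a) l^{\frac{\sigma-1}{\sigma}} \right]^{\frac{\sigma}{\sigma-1}}$$
Then
$$F_k(k, l) = a A(s) k^{-\frac{1}{\sigma}} \left[ a k^{\frac{\sigma-1}{\sigma}} + (1 - a) l^{\frac{\sigma-1}{\sigma}} \right]^{\frac{1}{\sigma-1}}$$
and
$$-k F_{kk}(k, l) = \frac{1}{\sigma}\, a (1 - a) A(s) k^{-\frac{1}{\sigma}} l^{\frac{\sigma-1}{\sigma}} \left[ a k^{\frac{\sigma-1}{\sigma}} + (1 - a) l^{\frac{\sigma-1}{\sigma}} \right]^{\frac{2-\sigma}{\sigma-1}}$$
so that
$$\frac{F_k(k, l)}{-k F_{kk}(k, l)} = \sigma \left[ 1 + \frac{a}{1 - a} \left( \frac{k}{l} \right)^{\frac{\sigma-1}{\sigma}} \right]$$
Proposition 2 now follows. Consider for example case (i). Then $F_{k,s} / (-k F_{kk,s})$ is a decreasing
function of $\nu_s$. Lemma 3 shows that
$$\frac{\mathbb{E}\{-k F_{kk,s}\, \nu_s \mid s_-\}}{\mathbb{E}\{-k F_{kk,s} \mid s_-\}} > \frac{\mathbb{E}\{k F_{k,s}\, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{k,s} \mid s_-\}}$$
which proves that $\tau^k > 0$. Cases (ii), (iii) and (iv) in Proposition 2 can be proved along the
same lines.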
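The closed form for the ratio $F_k / (-k F_{kk})$ under CES can be verified numerically. The sketch below assumes the CES form with share parameter `a` and elasticity `sigma`, takes hypothetical values for `k`, `l` and `A`, and compares a finite-difference evaluation of the ratio with $\sigma [1 + \frac{a}{1-a} (k/l)^{(\sigma-1)/\sigma}]$.

```python
import numpy as np

def ces_fk(k, l, A, a, sigma):
    """Marginal product of capital for F = A [a k^rho + (1-a) l^rho]^(1/rho)."""
    rho = (sigma - 1.0) / sigma
    return a * A * k ** (rho - 1.0) * (a * k**rho + (1.0 - a) * l**rho) ** (1.0 / rho - 1.0)

def ratio_numeric(k, l, A, a, sigma, h=1e-5):
    """F_k / (-k F_kk), with F_kk from a central finite difference."""
    f_kk = (ces_fk(k + h, l, A, a, sigma) - ces_fk(k - h, l, A, a, sigma)) / (2.0 * h)
    return ces_fk(k, l, A, a, sigma) / (-k * f_kk)

def ratio_closed_form(k, l, a, sigma):
    rho = (sigma - 1.0) / sigma
    return sigma * (1.0 + a / (1.0 - a) * (k / l) ** rho)

k, A, a = 3.0, 1.0, 0.34
results = {}
for sigma in (0.5, 2.0):
    for l in (0.30, 0.33):
        results[(sigma, l)] = (ratio_numeric(k, l, A, a, sigma),
                               ratio_closed_form(k, l, a, sigma))
        print(sigma, l, results[(sigma, l)])
```

Since the technology shock $A(s)$ cancels from the ratio, the ratio varies across states only through labor: it falls with labor when $\sigma > 1$ and rises with labor when $\sigma < 1$.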
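I then prove the second part of the proposition.
If $F(k, l, s) = A(s) k^{\alpha} l^{1-\alpha} - \delta k$, then $k F_{kk,s} = (\alpha - 1)(F_{k,s} + \delta)$ so that
$$\frac{\mathrm{Cov}\{k F_{kk,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{kk,s} \mid s_-\}} = \frac{(\alpha - 1)\,\mathrm{Cov}\{F_{k,s}, \nu_s \mid s_-\}}{(\alpha - 1)\delta + (\alpha - 1)\,\mathbb{E}\{F_{k,s} \mid s_-\}} = \frac{\mathrm{Cov}\{k F_{k,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{k,s} \mid s_-\} + k\delta}$$
Hence
$$\frac{\mathrm{Cov}\{k F_{kk,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{kk,s} \mid s_-\}} < \frac{\mathrm{Cov}\{k F_{k,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{k,s} \mid s_-\}}$$
if and only if $\mathrm{Cov}\{k F_{k,s}, \nu_s \mid s_-\} > 0$. This proves the second part of the proposition.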
Proof of Lemma 2
It is clear that $V$ is decreasing in $b$. Since $V$ is differentiable, this is equivalent to $V_b \leq 0$.
Since $\beta V_{b,t} = -\nu_t + \nu_{1,t}$, this proves that $\nu_t - \nu_{1,t} \geq 0$.
Under natural debt limits, (1.21) becomes
$$\nu_{s_-} = \mathbb{E}\{\nu_s \mid s_-\} + \nu_{1,s_-}$$
which I can rewrite as
$$\nu_{s_-} - \nu_{1,s_-} = \mathbb{E}\{\nu_s - \nu_{1,s} \mid s_-\} + \mathbb{E}\{\nu_{1,s} \mid s_-\}$$
This proves that $\{-V_{b,t}\}_{t \geq 1}$ is a nonnegative supermartingale. Therefore, the supermartingale
convergence theorem (see Loève (1977)) asserts that $-V_{b,t}$ converges almost surely to a finite
nonnegative random variable $-V_{b,\infty}$. Since $V$ is continuously differentiable and concave, this
implies in turn that $b_t$ converges to a finite random variable $b_\infty$. Since policy functions in (1.15)
are continuous, this implies that every point $b$ in the support of $b_\infty$ is such that for every pair of states
$s$ and $s_-$ in the unique ergodic set of $\{s_t\}_{t \geq 0}$, $b'_s(b, s_-) = b$. Clearly, this is only possible if
$b = \underline{M}$.
This proves Lemma 2.
Proof of Proposition 3
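If $F(k, l, s) = A(s) k^{\alpha} l^{1-\alpha}$, then $k F_{kk,s} = (\alpha - 1) F_{k,s}$ so that
$$\frac{\mathrm{Cov}\{k F_{kk,s} u_{c,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{kk,s} u_{c,s} \mid s_-\}} = \frac{(\alpha - 1)\,\mathrm{Cov}\{F_{k,s} u_{c,s}, \nu_s \mid s_-\}}{(\alpha - 1)\,\mathbb{E}\{F_{k,s} u_{c,s} \mid s_-\}} = \frac{\mathrm{Cov}\{k F_{k,s} u_{c,s}, \nu_s \mid s_-\}}{\mathbb{E}\{k F_{k,s} u_{c,s} \mid s_-\}}$$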
Proof of Proposition 4
Note that T2 ,t > 0 if and only if
Et-
1
{[1 + (1-
Tk) Fk,t] U
It
t-1
Uc,t-1
k) Fk,t Uc,t
+ (1
Et- 1 {[1
( -- 1 k) Fkt]uc,t}
Zt- {1
T2 ,t < 0 if and only if
Et_- { [1 + (1 - rk) Fk,t] uc,t-}
Uc~c,t
-
2
< Vt-1
Et- 1 {[1 + (1 - rk) Fk,t] Uc,t}
Uc,t-1
and T2 ,t = 0 if and only if
lEt-
1
{[1+
(1- rk) Fk,t]
Et- 1 {[1 + (1 -
uc,tO}
rk) Fk,t] Uc,t}
t_ 1
Uc,t-1
Let me denote [1 + (1 - rk) Fk,t] uc,t by Kt and -Uc,t by ýt. Since policy functions in (1.10)
are continuous, Kt and (t are continuous functions K(xt) and (xt) of xt = {kt, bt, uc,t, st}. Call
ir(x', x) the transition function.
Let T be the operator mapping the space F of continuous functions of x into itself, defined
by
T(f)(x)
fff(x') K (x'),r(x', x)
f K(x')'7r(x',x)
Therefore, T2 (x) > 0 if and only if T(()(x) > ((x), T2 (x) < 0 if and only if T(()(x) < ((x)
and T2 (x) = 0 if and only if T(()(x) = ((x). Thus the sign of T2 is entirely determined by the
sign of T(ý) - (.
Suppose P"{T(()(x) • ý(x)} = 1. Let
S=
{x,
- sup{(, POc{((x) > (} = 1}. Define
lim Pr{xt e Aixo = x} = P¶{A}
At = {x,((x) Ž (t(x)}, Bt = {x,Pr{(t(x) Ž (} = 1}, A = nAtoAt and B = nt=oBt. Then
P¶{At} = P¶{Bt} = P°{F} = 1. Hence P¶{A n B n
x E An B
n F such that ((x') <
rF} =
1. For every e > 0, there exists
+ e. This implies that Pr{_ • (t(xe) <_ + e} = 1. Using
the ergodicity of xt, this implies that
P'{
•(x)-<5
<
+ e} = ~t--•
limoo-Pr{•
---±
-- <
t
+ e} = 1
(xe) <
Since this is true for all e > 0, this in turn implies that P"{((x) = _} = 1.
Similarly P"{T(()(x) > ((x)} = 1 implies P'{((x)= (} = 1 where (
inf{(, P1{((x) <
}= 1}. This proves Proposition 4.
Proof of Proposition 5
Consider the complete markets allocation $\{k^c, l^c, b'^c\}$ in state $(b, s_-)$. Labor $l^c$ and net government liabilities at the end of the period $b'^c$ are constant across states of the world $s$. Define
$X_s \equiv b + g_s - b'^c$.
Consider next the incomplete markets environment. Denote by $R(k, l_s) \equiv l_s F_{l,s} + l_s H_{l,s}$
the revenues from labor taxation in state $s$. Set $\tau^k = 0$ and solve, for a given $k_g$, the following
system in $\{k, l_s\}$.
$$\mathbb{E}\{R(k, l_s) \mid s_-\} = \mathbb{E}\{X_s \mid s_-\} \qquad (1.49)$$
$$(F_k(k, l_s) - r) + \frac{R(k, l_s)}{k_g} = \frac{X_s}{k_g} \quad \forall s \in S \qquad (1.50)$$
Note that if (1.50) holds and $|k_g| < \infty$, then (1.49) is equivalent to $\mathbb{E}\{F_k(k, l_s) \mid s_-\} = r$. If
(1.50) holds and $|k_g| = \infty$, then $\mathbb{E}\{F_k(k, l_s) \mid s_-\} = r$ holds automatically.
Denote by $\{k(k_g), l_s(k_g)\}$ the solution: together with $b'_s(k_g) = b'^c$ for all $s$, the variables
$\{k(k_g), l_s(k_g), b'_s(k_g)\}$ satisfy the constraints in (1.15). Then
$$\lim_{k_g \to -\infty} \{k(k_g), l_s(k_g)\} = \lim_{k_g \to \infty} \{k(k_g), l_s(k_g)\} = \{k^c, l^c\}$$
Therefore, by choosing $k_g$ large and picking $\{k(k_g), l_s(k_g), b'_s(k_g)\}$, the government can approximate as well as desired the complete markets allocation in state $(b, s_-)$. By doing this in
every state and date, the government can therefore perfectly approximate the complete markets
allocation. This proves Proposition 5.
1.10 Figures and Tables
Table 1: Optimal capital ownership (as a fraction of $k^{fb}$) in the quasi-linear model

                           Government shocks   Productivity shocks   Government and productivity shocks
Period length = 1 year     infinity            -295%                 -295%
Period length = 5 years    infinity            -59%                  -59%
Table 2: Summary statistics in the general model

                                              Period length = 1 year   Period length = 5 years
mean of labor taxes                           26.2%                    26.9%
per period std of labor taxes (% of mean)     5.1%                     5.2%
per period autocorrelation of labor taxes     0.62                     0.94
mean of capital taxes                         -4.8%                    1.2%
per period std of capital taxes               0.72                     0.10
per year autocorrelation of capital taxes     -0.06                    0.03
Figure 1: A typical path, quasi-linear preferences, government expenditure shocks.
Figure 2: Policy functions, quasi-linear preferences, government expenditure shocks, $s_-$ low.
Figure 3: Policy functions, quasi-linear preferences, government expenditure shocks, s_ high.
Figure 4: Impulse responses, high permanent government expenditure shock
Figure 5: Impulse responses, low permanent government expenditure shock
Figure 6: Impulse responses, low permanent technology shock
Figure 7: Impulse responses, high permanent technology shock
Figure 8: A typical path: government expenditure shocks, one year period length
Figure 9: A typical path: government expenditure shocks, five years period length
Bibliography
[1] Aiyagari, S.R, Marcet, A., Sargent, T.J, and Juha Seppala (2002): "Optimal Taxation
without State Contingent Debt", Journal of Political Economy, 110, 1220-1254.
[2] Angeletos, George-Marios (2002): "Fiscal Policy with Non-Contingent Debt and the Optimal Maturity Structure", Quarterly Journal of Economics, 117, 1105-1131.
[3] Atkeson, A., Chari, V.V., and Patrick J. Kehoe (1999): "Taxing Capital Income: A Bad
Idea", Federal Reserve Bank of Minneapolis Quarterly Review, 23, 3-17.
[4] Atkinson, A.B and Joseph E. Stiglitz (1972): "The Structure of Indirect Taxation and
Economic Efficiency", Journal of Public Economics, 1, 97-119.
[5] Barro, Robert J. (1979): "On the Determination of Public Debt", Journal of Political
Economy, 87, 940-971.
[6] Bohn, Henning (1990): "Tax Smoothing with Financial Instruments", American Economic
Review, 80, 1217-1230.
[7] Buera, F. and Juan P. Nicolini (2004): "Optimal Maturity Structure of Government Debt
without State Contingent Bonds", Journal of Monetary Economics, 51, 531-554.
[8] Chamley, Christophe (1986): "Optimal Taxation of Capital Income in General Equilibrium
with Infinite Lives", Econometrica, 54, 607-622.
[9] Chari, V.V., Christiano, L.J. and Patrick J. Kehoe (1994): "Optimal Fiscal Policy in a
Business Cycle Model", Journal of Political Economy, 102, 617-652.
[10] Farhi, E. and Ivan Werning (2005): "The Optimal Maturity Structure of Government Debt
when Markets are Incomplete", mimeo MIT.
[11] Judd, Kenneth L. (1985): "Redistributive Taxation in a Simple Perfect Foresight Model",
Journal of Public Economics, 28, 59-83.
[12] Loève, M. (1977): "Probability Theory II", Springer Verlag, 4th edition.
[13] Lucas, R. and Nancy Stokey (1983): "Optimal Fiscal and Monetary Policy in an Economy
without Capital", Journal of Monetary Economics, 12, 55-93.
[14] Ramsey, Frank P. (1927): "A Contribution to the Theory of Taxation", Economic Journal,
XLVIII, 47-61.
[15] Stiglitz, Joseph E. (1969): "The Effects of Income, Wealth, and Capital Gains Taxation
on Risk-Taking", Quarterly Journal of Economics, 83, 263-283.
[16] Werning, Ivan (2005): "Tax Smoothing with Incomplete Markets", MIT mimeo.
[17] Zhu, Xiaodong (1992): "Optimal Fiscal Policy in a Stochastic Growth Model", Journal of
Economic Theory, 58, 250-289.
Chapter 2
Progressive Estate Taxation 1
Introduction
Arguably, the biggest risk in life is the family one is born into. In particular, newborns partly
inherit the luck, good or bad, of their parents and ancestors, passed on by the wealth accumulated within their dynasty. This makes them concerned not only with their own uncertain
skills and earning potential, but also with that of their progenitors. They value insurance,
from behind the veil of ignorance, against these risks. On the other hand, altruistic parents
are partly motivated to work because of the impact their effort can have, through bequests, on
their children's wellbeing. The intergenerational transmission of welfare determines the balance
between insuring newborns and parental incentives.
One instrument societies use to regulate the degree of this intergenerational transmission
is estate taxation. This paper examines the optimal design of the estate tax by characterizing
Pareto efficient allocations in an economy featuring the tradeoff between incentives of parents
and insurance of newborns. Our main result is that estate taxation should be progressive:
fortunate parents should face a higher marginal tax rate on their bequests.
1 This chapter is the product of joint work with Ivan Werning. Werning is grateful for the
hospitality of the Federal Reserve Bank of Minneapolis and Harvard University. This work benefited
from useful discussions and comments volunteered by Daron Acemoglu, Laurence Ales,
Fernando Alvarez, George-Marios Angeletos, Abhijit Banerjee, Gary Becker, Bruno Biais, Olivier
Blanchard, Ricardo Caballero, Francesca Carapella, Dean Corbae, Mikhail Golosov, Bengt Holmstrom, Chad Jones, Narayana Kocherlakota, Robert Lucas, Pricila Maziero, Casey Mulligan,
Roger Myerson, Chris Phelan, Thomas Philippon, Gilles Saint-Paul, Rob Shimer, Nancy Stokey,
Jean Tirole and seminar and conference participants at the University of Chicago, University of
Iowa, The Federal Reserve Bank of Minneapolis, MIT, Harvard, Northwestern, New York University, IDEI (Toulouse), Stanford Institute for Theoretical Economics (SITE), Society of Economic
Dynamics (SED) at Budapest, Minnesota Workshop in Macroeconomic Theory, NBER Summer
Institute and the Texas Monetary Conference at the University of Austin. All remaining errors
are our own.
We begin with a two-period Mirrleesian economy with non-overlapping generations linked
by parental altruism; we then extend our analysis to an infinite horizon economy like Atkeson
and Lucas (1995) and Albanesi and Sleet (2004). In our simplest economy, a continuum of
parents live during the first period. In the second period each is replaced by a single descendant
and parents are altruistic towards this child. Parents work, consume and bequeath; children
simply consume. 2 Following Mirrlees (1971), parents first observe a random productivity draw
and then exert work effort. Both productivity and work effort are private information; only
output, the product of the two, is publicly observable. We study the entire set of constrained
Pareto efficient allocations and derive their implications for marginal tax rates.
For this economy, if the social welfare criterion is assumed to coincide with the parent's
expected utility, then Atkinson and Stiglitz's (1976) celebrated uniform-taxation result applies,
and the optimal estate tax is zero. That is, when no direct weight is placed on the welfare of
children, income should be taxed nonlinearly (Mirrlees 1971), but bequests should go untaxed.
This arrangement ensures that the intertemporal consumption choice made by parents (trading
off their own consumption against their child's consumption) is undistorted. As a result, the
inheritability of welfare across generations is perfect: the child's consumption rises one-for-one
with parental consumption. In effect, efficiency dictates that altruism be exploited to provide
higher incentives for parents, by manipulating their children's consumption. Inequality for the
children's generation is created as a byproduct, since their expected welfare is of no direct
concern.
While this describes one efficient allocation, the picture is incomplete. In this economy the
parent and child are distinct individuals, albeit linked through parental altruism, a form of
externality. A complete welfare analysis then requires examining the ex-ante utility of both
parents and children. Our economy's Pareto frontier is peaked because the parent is altruistic
towards the child, so parental welfare decreases if the child is made too miserable. The allocation
discussed in the previous paragraph is a particular point lying on the Pareto frontier: the peak
which maximizes the welfare of parents. In this paper we explore other efficient arrangements
representing points which lie on the downward sloping section of the Pareto frontier, to
the right of its peak.
2 Although some readers have remarked that they find this assumption realistic, it will be relaxed when we
extend the time horizon.
A role for estate taxation emerges: efficient allocations which lie to the right of the peak
can be implemented by confronting parents with a simple tax system with separate nonlinear
schedules for income and estate taxes. Our main result is that optimal estate taxation is
progressive: fortunate parents face a higher marginal tax rate on their bequests. The estate-tax
schedule is convex.
Progressive estate taxation emerges to insure children against their parent's luck: it lowers
consumption inequality within the children's generation, while still providing some incentives
to parents. Child consumption still varies with their parent's, but less than one for one. Consumption mean reverts across generations, so the inheritability of welfare is imperfect. The optimal
progressivity in taxes reflects this mean reversion: fortunate dynasties must face a lower net
return on bequests so that they choose a consumption path declining towards the mean.
We extend the two-period model to an infinite horizon economy, where everyone lives for
a single period, during which they observe a productivity draw and work, to be replaced by a
single descendant in the next period. With perfect altruism, dynasties behave as if they were
an infinite-lived individual.
This extension is important for at least two reasons. First, it provides a motivation for
focusing on efficient allocations which do not maximize the expected utility of the very first
generation, that is, the analogues of the downward sloping section of the Pareto frontier. Indeed, for
the infinite horizon, the allocation that maximizes the welfare of the first generation features
everyone in distant generations converging to misery, with zero consumption (Atkeson and Lucas
1992). As we show here, by extending the analysis in Farhi and Werning (2005), this result
is special to placing no weight on future generations: when some weight is placed on future
generations a steady state exists. Second, an infinite horizon allows us to make contact with a
growing literature on dynamic Mirrleesian models, such as Golosov, Kocherlakota and Tsyvinski
(2003) and Golosov, Tsyvinski and Werning (2006). In particular, our model environment is
identical to that of Albanesi and Sleet (2004).
The main difference between the two-period and infinite horizon economies is that tax implementations are more involved in the latter. We adapt Kocherlakota's (2004) implementation.
The progressivity of estate taxes extends to the infinite horizon setup: fortunate parents face
a higher average marginal tax rate on their bequests. Indeed, the average marginal estate tax
rate formula is the same as in the two-period economy.
Our stark conclusion on the progressivity of estate taxation strongly contrasts with the
theoretical ambiguity in the shape of the optimal income tax schedule. Mirrlees's (1971) seminal
paper showed that for bounded distributions of skills the optimal marginal income tax rates
are regressive at the top (see also Seade 1982, Tuomala 1990 and Ebert 1992). More recently,
Diamond (1998) has shown that the opposite, progressivity at the top, is true with certain
unbounded skill distributions (see also Saez 2001). In contrast, our results on the progressivity of the
estate tax do not depend on any assumptions regarding the distribution of skills.
Throughout this paper, we study an economy without capital, where aggregate consumption
equals aggregate produced output plus an endowment. This no-aggregate-savings assumption
allows us to focus on redistribution within generations and abstract from transfers across generations.
Farhi, Kocherlakota and Werning (2005) extend this model along several dimensions, including
capital accumulation, life-cycle elements and general skill processes, and show that
the main results are insensitive to this assumption.
The rest of this paper is organized in the following way. Section 1 describes the two-period
model environment and Section 2 introduces the associated planning problem. Our main results
for this two-period economy are in Section 3. In Section 4 we describe the extension to an infinite
horizon. The main results for that economy are contained in Section 5. We summarize our
conclusions in Section 6.
2.1 Parent and Child: A Two Period Economy
There are two periods labelled t = 0, 1. The parent lives during t = 0 and is replaced by a
single child at t = 1. The parent works and consumes, while the child only consumes. Thus,
an allocation is a triplet of functions $(c_0(w_0), c_1(w_0), y_0(w_0))$, where $c_0$ and $y_0$ represent the
parent's consumption and output, and $c_1$ represents the child's consumption.
Preferences. The parent is altruistic towards the child:
$$v_0 = E\left[u(c_0) - h\left(\frac{y_0}{w_0}\right) + \beta v_1\right], \tag{2.1}$$
where the expectation is over $w_0$ and $\beta < 1$. The child's utility is simply
$$v_1 = u(c_1). \tag{2.2}$$
The utility function u(c) is increasing, concave and differentiable; the disutility function h(n)
is assumed increasing, convex and differentiable.
Substituting (2.2) into (2.1) yields the alternative expression for the parent's utility:
$$v_0 = E\left[u(c_0) + \beta u(c_1) - h\left(\frac{y_0}{w_0}\right)\right]. \tag{2.3}$$
As usual, the parent's expected utility can be reinterpreted as that of a fictitious dynasty that
lives for two periods and discounts at rate $\beta$.
Technology. An allocation is resource feasible if aggregate consumption in both periods is
not greater than the sum of endowments and production:
$$\int_0^\infty c_0(w_0)\,dF(w_0) \le e_0 + \int_0^\infty y_0(w_0)\,dF(w_0), \tag{2.4}$$
$$\int_0^\infty c_1(w_0)\,dF(w_0) \le e_1. \tag{2.5}$$
Incentives. Productivity is private information so incentives need to be provided for truthful revelation. We say that an allocation is incentive compatible if the parent finds it optimal
to reveal her shock truthfully:
$$u(c_0(w_0)) + \beta u(c_1(w_0)) - h\left(\frac{y_0(w_0)}{w_0}\right) \ge u(c_0(w)) + \beta u(c_1(w)) - h\left(\frac{y_0(w)}{w_0}\right) \tag{2.6}$$
for all productivity realizations $w_0$ and $w$.
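As an illustration of condition (2.6), the following minimal sketch checks incentive compatibility on a finite grid of productivity types. The function name `is_incentive_compatible`, the log utility, the quadratic disutility and the two-type allocations are illustrative assumptions and are not taken from the text.

```python
import math

def is_incentive_compatible(w, c0, c1, y0, beta, u, h, tol=1e-12):
    """Check (2.6) on a finite type grid: no type w[i] gains by claiming w[j]."""
    n = len(w)
    for i in range(n):
        truthful = u(c0[i]) + beta * u(c1[i]) - h(y0[i] / w[i])
        for j in range(n):
            # Payoff to true type w[i] from taking the bundle designed for w[j].
            mimic = u(c0[j]) + beta * u(c1[j]) - h(y0[j] / w[i])
            if mimic > truthful + tol:
                return False
    return True

u = math.log                      # assumed utility over consumption
h = lambda n: 0.5 * n ** 2        # assumed convex disutility of effort
w = [1.0, 2.0]                    # hypothetical productivity types

# Same consumption for both types, but more output asked of the productive type:
# the productive type prefers the low-output bundle, so (2.6) fails.
flat = is_incentive_compatible(w, [1.0, 1.0], [1.0, 1.0], [0.5, 2.0], 0.9, u, h)

# Identical bundles for everyone are trivially incentive compatible.
pooled = is_incentive_compatible(w, [1.0, 1.0], [1.0, 1.0], [0.5, 0.5], 0.9, u, h)
```

The first allocation fails because it asks for more output without rewarding it, which is why consumption must rise with reported productivity.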
2.2 Social Welfare and Efficient Allocations
We now study all constrained efficient allocations for the two-period economy introduced in the
previous section. We begin by introducing and discussing our welfare criterion. Consider
the general welfare criterion
$$W \equiv v_0 + \alpha E v_1, \tag{2.7}$$
which places some weight $\alpha \ge 0$ on the expected utility of children. As we vary $\alpha$ we can trace
out the entire Pareto frontier, since the latter is convex.
Substituting equations (2.2) and (2.3) into (2.7) implies the alternative expression
$$W = E[u(c_0) + (\beta + \alpha) u(c_1) - h(y_0/w_0)].$$
Thus, the social welfare function is equivalent to the parent's preferences but with a social
discount factor $\hat\beta \equiv \beta + \alpha$ that is higher than the private one as long as $\alpha > 0$.
The planning problem maximizes the welfare criterion $W$ over allocations that are resource
feasible and incentive compatible. Formally, the problem is
$$\max_{c_0, c_1, y_0} \int_0^\infty [u(c_0(w_0)) + \hat\beta u(c_1(w_0)) - h(y_0(w_0)/w_0)]\,dF(w_0)$$
subject to the resource constraints in equations (2.4)-(2.5) and the incentive compatibility
constraints in (2.6).
It is useful to divide the planning problem into two stages. In the first stage the planner
chooses the profile of output $y_0(w_0)$ and a schedule of incentives $A(w_0)$, which is equal to utility
from consumption $u(c_0(w_0)) + \beta u(c_1(w_0))$ up to a constant. In the second stage, the planner
solves the subproblem of how best to provide the incentives $A(w_0)$, using $c_0(w_0)$ and $c_1(w_0)$.
The key feature is that the second stage involves no incentive constraints; these are imposed in
the first stage. Formally, by introducing $A$ and $U$ the full problem can be written as
$$\max_{c_0, c_1, y_0, A, U} \int_0^\infty [u(c_0(w_0)) + \hat\beta u(c_1(w_0)) - h(y_0(w_0)/w_0)]\,dF(w_0)$$
subject to $A(w_0) + U = u(c_0(w_0)) + \beta u(c_1(w_0))$, the resource constraints in equations (2.4)-(2.5)
and the incentive compatibility constraints $A(w_0) - h(y_0(w_0)/w_0) \ge A(w) - h(y_0(w)/w_0)$ for all
$w_0$ and $w$. Note that the incentive constraint does not involve $c_0$, $c_1$ or $U$; only $A$ and $y_0$.
For our purposes, it suffices to focus on the second stage that takes $A$ and $y_0$ as given, which
allows us to drop the incentive constraint:
$$\max_{c_0, c_1, U} \int_0^\infty [u(c_0(w_0)) + \hat\beta u(c_1(w_0))]\,dF(w_0)$$
subject to $A(w_0) + U = u(c_0(w_0)) + \beta u(c_1(w_0))$ and the resource constraints in (2.4)-(2.5).
It is convenient to rewrite this problem by changing variables, from consumption to utility
assignments $U_0(w) = u(c_0(w))$ and $U_1(w) = u(c_1(w))$, since the objective is then linear
and the constraints strictly convex. Here $C(u)$ denotes the inverse of $u(c)$. After substituting $U_0(w_0) = A(w_0) + U - \beta U_1(w_0)$ out, the
problem becomes
$$\max_{U, U_1} \int_0^\infty [U + (\hat\beta - \beta) U_1(w_0)]\,dF(w_0)$$
subject to
$$\int_0^\infty C(A(w_0) + U - \beta U_1(w_0))\,dF(w_0) \le e_0 + \int_0^\infty y_0(w_0)\,dF(w_0),$$
$$\int_0^\infty C(U_1(w_0))\,dF(w_0) \le e_1.$$
It is easy to see that both resource constraints must bind at an optimum.
2.3 The Main Result: Progressive Estate Taxation
In this section we derive two main results for the two-period economy laid out in the previous
section. We first show that implicit marginal tax rates on bequests must be progressive. We
then provide a simple tax implementation that relies on two separate schedules for labor income
and estates.
2.3.1 Implicit Marginal Taxes
For any allocation and constant $R > 0$ we can define the associated marginal tax rates $\tau(w_0)$
solving the Euler equation
$$1 = \beta R\,(1 - \tau(w_0))\,\frac{u'(c_1(w_0))}{u'(c_0(w_0))}. \tag{2.8}$$
Below, the constant $R$ plays the role of the pre-tax gross interest rate. Since our economy has
no savings technology, this value is not uniquely pinned down in equilibrium, and it is
unimportant for anything that follows. Different values of $R$ are associated with different levels
for the tax, but they do not affect its shape.
The first-order condition for $U_1(w_0)$, which is necessary and sufficient for optimality, is
$$\hat\beta - \beta + \beta \lambda_0 C'(U_0(w_0)) = \lambda_1 C'(U_1(w_0)),$$
where $\lambda_t$ is the strictly positive Lagrange multiplier on the resource constraint for period $t$. From
this equation it follows that $U_0(w_0)$ and $U_1(w_0)$ move in the same direction with $w_0$. Since
$U_0(w_0) + \beta U_1(w_0)$ must be increasing, in order to provide incentives, it follows that both $U_0(w_0)$
and $U_1(w_0)$ are increasing; hence, both consumptions $c_0(w_0)$ and $c_1(w_0)$ are increasing in $w_0$.
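Using the fact that $C(u)$ is the inverse of $u(c)$, so that $C'(U_t(w_0)) = 1/u'(c_t(w_0))$, and
rearranging we obtain
$$1 = \beta \frac{\lambda_0}{\lambda_1}\left[1 + \left(\frac{\hat\beta}{\beta} - 1\right)\frac{1}{\lambda_0}\,u'(c_0(w_0))\right]\frac{u'(c_1(w_0))}{u'(c_0(w_0))}. \tag{2.9}$$
From the first-order condition for $U$ it follows that $1/\lambda_0 = \int_0^\infty (1/u'(c_0(w)))\,dF(w)$. For what
follows we normalize so that $R = \lambda_0/\lambda_1$.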
Our first result, derived from (2.9) when $\hat\beta = \beta$, can be viewed as simply restating the
celebrated Atkinson-Stiglitz uniform taxation result for our economy.
Proposition 17 When $\hat\beta = \beta$ the optimal allocation implies a zero marginal estate tax rate:
$\tau(w_0) = 0$ in (2.8) and the marginal rate of substitution $u'(c_1(w_0))/u'(c_0(w_0))$ is equated across
all dynasties, i.e. for all $w_0$.
Atkinson and Stiglitz (1976) showed that, provided preferences over a group of goods are
separable from work effort, consumption within this group should not be distorted. In
other words, the implied marginal taxes for these goods should be equalized to avoid distorting
their relative consumption: uniform taxation is optimal. In our context, this result applies
to consumption at both dates, $c_0$ and $c_1$, and implies that the ratio of marginal utilities is
equalized across agents, so the estate tax can be normalized to zero. 3
In contrast, whenever $\hat\beta > \beta$, (2.9) implies that the ratio of marginal utilities is not equalized
across agents: there must be some distortion, so the marginal estate tax cannot be zero. Indeed,
since consumption increases with productivity, estate taxation must be progressive.
Proposition 18 When $\hat\beta > \beta$ the optimal allocation implies a nonzero and progressive marginal estate tax: $\tau(w_0) \neq 0$ for all $w_0$ and $\tau(w_0)$ is increasing in $w_0$. For $R = \lambda_0/\lambda_1$ the marginal
tax rate is
$$\tau(w_0) = -\left(\frac{\hat\beta}{\beta} - 1\right) u'(c_0(w_0)) \int_0^\infty u'(c_0(w))^{-1}\,dF(w), \tag{2.10}$$
and $c_0(w_0)$, $c_1(w_0)$ and $y_0(w_0)$ are increasing in $w_0$.
We emphasize that the interesting implication for the tax rate here is that it increases with
productivity: taxation is progressive. Without an aggregate savings technology the overall level
of the estate tax cannot be uniquely pinned down and is irrelevant. Farhi, Kocherlakota
and Werning (2005) extend the analysis to an economy with capital, which pins down the level
of estate taxation.
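To make the shape of the implicit tax concrete, the sketch below evaluates the formula $\tau(w_0) = -(\hat\beta/\beta - 1)\,u'(c_0(w_0)) \int u'(c_0(w))^{-1}\,dF(w)$ on a discrete grid of types. Log utility, the five-point consumption schedule, the equal type weights and the function name `implicit_estate_tax` are assumptions made only for this illustration; the consumption schedule is taken as given rather than solved from the planning problem.

```python
import numpy as np

def implicit_estate_tax(c0, weights, beta, beta_hat):
    """Implicit marginal estate tax under log utility, u'(c) = 1/c."""
    c0 = np.asarray(c0, dtype=float)
    weights = np.asarray(weights, dtype=float)
    marginal_utility = 1.0 / c0
    # With log utility, the integral of 1/u'(c0) is the mean of c0.
    inverse_lambda0 = np.sum(weights * c0)
    return -(beta_hat / beta - 1.0) * marginal_utility * inverse_lambda0

c0 = np.array([0.6, 0.8, 1.0, 1.3, 1.7])   # hypothetical increasing schedule
weights = np.full(5, 0.2)                  # hypothetical type probabilities

tau = implicit_estate_tax(c0, weights, beta=0.90, beta_hat=0.95)
tau_no_weight = implicit_estate_tax(c0, weights, beta=0.90, beta_hat=0.90)
```

Under these assumptions every type faces a subsidy, the subsidy shrinks as parental consumption rises, and the tax vanishes when the social and private discount factors coincide.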
2.3.2 A Simple Tax Implementation
We next show that we can implement efficient allocations, and the progressive implicit marginal
tax rates that go with them, with a simple tax system. In our implementation, the government
confronts parents with two separate schedules: an income tax and an estate tax. We say that
an allocation is implementable by non-linear income and estate taxation $T^y(y_0)$, $T^y_2$ and $T^b(b)$
if, for all $w_0$, the allocation $(c_0(w_0), c_1(w_0), y_0(w_0))$ solves
$$\max_{c_0, c_1, y_0} \{u(c_0) + \beta u(c_1) - h(y_0/w_0)\}$$
subject to
$$c_0 + b_1 = y_0 - T^b(b_1) - T^y(y_0),$$
$$c_1 = R b_1 + y_2 - T^y_2.$$
3 One difference is that Atkinson and Stiglitz (1976) assume a linear technological transformation between goods, whereas we
assume no possible transformation. Their result on uniform taxation implies that marginal rates of substitution
are equalized across agents and that they are all equal to the marginal rate of transformation. Our result only
emphasizes the former.
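It is trivial to change things so that it is the child that pays the estate tax at $t = 1$.
Furthermore, without loss of generality we can assume that $y_2 - T^y_2 = 0$. To see this, define
$\tilde b_1 \equiv b_1 + (y_2 - T^y_2)/R$; then $c_1 = R \tilde b_1$ and
$$c_0 + \tilde b_1 = y_0 - T^b\big(\tilde b_1 - (y_2 - T^y_2)/R\big) - T^y(y_0) + (y_2 - T^y_2)/R = y_0 - \tilde T^b(\tilde b_1) - \tilde T^y(y_0),$$
where $\tilde T^y(y_0) \equiv T^y(y_0) - (y_2 - T^y_2)/R$ and $\tilde T^b(\tilde b_1) \equiv T^b\big(\tilde b_1 - (y_2 - T^y_2)/R\big)$.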
Our next result establishes formally that efficient allocations can be implemented with
separate nonlinear income and estate taxation. The idea is to define $T^b(b)$ so that
$$\frac{1}{1 + T^{b\prime}(c_1(w))} = 1 - \tau(w).$$
The proof then exploits the fact that marginal tax rates are progressive to ensure that the
bequest problem faced by the parent is convex.
Proposition 19 Suppose $c_0(w_0)$, $c_1(w_0)$, $y_0(w_0)$ and $\tau(w_0)$ are increasing functions. Then
there exist tax functions $T^y(y)$ and $T^b(b)$ that implement this allocation, with $T^b(b)$ convex.
Proof. Use the generalized inverse of $c_1(w)$, where possible flat portions of $c_1(w)$ define
discontinuous jumps, to define
$$T^{b\prime}(c) = \frac{1}{1 - \tau((c_1)^{-1}(c))} - 1 \tag{2.11}$$
and normalize so that $T^b(0) = 0$. Note that by the monotonicity of $\tau(w)$ and $c_1(w)$, the function
$T^b(b)$ is convex. Next define net income
$$I(w_0) = c_0(w_0) + R^{-1} c_1(w_0) + T^b(c_1(w_0)).$$
We can express this in terms of output $y$ by using the inverse of $y_0(w_0)$: $I^y(y) = I(y_0^{-1}(y))$.
Then we let $T^y(y_0) \equiv y_0 - I^y(y_0)$. Finally, let the consumption allocation as a function of net
income $I$ be $(\hat c_0(I), \hat c_1(I)) \equiv (c_0(I^{-1}(I)), c_1(I^{-1}(I)))$.
We now show that the constructed tax functions, $T^y(y)$ and $T^b(b)$, implement the allocation.
For any given net income $I$ the consumer solves the subproblem:
$$V(I) \equiv \max_{c_0, c_1} \{u(c_0) + \beta u(c_1)\}$$
subject to $c_0 + R^{-1} c_1 + T^b(c_1) \le I$. This problem is convex: the objective is concave and the
constraint set is convex, since $T^b$ is convex. It follows that the first-order condition
$$1 = \frac{\beta R}{1 + T^{b\prime}(b)}\,\frac{u'(c_1)}{u'(c_0)}$$
is sufficient for optimality. Combining (2.8) and (2.11) it follows that these conditions for optimality are satisfied by $\hat c_0(I)$, $\hat c_1(I)$ for all $I$. Hence $V(I) = u(\hat c_0(I)) + \beta u(\hat c_1(I))$.
Next, consider the worker's maximization over $y_0$ given by
$$\max_y \{V(I^y(y)) - h(y/w_0)\}.$$
We need to show that $y_0(w_0)$ solves this problem, which implies that the allocation is implemented since consumption would be given by $\hat c_0(I^y(y_0(w_0))) = c_0(w_0)$ and $\hat c_1(I^y(y_0(w_0))) =
c_1(w_0)$. Now, from the previous paragraph and our definitions it follows that
$$y_0(w_0) \in \arg\max_y \{V(I^y(y)) - h(y/w_0)\}$$
$$\Leftrightarrow\; y_0(w_0) \in \arg\max_y \{u(\hat c_0(I^y(y))) + \beta u(\hat c_1(I^y(y))) - h(y/w_0)\}$$
$$\Leftrightarrow\; w_0 \in \arg\max_w \{u(c_0(w)) + \beta u(c_1(w)) - h(y_0(w)/w_0)\}.$$
Thus, the first line follows from the last, which is guaranteed by the assumed incentive compatibility of the allocation, (2.6). Hence, $y_0(w_0)$ is optimal and it follows that $(c_0(w_0), c_1(w_0), y_0(w_0))$
is implemented by the constructed tax functions. ∎
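The construction in the proof can be sketched numerically: given a tax-rate schedule $\tau$ and a child consumption schedule $c_1$ on a grid of types, the marginal estate tax $T^{b\prime}(c) = 1/(1 - \tau) - 1$ is integrated along the $c_1$ grid. The grid values below are hypothetical, the trapezoid rule is a choice made for this sketch, and the level is normalized to zero at the lowest grid point rather than at $b = 0$ as in the proof.

```python
import numpy as np

def estate_tax_schedule(c1, tau):
    """Return the marginal estate tax and its integral along the c1 grid."""
    c1 = np.asarray(c1, dtype=float)
    tau = np.asarray(tau, dtype=float)
    # Marginal tax at each grid point: 1 / (1 - tau) - 1.
    marginal = 1.0 / (1.0 - tau) - 1.0
    # Trapezoid rule between neighboring grid points, starting from zero
    # at c1[0] (an assumption of this sketch).
    steps = 0.5 * (marginal[1:] + marginal[:-1]) * np.diff(c1)
    level = np.concatenate(([0.0], np.cumsum(steps)))
    return marginal, level

c1 = np.array([0.7, 0.9, 1.1, 1.4, 1.8])             # hypothetical, increasing
tau = np.array([-0.10, -0.07, -0.05, -0.03, -0.01])  # hypothetical, increasing

marginal, level = estate_tax_schedule(c1, tau)
chord_slopes = np.diff(level) / np.diff(c1)
```

Because both schedules are increasing, the marginal tax rises along the grid and the chord slopes of the integrated schedule are increasing, which is the discrete counterpart of convexity of $T^b$.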
2.3.3 Discussion
Without estate taxation there is perfect inheritability of welfare. In particular, consumption of
parents and child move in tandem, one for one. This situation is only optimal when the children
are not considered independently in the welfare criterion, so that insuring them against the risk
of their parent's fortune is not valued.
In contrast, when insurance is provided to the children's generation their consumption still
varies with their parent's, but less than one-for-one. The intergenerational transmission of
welfare is imperfect: consumption mean reverts across generations. The progressivity of the
estate tax schedule reflects this mean reversion. Fortunate parents must face lower net returns
on bequests in order to give them incentives to tilt their consumption towards the present, that
is, towards themselves. Likewise poorer parents need to face higher net returns so that their
consumption slopes upward. This explains the progressivity of estate taxes.
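A small sketch of this mean reversion, under log utility: the first-order condition $\hat\beta - \beta + \beta\lambda_0 C'(U_0) = \lambda_1 C'(U_1)$ then reads $\hat\beta - \beta + \beta\lambda_0 c_0 = \lambda_1 c_1$, so child consumption is an affine function of parental consumption with a positive intercept whenever $\hat\beta > \beta$. The multiplier values and the consumption grid below are arbitrary placeholders, not solved from the resource constraints, and `child_consumption` is a hypothetical helper.

```python
import numpy as np

def child_consumption(c0, beta, beta_hat, lam0, lam1):
    """Child consumption implied by the first-order condition with u = log."""
    c0 = np.asarray(c0, dtype=float)
    return ((beta_hat - beta) + beta * lam0 * c0) / lam1

c0 = np.linspace(0.5, 2.0, 7)   # hypothetical parental consumption levels
lam0, lam1 = 1.0, 0.95          # placeholder multipliers (assumed values)

c1 = child_consumption(c0, beta=0.90, beta_hat=0.95, lam0=lam0, lam1=lam1)
ratio = c1 / c0                 # falls with c0: less than one-for-one transmission

c1_no_weight = child_consumption(c0, beta=0.90, beta_hat=0.90, lam0=lam0, lam1=lam1)
ratio_no_weight = c1_no_weight / c0   # constant: perfect inheritability
```

With a positive intercept the ratio of child to parent consumption declines in parental consumption, while with $\hat\beta = \beta$ the ratio is constant across dynasties.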
Another intuition is based on the interpretation of altruism as a form of externality. In
the presence of externalities, some form of corrective Pigouvian tax is generally desirable.
Think of a parental bequest as a consumption good with a positive externality to the child;
then the Pigouvian logic implies that we should subsidize bequests. Since expected utility is our
concern, and utility is concave, this externality is greatest for children with low consumption.
Thus, the subsidy rate should be highest (or equivalently, the negative tax should be lowest) for poor parents.
Optimal estate taxation is thus progressive. Since our economy has no
capital, the Pigouvian level of taxation turns out to be irrelevant: we may tax or subsidize
estates. However, the relative tax conclusion in this argument remains robust.
None of these arguments require the private-information structure. However, if productivity
or effort were observable, then the first-best allocation would be achievable. Consumption and
wealth would then be equated across parents. Although one can still think of a progressive
estate tax in this situation for out-of-equilibrium levels of parental wealth, it becomes irrelevant
given the lack of parental inequality. In this sense, our results rely on an interaction between
redistributive and corrective motives for taxation (see also Amador, Angeletos and Werning
2005).
2.4 A Dynamic Mirrleesian Economy with Infinite Horizon
We now turn to a repeated version of this economy with an infinite horizon, as in Albanesi
and Sleet (2004). All generations work and receive a random productivity draw. An individual
born into generation $t$ has ex-ante welfare $v_t$ solving
$$v_t = E_{t-1}[u(c_t) - h(n_t) + \beta v_{t+1}],$$
where $\beta < 1$ is the coefficient of altruism. We assume that the utility function over consumption
satisfies the Inada conditions $u'(0) = \infty$ and $u'(\infty) = 0$. We adopt a power disutility function
$h(n) = n^\gamma/\gamma$ with $\gamma > 1$ to ensure that the planning problem is convex.
An individual with productivity $w$, exerting work effort $n$, produces output $y = w \cdot n$. Utility
can then be written as
$$v_t = \sum_{s=0}^{\infty} \beta^s E_{t-1}[u(c_{t+s}) - \theta_{t+s} h(y_{t+s})] \tag{2.12}$$
where $\theta_t \equiv w_t^{-\gamma}$ can be interpreted as a taste shock to producing output. Productivity $w_t$,
and hence $\theta_t$, is independently and identically distributed across dynasties and generations $t =
0, 1, \ldots$ With innate talents assumed nonheritable, intergenerational transmission of welfare is
not mechanically linked through the environment but may arise to provide incentives for altruistic
parents.
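The change of variables from productivity to a taste shock rests on the identity $h(y/w) = w^{-\gamma} h(y)$ for the power disutility $h(n) = n^\gamma/\gamma$. A quick numerical check, with arbitrary illustrative values for $w$, $y$ and $\gamma$:

```python
def h(n, gamma):
    """Power disutility of effort, h(n) = n**gamma / gamma."""
    return n ** gamma / gamma

w, y, gamma = 1.7, 0.9, 2.5      # arbitrary illustrative values
theta = w ** (-gamma)            # taste shock implied by productivity w

effort_form = h(y / w, gamma)    # disutility written in terms of effort n = y / w
taste_form = theta * h(y, gamma) # same disutility written as a taste shock on output
gap = abs(effort_form - taste_form)
```

The two expressions agree up to floating-point error, which is what allows productivity to be treated as a multiplicative shock to the disutility of output.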
Since productivity shocks are assumed to be privately observed by individuals and their
descendants, each dynasty faces a sequence of consumption functions $\{c_t\}$, where $c_t(\theta^t)$ represents
an individual's consumption after reporting the history $\theta^t \equiv (\theta_0, \theta_1, \ldots, \theta_t)$. A dynasty's
reporting strategy $\sigma \equiv \{\sigma_t\}$ is a sequence of functions $\sigma_t : \Theta^{t+1} \to \Theta$ that maps histories of
shocks $\theta^t$ into a current report $\hat\theta_t$. Any strategy $\sigma$ induces a history of reports $\sigma^t : \Theta^{t+1} \to
\Theta^{t+1}$. We use $\sigma^*$ to denote the truth-telling strategy with $\sigma^*_t(\theta^t) = \theta_t$ for all $\theta^t \in \Theta^{t+1}$.
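Given an allocation $\{c_t\}$, the utility obtained from any reporting strategy $\sigma$ is
$$U(\{c_t\}, \sigma; \beta) \equiv \sum_{t=0}^{\infty} \sum_{\theta^t \in \Theta^{t+1}} \beta^t [u(c_t(\sigma^t(\theta^t))) - \theta_t h(y_t(\sigma^t(\theta^t)))] \Pr(\theta^t).$$
An allocation $\{c_t\}$ is incentive compatible if truth-telling is optimal, so that
$$U(\{c_t\}, \sigma^*; \beta) \ge U(\{c_t\}, \sigma; \beta) \tag{2.13}$$
for all strategies $\sigma$.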
We identify dynasties by their initial utility entitlement $v_0$ with distribution $\psi$ in the population. An allocation is a sequence of functions $\{c_t^v, y_t^v\}$ for each $v$, where $c_t^v(\theta^t)$ and $y_t^v(\theta^t)$
represent the consumption and income that a dynasty with initial entitlement $v$ gets at date
$t$ after reporting the sequence of shocks $\theta^t$. For any given initial distribution of entitlements
$\psi$ and resources $e$, we say that an allocation $\{c_t^v, y_t^v\}$ is feasible if: (i) it is incentive compatible
for all dynasties; (ii) it delivers expected utility of $v$ to all initial dynasties entitled to $v$; and
(iii) average consumption in the population does not exceed the fixed endowment $e$ plus income
generated in all periods:
$$\int \sum_{\theta^t} c_t^v(\theta^t) \Pr(\theta^t)\,d\psi(v) \le e + \int \sum_{\theta^t} y_t^v(\theta^t) \Pr(\theta^t)\,d\psi(v), \quad t = 0, 1, \ldots \tag{2.14}$$
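Consider the sum of expected utilities weighted by geometric Pareto weights $\alpha_t = \hat\beta^t$:
$$\sum_{t=0}^{\infty} \alpha_t E_{-1} v_t = \frac{1}{1 - \hat\beta/\beta}\, v_0 + \frac{1}{1 - \beta/\hat\beta} \sum_{t=0}^{\infty} \hat\beta^t E_{-1}\left[u(c_t) - \theta_t h(y_t)\right] \tag{2.15}$$
with $\hat\beta > \beta$. The first term is exogenously given, since we take as given a distribution for the initial utility entitlements $v_0$. Thus, the welfare criterion is given by
$$\sum_{t=0}^{\infty} \hat\beta^t E_{-1}\left[u(c_t) - \theta_t h(y_t)\right]. \tag{2.16}$$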
Future generations are already indirectly valued through the altruism of the current generation.
If, in addition, they are also directly included in the welfare function the social discount factor
must be higher than the private one (for more details, see Farhi and Werning 2005).
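The decomposition of the Pareto-weighted sum into an entitlement term and a socially discounted term can be checked numerically. The sketch below is an illustration only: it assumes a finite sequence of expected flow utilities (so that all sums are finite), and the names `flows`, `beta` and `beta_hat` are hypothetical. It compares $\sum_t \hat\beta^t v_t$, with $v_t = \sum_s \beta^s \text{flow}_{t+s}$, against $v_0/(1-\hat\beta/\beta) + \sum_t \hat\beta^t \text{flow}_t/(1-\beta/\hat\beta)$.

```python
def weighted_sum_direct(flows, beta, beta_hat):
    """Compute sum_t beta_hat**t * v_t, where v_t = sum_s beta**s * flows[t+s]."""
    horizon = len(flows)
    total = 0.0
    for t in range(horizon):
        # Continuation utility of generation t under private discounting.
        v_t = sum(beta ** s * flows[t + s] for s in range(horizon - t))
        total += beta_hat ** t * v_t
    return total


def weighted_sum_closed_form(flows, beta, beta_hat):
    """Same object written as an entitlement term plus a socially discounted term."""
    v0 = sum(beta ** t * f for t, f in enumerate(flows))
    social = sum(beta_hat ** t * f for t, f in enumerate(flows))
    return v0 / (1.0 - beta_hat / beta) + social / (1.0 - beta / beta_hat)


# Hypothetical flow utilities E_{-1}[u(c_t) - theta_t h(y_t)] for four generations.
flows = [1.0, -0.5, 2.0, 0.3]
direct = weighted_sum_direct(flows, beta=0.9, beta_hat=0.95)
closed = weighted_sum_closed_form(flows, beta=0.9, beta_hat=0.95)
```

Because the coefficient on $v_0$ does not depend on the allocation, ranking allocations by the weighted sum is the same as ranking them by the socially discounted term alone.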
When $\hat\beta = \beta$, the planning problem seeks the lowest constant resource level $e$ to ensure that there exists a feasible allocation that delivers the distribution of utility entitlements $\psi$. This is precisely the efficiency problem studied in Albanesi and Sleet (2004). When $\hat\beta > \beta$ we define the social optimum as maximizing the average social welfare function (2.16), weighted by $\psi$, over all feasible allocations. That is, the social planning problem given an initial distribution of entitlements $\psi$ and an endowment level $e$ is to maximize
$$\int U(\{c_t^v, y_t^v\}, \sigma^*; \hat\beta)\, d\psi(v) \tag{2.17}$$
subject to the resource constraints (2.14), as well as the promise keeping and incentive constraints: $v = U(\{c_t^v, y_t^v\}, \sigma^*; \beta)$ and $U(\{c_t^v, y_t^v\}, \sigma^*; \beta) \geq U(\{c_t^v, y_t^v\}, \sigma; \beta)$ for all initial entitlements $v$ and strategies $\sigma$.
We are interested in distributions of utility entitlements $\psi$ such that the solution to the planning problem features, in each period, a cross-sectional distribution of continuation utilities $v_t$ that is also distributed according to $\psi$. We also require the cross-sectional distribution of consumption and income to replicate itself over time. We term any initial distribution of entitlements with these properties a steady state and denote them by $\psi^*$. Following Farhi and Werning (2005), we approach the planning problem by studying a relaxed version of it. The solutions to both problems coincide for steady state distributions $\psi^*$, which is all we seek to characterize. The relaxed problem has continuation utility as a state variable that follows a Markov process. Steady states are then invariant distributions of this Markov process.
Define the relaxed planning problem to be equivalent to the social planning problem except that the sequence of resource constraints (2.14) is replaced by the single intertemporal condition
$$\int \sum_{t=0}^{\infty} \sum_{\theta^t} \hat\beta^t \left(c_t^v(\theta^t) - y_t^v(\theta^t)\right) \Pr(\theta^t)\, d\psi(v) \leq \frac{1}{1-\hat\beta}\, e. \tag{2.18}$$
Letting $\hat\lambda$ be the multiplier for this intertemporal resource constraint we form the Lagrangian $L \equiv \int L^v \, d\psi(v)$ where
$$L^v \equiv \sum_{t=0}^{\infty} \sum_{\theta^t} \hat\beta^t \left(u(c_t^v(\theta^t)) - \hat\lambda c_t^v(\theta^t) - \theta_t h(y_t^v(\theta^t)) + \hat\lambda y_t^v(\theta^t)\right) \Pr(\theta^t) \tag{2.19}$$
and study the maximization of $L$ subject to $v = U(\{c_t^v, y_t^v\}, \sigma^*; \beta) \geq U(\{c_t^v, y_t^v\}, \sigma; \beta)$ for all $v$ and $\sigma$. For any endowment level $e$, there exists a unique positive multiplier $\hat\lambda(e)$ such that maximizing this Lagrangian is equivalent to solving the relaxed problem. Maximizing $L$ is equivalent to the pointwise optimization, for each $v$, of the subproblem:
$$k(v) \equiv \sup L^v \tag{2.20}$$
subject to $v = U(\{c_t^v, y_t^v\}, \sigma^*; \beta) \geq U(\{c_t^v, y_t^v\}, \sigma; \beta)$ for all $\sigma$.
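The value function of the component planning problem $k(v)$ defined by (2.20) is continuous, concave, and satisfies the Bellman equation
$$k(v) = \max_{u,h,w} E\left[u(\theta) - \hat\lambda c(u(\theta)) - \theta h(\theta) + \hat\lambda y(h(\theta)) + \hat\beta k(w(\theta))\right] \tag{2.21}$$
subject to
$$v = E\left[u(\theta) - \theta h(\theta) + \beta w(\theta)\right] \tag{2.22}$$
$$u(\theta) - \theta h(\theta) + \beta w(\theta) \geq u(\theta') - \theta h(\theta') + \beta w(\theta') \quad \text{for all } \theta, \theta' \in \Theta. \tag{2.23}$$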
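For a finite set of shocks, the promise-keeping and incentive constraints of the recursive problem are easy to verify for a candidate allocation. The following is a minimal sketch under stated assumptions: the function names and the two-type numbers are hypothetical, and `u`, `h`, `w` are lists holding utility from consumption, disutility of effort and continuation utility assigned to each reported shock.

```python
from itertools import product


def promised_utility(thetas, probs, u, h, w, beta):
    """Expected utility E[u(theta) - theta*h(theta) + beta*w(theta)] under truth-telling."""
    return sum(p * (ui - th * hi + beta * wi)
               for th, p, ui, hi, wi in zip(thetas, probs, u, h, w))


def is_incentive_compatible(thetas, u, h, w, beta, tol=1e-12):
    """Check that no type gains by claiming the bundle meant for another type."""
    for i, j in product(range(len(thetas)), repeat=2):
        truthful = u[i] - thetas[i] * h[i] + beta * w[i]
        deviation = u[j] - thetas[i] * h[j] + beta * w[j]
        if truthful < deviation - tol:
            return False
    return True


# Hypothetical two-type example: a low theta means low cost of producing output.
thetas, probs = [0.8, 1.2], [0.5, 0.5]
separating = dict(u=[0.5, 0.0], h=[1.0, 0.5], w=[0.0, 0.0])
shirking = dict(u=[0.5, 0.0], h=[1.0, 0.0], w=[0.0, 0.0])

ok = is_incentive_compatible(thetas, beta=0.9, **separating)
bad = is_incentive_compatible(thetas, beta=0.9, **shirking)
v = promised_utility(thetas, probs, beta=0.9, **separating)
```

In the second allocation the low-cost type prefers the bundle with no effort, so the check fails; this is the kind of deviation the incentive constraints rule out.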
Denote by $g^w(v, \theta)$ and $g^u(v, \theta)$ the optimal policy functions for $w$ and $u$. The next lemma characterizes some key properties of the value function $k(v)$.
Lemma 20 The value function $k(v)$ is strictly concave and continuously differentiable on $(\underline{v}, \bar{v})$ where $\underline{v} = -\infty$; it is unbounded below on both sides, $\lim_{v \to \underline{v}} k(v) = \lim_{v \to \bar{v}} k(v) = -\infty$; and the derivative has $\lim_{v \to \underline{v}} k'(v) = 1$ and $\lim_{v \to \bar{v}} k'(v) = -\infty$.
2.5 Steady States and Progressive Taxation
We are interested in steady state distributions $\psi^*$ that have no mass at misery $u(0)/(1-\beta)$. Our first result is that this is not possible when future generations are not weighed directly, so that $\hat\beta = \beta$. We then show that, in contrast, whenever $\hat\beta > \beta$ a steady state distribution exists with no mass at misery. The efficient allocation displays a form of mean reversion across generations that keeps inequality bounded. The mean reversion is characterized by a modified inverse Euler equation which implies that estate taxation is progressive.
2.5.1 An Immiseration Result
For $\hat\beta = \beta$, we have to modify our definition for the Social Planning problem. For any distribution $\psi$ of initial welfare entitlements, the planning problem is to minimize the net resources required to deliver the utility entitlements in an incentive compatible way:
$$\inf e \tag{2.24}$$
subject to
$$\int \sum_{\theta^t} \left(c_t^v(\theta^t) - y_t^v(\theta^t)\right) \Pr(\theta^t)\, d\psi(v) \leq e \tag{2.25}$$
$$U(\{c_t^v, y_t^v\}, \sigma^*; \beta) = v \quad \text{for all } v \tag{2.26}$$
$$U(\{c_t^v, y_t^v\}, \sigma^*; \beta) \geq U(\{c_t^v, y_t^v\}, \sigma; \beta) \quad \text{for all } v \text{ and } \sigma \tag{2.27}$$
From this program, we can define an invariant distribution exactly as in Section 4 of the paper. We are interested in steady state distributions $\psi^*$ without full mass at misery. Our first result is that this is basically not possible when $\hat\beta = \beta$.
Proposition 21 Suppose that $\limsup_{u \to \bar{u}} c''(u)/c'(u) < \infty$. Then if $\hat\beta = \beta$, there exists no invariant distribution $\psi^*$ without full mass at misery.
This result extends the immiseration result in Atkeson and Lucas (1992), who study an endowment economy with privately observed taste shocks, instead of the Mirrleesian production economy with privately observed productivity shocks studied here. They show that the cross-sectional distribution of consumption disperses steadily over time, with inequality growing without bound. As a result, almost everyone converges to misery, consuming nothing, while a vanishing fraction tends towards bliss, consuming the entire aggregate endowment. Thus, no steady state distribution with positive consumption exists. To the best of our knowledge, Proposition 21 is the first formal statement of an analogous result in the context of a Mirrleesian economy, where private information concerns productivity shocks. Researchers who assume $\hat\beta = \beta$ have typically been forced to impose an ad hoc lower bound on continuation utility to avoid misery and ensure that a steady-state distribution exists (Atkeson and Lucas 1995, Albanesi and Sleet 2004).
2.5.2 Steady States and a Modified Inverse Euler Equation
We now return to efficient allocations where future generations are given positive weight. We first derive an important intertemporal condition that must be satisfied by the optimal allocation. This condition has interesting implications for the optimal estate tax, computed later.
Let $\lambda$ be the multiplier on the promise-keeping constraint and let $\mu(\theta, \theta')$ represent the multipliers on the incentive constraints. Then the first-order conditions for interior solutions for $u(\theta)$ and $w(\theta)$ are
$$p(\theta) - \hat\lambda c'(u(\theta)) p(\theta) - \lambda p(\theta) - \sum_{\theta'} \mu(\theta, \theta') + \sum_{\theta'} \mu(\theta', \theta) = 0 \tag{2.28}$$
$$\hat\beta k'(w(\theta)) p(\theta) - \beta \lambda p(\theta) - \beta \sum_{\theta'} \mu(\theta, \theta') + \beta \sum_{\theta'} \mu(\theta', \theta) = 0 \tag{2.29}$$
The envelope condition is $k'(v) = \lambda$. From the first-order condition for $w(\theta)$ we obtain the CLAR equation
$$k'(v) = \frac{\hat\beta}{\beta} \sum_{\theta} k'(g^w(v, \theta))\, p(\theta). \tag{2.30}$$
This equation encapsulates the mean-reversion force in the model. In sequential notation
$$\frac{\beta}{\hat\beta}\, k'(v_t) = E_{t-1}\left[k'(v_{t+1})\right], \tag{2.31}$$
so that $\beta/\hat\beta < 1$ acts as an autoregressive coefficient ensuring that over time the derivative $k'(v_t)$ mean reverts back to zero, where the function $k(v)$ finds its interior maximum. The mean-reverting force provided by $\hat\beta > \beta$ is crucial for the existence of steady state distributions with bounded inequality, which we prove below. In contrast, when $\hat\beta = \beta$ no such central tendency exists, increasing inequality and immiseration ensue and no steady state exists (Proposition 21).
The optimal resolution of the tradeoff between incentives for altruistic parents and insurance for newborns gives rise to a less than one-for-one intergenerational transmission of welfare, in contrast to the case where $\hat\beta = \beta$. The descendants of a rich parent are more fortunate than those of a poor parent, but less and less so the more distant is the descendant: the impact of the initial fortune of dynasties dies out over generations.
The more weight is put on future generations, the higher is $\hat\beta$ compared to $\beta$, and the less intense is the link between the welfare of parents and child. But as we will now show, even the smallest amount of mean-reversion in the form of $\hat\beta > \beta$ puts enough limits on the transmission of shocks across generations to prevent the distribution of consumption and welfare from exploding.
The first-order conditions (2.28)-(2.29) imply that
$$\frac{\hat\beta}{\beta}\, k'(w(\theta)) = 1 - \hat\lambda c'(u(\theta)) \quad \text{and} \quad \frac{\hat\beta}{\beta}\, k'(v) = 1 - \hat\lambda c'(u_-),$$
where $u_-$ should be interpreted as the previous period's assignment of utility from consumption. Substituting these relations into the CLAR (2.30) we arrive at a Modified Inverse Euler equation
$$\frac{1}{u'(c_-)} = \frac{\hat\beta}{\beta} \sum_{\theta} \frac{1}{u'(c(\theta))}\, p(\theta) - \frac{1}{\hat\lambda}\left(\frac{\hat\beta}{\beta} - 1\right). \tag{2.32}$$
The left-hand side together with the first term on the right-hand side is the standard inverse Euler equation. The second term on the right-hand side is novel, since it is zero when $\hat\beta = \beta$ and is strictly negative when $\hat\beta > \beta$.$^4$
We now show that a steady state exists whenever the welfare criterion places direct weight on children so that $\hat\beta > \beta$. The proof follows Farhi and Werning (2005) quite closely, which proves such a result for an economy with taste shocks.
Proposition 22 There exists an invariant distribution $\psi^*$ for the Markov process $\{v_t\}$ implied by $g^w$. Moreover any invariant distribution $\psi^*$ has a support bounded away from misery $\underline{v}$. Suppose in addition that $\limsup_{u \to \bar{u}} c''(u)/c'(u) < \infty$. Then any invariant distribution necessarily has a support bounded away from $\bar{v}$.
Note in particular that if $\limsup_{u \to \bar{u}} c''(u)/c'(u) < \infty$, then there exists an invariant distribution, and any invariant distribution has a compact support.
4 Farhi, Kocherlakota and Werning (2005) show that this equation, and its implications for estate taxation, generalize to an economy with capital and an arbitrary process for skills.
The result relies heavily on the force for mean reversion that is behind (2.31) and (2.32). To see this mean-reversion force most clearly consider, as an example, the logarithmic utility case, $u(c) = \log(c)$. Then $1/u'(c) = c$ and (2.32) can be written with sequential time notation as
$$E_t[c_{t+1}] = \frac{\beta}{\hat\beta}\, c_t + \left(1 - \frac{\beta}{\hat\beta}\right) \hat\lambda^{-1},$$
or simply
$$c_{t+1} = \frac{\beta}{\hat\beta}\, c_t + \left(1 - \frac{\beta}{\hat\beta}\right) \hat\lambda^{-1} + \varepsilon_{t+1} \quad \text{with } E_t[\varepsilon_{t+1}] = 0,$$
where $\hat\lambda^{-1}$ is average consumption at the steady-state cross-sectional distribution. As the last expression indicates, with logarithmic utility, consumption itself is autoregressive with an autoregressive coefficient equal to $\beta/\hat\beta < 1$.
2.5.3 Tax Implementation
Any allocation that is incentive compatible and feasible, and has strictly positive consumption, can be implemented by a combination of taxes on labor income and estates. Here we first describe this implementation, and explore some features of the optimal estate tax in the next subsection.
For any incentive-compatible and feasible allocation $\{c_t^v(\theta^t), y_t^v(\theta^t)\}$ we propose an implementation along the lines of Kocherlakota (2005). In each period, conditional on the history of their dynasty's reports $\hat\theta^{t-1}$ and any inherited wealth, individuals report their current shock $\hat\theta_t$, produce, consume, pay taxes and bequeath wealth subject to the following set of budget constraints
$$c_t + b_t \leq y_t^v(\hat\theta^t) - T_t^v(\hat\theta^t) + \left(1 - \tau_t^v(\hat\theta^t)\right) R_{t-1,t}\, b_{t-1}, \qquad t = 0, 1, \ldots \tag{2.33}$$
where $R_{t-1,t}$ is the before-tax interest rate across generations, and initially $b_{-1} = 0$. Individuals are subject to two forms of taxation: a labor income tax $T_t^v(\hat\theta^t)$, and a proportional tax on inherited wealth $R_{t-1,t} b_{t-1}$ at rate $\tau_t^v(\hat\theta^t)$.$^5$
Given a tax policy $\{T_t^v(\theta^t), \tau_t^v(\theta^t), y_t^v(\theta^t)\}$, an equilibrium consists of a sequence of interest rates $\{R_{t,t+1}\}$; an allocation for consumption, labor income and bequests $\{c_t^v(\theta^t), b_t^v(\theta^t)\}$; and a reporting strategy $\{\sigma_t^v(\theta^t)\}$ such that: (i) $\{c_t^v, b_t^v, \sigma_t^v\}$ maximize dynastic utility subject to (2.33), taking the sequence of interest rates $\{R_{t,t+1}\}$ and the tax policy $\{T_t^v, \tau_t^v, y_t^v\}$ as given; and (ii) the asset market clears so that $\int E_{-1}[b_t^v(\theta^t)]\, d\psi(v) = 0$ for all $t = 0, 1, \ldots$ We say that a competitive equilibrium is incentive compatible if, in addition, it induces truth telling.
For any feasible, incentive-compatible allocation $\{c_t^v, y_t^v\}$ with strictly positive consumption we construct an incentive-compatible competitive equilibrium with no bequests by setting $T_t^v(\theta^t) = y_t^v(\theta^t) - c_t^v(\theta^t)$ and
$$\tau_t^v(\theta^t) = 1 - \frac{1}{\beta R_{t-1,t}}\, \frac{u'(c_{t-1}^v(\theta^{t-1}))}{u'(c_t^v(\theta^t))} \tag{2.34}$$
for any sequence of interest rates $\{R_{t-1,t}\}$. These choices work because the estate tax ensures that for any reporting strategy $\sigma$, the resulting consumption allocation $\{c_t^v(\sigma^t(\theta^t))\}$ with no bequests $b_t^v(\theta^t) = 0$ satisfies the consumption Euler equation
$$u'(c_t^v(\sigma^t(\theta^t))) = \beta R_{t,t+1} \sum_{\theta_{t+1}} u'\left(c_{t+1}^v(\sigma^{t+1}(\theta^t, \theta_{t+1}))\right) \left(1 - \tau_{t+1}^v(\sigma^{t+1}(\theta^t, \theta_{t+1}))\right) \Pr(\theta_{t+1}).$$
The labor income tax is such that the budget constraints are satisfied with this consumption allocation and no bequests. Thus, this no-bequest choice is optimal for the individual regardless of the reporting strategy followed. Since the resulting allocation is incentive compatible, by hypothesis, it follows that truth telling is optimal. The resource constraints together with the budget constraints then ensure that the asset market clears.$^6$
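The claim that the estate tax in (2.34) makes the consumption Euler equation hold for an arbitrary consumption plan can be illustrated numerically. The sketch below is not part of the original argument: it assumes CRRA marginal utility $u'(c) = c^{-\gamma}$ and hypothetical consumption levels and probabilities for three next-period shocks, and all function names are made up for the illustration.

```python
def marginal_utility(c, gamma=2.0):
    """CRRA marginal utility u'(c) = c**(-gamma); an illustrative assumption."""
    return c ** (-gamma)


def estate_tax(c_prev, c_next, beta, R, gamma=2.0):
    """Estate tax rate from (2.34) for one realized shock."""
    return 1.0 - marginal_utility(c_prev, gamma) / (beta * R * marginal_utility(c_next, gamma))


def euler_rhs(c_prev, c_next_by_shock, probs, beta, R, gamma=2.0):
    """Right-hand side of the consumption Euler equation with the tax from (2.34)."""
    return beta * R * sum(
        p * marginal_utility(c, gamma) * (1.0 - estate_tax(c_prev, c, beta, R, gamma))
        for c, p in zip(c_next_by_shock, probs)
    )


# Hypothetical parent consumption and child consumption across three shocks.
c_prev = 1.3
c_next_by_shock = [0.7, 1.1, 2.4]
probs = [0.2, 0.5, 0.3]

lhs = marginal_utility(c_prev)
rhs = euler_rhs(c_prev, c_next_by_shock, probs, beta=0.9, R=1.05)
rhs_other_rate = euler_rhs(c_prev, c_next_by_shock, probs, beta=0.9, R=1.25)
```

The two sides agree for either value of `R`, which mirrors the remark that the implementation works for any sequence of before-tax interest rates.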
As noted above, in our economy without capital only the after-tax interest rate matters so the implementation allows any equilibrium before-tax interest rate $\{R_{t-1,t}\}$. In the next subsection, we set the interest rate to the reciprocal of the social discount factor, $R_{t-1,t} = \hat\beta^{-1}$. This choice is natural because it represents the interest rate that would prevail at the steady state in a version of our economy with capital.
5 In this formulation, taxes are a function of the entire history of reports, and labor income $y_t$ is mandated given this history. However, if the labor income histories $y^t : \Theta^t \to \mathbb{R}^t$ being implemented are invertible, then by the taxation principle we can rewrite $T$ and $\tau$ as functions of this history of labor income and avoid having to mandate labor income. Under this arrangement, individuals do not make reports on their shocks, but instead simply choose a budget-feasible allocation of consumption and labor income, taking as given prices and the tax system.
6 Since the consumption Euler equation holds with equality, the same estate tax can be used to implement allocations with any other bequest plan with income taxes that are consistent with the budget constraints.
2.5.4 Optimal Progressive Estate Taxation
In our environment, the relevant past history is encoded in the continuation utility so the estate tax $\tau_t(\theta^{t-1}, \theta_t)$ can actually be reexpressed as a function of $v_t(\theta^{t-1})$ and $\theta_t$. Abusing notation we then denote the estate tax by $\tau_t(v, \theta_t)$. Since we focus on the steady-state, invariant distribution, we also drop the time subscripts and write $\tau(v, \theta)$.
The average estate tax rate $\bar\tau(v)$ is then defined by
$$1 - \bar\tau(v) \equiv \sum_{\theta} \left(1 - \tau(v, \theta)\right) p(\theta). \tag{2.35}$$
Using the modified inverse Euler equation (2.32) we obtain
$$\bar\tau(v) = -\frac{1}{\hat\lambda}\left(\frac{\hat\beta}{\beta} - 1\right) u'(c_-(v)).$$
In particular, this implies that the average estate tax rate is negative, $\bar\tau(v) < 0$, so that bequests are subsidized. However, recall that before-tax interest rates are not uniquely determined in our implementation. As a consequence, neither are the estate taxes computed by (2.34). With our particular choice for the before-tax interest rate, however, the tax rates are pinned down and acquire a corrective, Pigouvian role. Differences in discounting can be interpreted as a form of externality from future consumption, and the negative average tax can then be seen as a way of countering these externalities as prescribed by Pigou. In our setup without capital, this result depends on the choice of the before-tax interest rate. However, the negative tax on estates would be a robust steady-state outcome in a version of our economy with capital.
In our model it is more interesting to understand how the average tax varies with the history
of past shocks encoded in the promised continuation utility v. The average tax is an increasing
function of consumption, which, in turn, is an increasing function of v. Thus, estate taxation
is progressive: the average tax on transfers for more fortunate parents is higher.
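A small numerical sketch can make the sign and the progressivity of the average estate tax concrete. It assumes logarithmic utility, the interest rate choice $R = 1/\hat\beta$, and a hypothetical two-point distribution for the child's consumption whose mean satisfies the modified inverse Euler equation (2.32); the parameter values and function names are illustrative only.

```python
def average_estate_tax(c_prev, beta, beta_hat, lam_hat):
    """Closed form -(1/lam_hat)*(beta_hat/beta - 1)*u'(c_prev) with u'(c) = 1/c."""
    return -(1.0 / lam_hat) * (beta_hat / beta - 1.0) / c_prev


def next_consumption(c_prev, beta, beta_hat, lam_hat, spread):
    """Two equally likely consumption levels whose mean satisfies (2.32) under log utility."""
    mean = (beta / beta_hat) * c_prev + (1.0 - beta / beta_hat) / lam_hat
    return [mean - spread, mean + spread], [0.5, 0.5]


def average_tax_from_definition(c_prev, c_next, probs, beta, beta_hat):
    """Average of tau(v, theta) = 1 - (beta_hat/beta) * c(theta) / c_prev over shocks."""
    one_minus_tax = sum(p * (beta_hat / beta) * c / c_prev for c, p in zip(c_next, probs))
    return 1.0 - one_minus_tax


beta, beta_hat, lam_hat = 0.9, 0.95, 1.0
parents = [0.5, 1.0, 2.0]  # hypothetical parental consumption levels
closed_form = [average_estate_tax(c, beta, beta_hat, lam_hat) for c in parents]
from_definition = []
for c in parents:
    c_next, probs = next_consumption(c, beta, beta_hat, lam_hat, spread=0.1)
    from_definition.append(average_tax_from_definition(c, c_next, probs, beta, beta_hat))
```

In this example every entry of `closed_form` is negative (a subsidy), and the subsidy shrinks in absolute value as parental consumption rises, which is the sense in which the tax is progressive.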
Proposition 23 In the repeated Mirrlees economy, an optimal allocation with strictly positive consumption can be implemented by a combination of income and estate taxes. At a steady-state, invariant distribution $\psi^*$, the optimal average estate tax $\bar\tau(v)$ defined by (2.34) and (2.35) is increasing in promised continuation utility $v$.
The progressivity of the estate tax reflects the mean-reversion in consumption. The fortunate must face lower net rates of return so that their consumption path decreases towards the mean.$^7$
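The mean reversion in consumption can be visualized with the logarithmic example of Section 2.5.2, where consumption follows a first-order autoregression with coefficient $\beta/\hat\beta$ around the steady-state average $\hat\lambda^{-1}$. The simulation below is a sketch under assumptions that are not in the original: shocks are drawn uniformly with a small hypothetical scale (which keeps consumption positive for these parameter values), and the parameters are chosen only for illustration.

```python
import random


def expected_path(c0, beta, beta_hat, lam_hat, periods):
    """Expected consumption of successive generations: c_bar + rho**t * (c0 - c_bar)."""
    rho = beta / beta_hat
    c_bar = 1.0 / lam_hat
    return [c_bar + rho ** t * (c0 - c_bar) for t in range(periods + 1)]


def simulate_path(c0, beta, beta_hat, lam_hat, periods, shock_scale=0.05, seed=0):
    """One dynasty's consumption path with mean-zero shocks (hypothetical shock law)."""
    rng = random.Random(seed)
    rho = beta / beta_hat
    c_bar = 1.0 / lam_hat
    path = [c0]
    for _ in range(periods):
        eps = shock_scale * rng.uniform(-1.0, 1.0)
        path.append(rho * path[-1] + (1.0 - rho) * c_bar + eps)
    return path


rich = expected_path(c0=3.0, beta=0.9, beta_hat=0.95, lam_hat=1.0, periods=400)
poor = expected_path(c0=0.2, beta=0.9, beta_hat=0.95, lam_hat=1.0, periods=400)
noiseless = simulate_path(3.0, 0.9, 0.95, 1.0, periods=400, shock_scale=0.0)
noisy = simulate_path(3.0, 0.9, 0.95, 1.0, periods=400)
```

Both the rich and the poor dynasty's expected consumption converge to the same level $\hat\lambda^{-1}$; a larger gap between $\hat\beta$ and $\beta$ lowers the autoregressive coefficient and speeds up the convergence.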
2.6 Conclusions
Societies that do not value future generations directly should help their citizens lead their descendants into misery. But when a society cares about future generations then it should be concerned with insuring children against the greatest risk of all: the family they are born into. A natural tax instrument for this goal is estate taxation. We find that estate taxation should be progressive.
7 Farhi, Kocherlakota and Werning (2005) explore more general versions of this result and discuss other intuitions.
Appendix
2.7 Proof of Lemma 20
Strict concavity and differentiability follow from standard arguments. In order to derive the limits of $k$ and $k'$ at the bounds of the domain, we derive a lower bound $k^{\min}$ and an upper bound $k^{\max}$, for which we can easily compute the corresponding limits.
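Consider the solution $\{u^{v_0}(\theta^t), y^{v_0}(\theta^t)\}$ to the relaxed planning problem for a given $v_0$. For all $v < v_0$, define $\{\tilde{u}^{v_0}(\theta^t), \tilde{y}^{v_0}(\theta^t)\}$ by
$$\tilde{u}^{v_0}(\theta^t) = u^{v_0}(\theta^t) \quad \text{for all } t \geq 0$$
$$h(\tilde{y}^{v_0}(\theta_0)) = h(y^{v_0}(\theta_0)) + v_0 - v$$
$$\tilde{y}^{v_0}(\theta^t) = y^{v_0}(\theta^t) \quad \text{for all } t \geq 1$$
Let
$$k^{\min}(v) = \sum_{t=0}^{\infty} \hat\beta^t E_{-1}\left[\tilde{u}^{v_0}(\theta^t) - \hat\lambda c(\tilde{u}^{v_0}(\theta^t)) + \hat\lambda \tilde{y}^{v_0}(\theta^t) - \theta_t h(\tilde{y}^{v_0}(\theta^t))\right].$$
Since $\{\tilde{u}^{v_0}(\theta^t), \tilde{y}^{v_0}(\theta^t)\}$ is incentive compatible and delivers welfare level $v$, we have $k(v) \geq k^{\min}(v)$ for all $v < v_0$. We have
$$k^{\min\prime}(v) = 1 - \hat\lambda E\left[\frac{1}{h'\left(h^{-1}\left(h(y^{v_0}(\theta_0)) + v_0 - v\right)\right)}\right].$$
Hence
$$\lim_{v \to -\infty} k^{\min\prime}(v) = 1.$$
Since $k(v) \geq k^{\min}(v)$ for all $v < v_0$ and both $k$ and $k^{\min}$ are concave, this implies that
$$\lim_{v \to -\infty} k'(v) \leq 1.$$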
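Next define
$$\bar{k}(v) = \sup \sum_{t=0}^{\infty} \hat\beta^t E_{-1}\left[u(\theta^t) - \hat\lambda c(u(\theta^t)) + \hat\lambda y(\theta^t) - \theta_t h(y(\theta^t))\right]$$
subject to
$$v = \sum_{t=0}^{\infty} \beta^t E_{-1}\left[u(\theta^t) - \theta_t h(y(\theta^t))\right].$$
This corresponds to the relaxed planning problem, but without the incentive constraints. Hence we have $k(v) \leq \bar{k}(v)$.
Let
$$m = \max_{u, y, \theta}\; u - \hat\lambda c(u) + \hat\lambda y - \theta h(y).$$
Then
$$\bar{k}(v) \leq \sup \sum_{t=0}^{\infty} \beta^t E_{-1}\left[u(\theta^t) - \hat\lambda c(u(\theta^t)) + \hat\lambda y(\theta^t) - \theta_t h(y(\theta^t))\right] + m\left[\frac{1}{1-\hat\beta} - \frac{1}{1-\beta}\right]$$
$$\leq v + \sup \left\{\sum_{t=0}^{\infty} \beta^t E_{-1}\left[-\hat\lambda c(u(\theta^t)) + \hat\lambda y(\theta^t)\right]\right\} + m\left[\frac{1}{1-\hat\beta} - \frac{1}{1-\beta}\right].$$
Hence if we define
$$C(v) = \inf \sum_{t=0}^{\infty} \beta^t E_{-1}\left[c(u(\theta^t)) - y(\theta^t)\right]$$
subject to
$$v = \sum_{t=0}^{\infty} \beta^t E_{-1}\left[u(\theta^t) - \theta_t h(y(\theta^t))\right]$$
and
$$k^{\max}(v) = v - \hat\lambda C(v) + m\left[\frac{1}{1-\hat\beta} - \frac{1}{1-\beta}\right]$$
we have
$$k(v) \leq k^{\max}(v).$$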
Denote by $\{u^C(\theta^t, v), y^C(\theta^t, v)\}$ the solution of the program defining $C$. Combining the first order conditions for $u(\theta^t)$ and the envelope theorem, we get
$$c'(u^C(\theta^t, v)) = C'(v) \quad \text{for all } t \geq 0$$
$$\frac{1}{\theta_t h'(y^C(\theta^t, v))} = C'(v) \quad \text{for all } t \geq 0$$
This implies that
lim C'(v)= 0
V-4 -00
lim uC(Ot v)= u
V--+
00
lim yC (0tv)=
V---00
Hence
$$\lim_{v \to -\infty} k^{\max\prime}(v) = 1.$$
Since $k \leq k^{\max}$ and both $k$ and $k^{\max}$ are concave, this implies that
$$\lim_{v \to -\infty} k'(v) \geq 1.$$
Since we already have
$$\lim_{v \to -\infty} k'(v) \leq 1$$
this implies that
$$\lim_{v \to -\infty} k'(v) = 1.$$
Note that we always have
$$\lim_{v \to \bar{v}} C'(v) = +\infty, \qquad \lim_{v \to \bar{v}} k^{\max}(v) = -\infty.$$
Since $k(v) \leq k^{\max}(v)$, and both $k$ and $k^{\max}$ are concave, this implies that
$$\lim_{v \to \bar{v}} k'(v) = -\infty.$$
Finally, note that
$$\lim_{v \to \underline{v}} k^{\max}(v) = \lim_{v \to \bar{v}} k^{\max}(v) = -\infty.$$
Hence
$$\lim_{v \to \underline{v}} k(v) = \lim_{v \to \bar{v}} k(v) = -\infty.$$
2.8 Proof of Proposition 21
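In order to characterize the optimal allocation it is convenient to study a relaxed problem. The Lagrangian theorem guarantees that there exists a unique sequence of multipliers $\{q_t\}$ with $q_0 = 1$ on (2.25) such that solving (2.24) is equivalent to solving the following program:
$$\inf \sum_{t \geq 0} q_t \int \sum_{\theta^t} \left(c_t^v(\theta^t) - y_t^v(\theta^t)\right) \Pr(\theta^t)\, d\psi(v)$$
subject to (2.26) and (2.27). Note that this problem is equivalent to the minimization $v$ by $v$ of
$$C(v; \{q_t\}) = \inf \sum_{t \geq 0} q_t \sum_{\theta^t} \left(c_t^v(\theta^t) - y_t^v(\theta^t)\right) \Pr(\theta^t)$$
subject to
$$U(\{c_t^v, y_t^v\}, \sigma^*; \beta) = v$$
$$U(\{c_t^v, y_t^v\}, \sigma^*; \beta) \geq U(\{c_t^v, y_t^v\}, \sigma; \beta) \quad \text{for all } \sigma$$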
Hence C(v; {qt}) is the least possible cost of an incentive compatible allocation delivering
welfare v to the first generation. It is trivial to see that C(v; {qt }) is the solution of the following
Bellman equation
qt+
C(v; {qt+s} 8>I) = inf E[c(uo) - y(ho) + qt+C(wo, { }s
qt+l
2)]
(2.36)
-
subject to
$$v = E\left[u_\theta + \beta w_\theta - \theta h_\theta\right]$$
$$u_\theta + \beta w_\theta - \theta h_\theta \geq u_{\theta'} + \beta w_{\theta'} - \theta h_{\theta'}$$
For future use, let us denote by $g^w(v, \theta^t)$ the continuation utility after a history of shocks $\theta^t$ when the initial welfare entitlement is $v$.
Suppose there exists an invariant distribution $\psi^*$, and let $\{q_t\}$ be the associated sequence of multipliers. Since $\psi$ is a state variable for (2.24), this shows that $q_{t+1}/q_t$ is independent of $t$. Hence there exists $0 < q < 1$ such that $q_{t+1}/q_t = q$ for all $t$. We can therefore drop the time dependence on the sequence $\{q_t\}$ in $C(v; \{q_t\})$, and simply write $C(v)$ as a shortcut for $C(v; \{q_t\}_{t \geq 0})$.
Lemma 24 Suppose there exists an invariant distribution $\psi^*$ without full mass at misery. Then $q \geq \beta$.
Proof. We will make use of two possible state variables. The first state variable is the natural one: $v$, promised future utility. The other one is utility attained by the previous generation $u_-$. Indeed, from the first order conditions, it is easy to see that these two state variables are related by
$$c'(u_-) = \frac{q}{\beta}\, C'(v).$$
The existence of an invariant distribution $\psi^*(v)$ with no mass at misery is equivalent to the existence of an invariant distribution $\phi^*(u_-)$ with no mass at misery.
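Let $x_\theta = u_\theta + \beta w_\theta$. Then we can rewrite the Bellman equation (2.36) as
$$C(v) = \inf E\left[c(u_\theta) - y(h_\theta) + q C(w_\theta)\right]$$
subject to
$$v = E\left[x_\theta - \theta h_\theta\right]$$
$$x_\theta - \theta h_\theta \geq x_{\theta'} - \theta h_{\theta'}$$
$$u_\theta + \beta w_\theta = x_\theta$$
Hence, given a value $x$ for $x_\theta$, $u_\theta$ and $w_\theta$ are given by the sub-program
$$\min\; c(u) + q C(w)$$
subject to
$$u + \beta w = x.$$
The solution is given by the first order condition
$$-\beta c'(x - \beta w) + q C'(w) = 0.$$
Using the implicit function theorem, we can then compute
$$\frac{du}{dx} = \frac{q C''(w)}{q C''(w) + \beta^2 c''(x - \beta w)}.$$
Hence
$$0 \leq \frac{du}{dx} \leq 1.$$
This in turn implies that there exists $M > 0$ such that
$$\max_{\theta, \theta'} |u_{\theta'} - u_\theta| \leq M \max_{\theta} h_\theta.$$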
The first order conditions for uo in in (2.36) imply that
-c'(u_) = E[c'(uo)]
q
Hence
Oc'(u_) = E(c'(uo)] 5 c'(uo)
q
Therefore,
log(A) + log(c'(u_)) 5 log(c(ug0))
and hence
logQý) + log(c'(u-)) <5log(c'(u-)) +
q
(max
c(u) (O -u)
which we can rewrite as
IogQ13)
Iq
c"()'
< UO
-
U-
Hence for all 0 E E,
log(16)
logq)
-
Mmax hO, < uo - u
'ee
0e
maxuE _,U
-
In order to allow for bunching in (2.36), it is convenient to consider the following program
$$\inf \sum_n p_n \left\{c(u_n) - y(h_n) + q C(w_n)\right\}$$
$$v = \sum_n p_n \left(u_n + \beta w_n - \theta_n h_n\right)$$
$$-\bar\theta_n h_n + u_n + \beta w_n \geq -\bar\theta_n h_{n+1} + u_{n+1} + \beta w_{n+1} \quad \text{for } n = 1, 2, \ldots, K-1.$$
This problem and its notation require some discussion. We do not incorporate the monotonicity constraint on $h$. But this notation allows us to consider bunching in the following way. If any set of neighboring agents is bunched, then we group these agents under a single index and let $p_n$ be the total probability of this group. Likewise let $\theta_n$ represent the conditional average of $\theta$ within this group, which is what is relevant for the promise-keeping constraint and the objective function. Let $\bar\theta_n$ be the shock of the highest agent in the group. The incentive constraint must rule out the highest agent in each group deviating and taking the allocation of the group above him.
Of course, every combination of bunched agents leads to a different program. The optimal allocation of our problem must solve one of these programs with a strictly monotone allocation, since bunching can be characterized by regrouping agents. Thus, below we characterize solutions to these programs with strict monotonicity of the solution.
The first order conditions for hn is
y'(hn) = C'(v)On
+ On9n,n+1 - On-l-ln-
This implies in particular that at the optimum, for any of these programs (and hence for the
program solved by the true optimal allocation),
y'(h 0 ) > C'(v)0.
It is easy to verify that C > C, where C is the solution to (2.36) without the incentive
compatibility constraints. Let i be the upper bound of the domain for v. Since both C and C
are increasing and convex, and since
lim C(v) = c
and
lim C'(v) = oo
U-4OO
V-M
we have
lim
C(v) = oo
V)---V
and
lim C'(v) = oo
V---V
Therefore,
lirm y'(ho(v)) = oc
=>
lim ho(v) = 0
V--4V
and since $h_\theta$ is decreasing in $\theta$,
$$\lim_{v \to \bar{v}} h_\theta(v) = 0 \quad \text{for all } \theta \in \Theta.$$
But this in turn implies that
log(q)
lmgQ)•im[_,
(maxuEfu-,u~j
inf(uo - u_)
C7(U-D
Suppose that $q < \beta$. This implies that for $v$, or equivalently $u_-$, high enough, the policy functions $u_\theta$ are all such that $u_\theta > u_-$. This in turn implies that $\psi^*$ necessarily has a support bounded away from $\bar{u}$. This in turn implies that
$$\int C'(v)\, d\psi^*(v) = \frac{\beta}{q} \int c'(u_-)\, d\phi^*(u_-) < \infty.$$
Integrating
$$\frac{\beta}{q}\, C'(v) = E\left[C'(w_\theta)\right]$$
over $v$, we get
$$\frac{\beta}{q} \int C'(v)\, d\psi^*(v) = \int C'(v)\, d\psi^*(v).$$
Since $\psi^*$ doesn't have full mass at misery, we have $\int C'(v)\, d\psi^*(v) > 0$. This in turn implies that $\beta = q$, a contradiction. $\blacksquare$
We have therefore proved that $q \geq \beta$ at $\psi^*$. But then from the equation
$$\frac{\beta}{q}\, C'(v) = E\left[C'(w_\theta)\right]$$
we see that $C'(v_t)$ is a positive supermartingale. By the martingale convergence theorem, for any initial value $v_0$ for $v$, the sequence of random variables $\{C'(v_t)\}$ converges almost surely to a random variable $C'^{\infty}$ with
$$E\left[C'^{\infty}\right] \leq C'(v).$$
Suppose that there exists a $v^*$ such that $\Pr(C'^{\infty} > 0) > 0$. We will show that this is not possible.
For any realization $\theta^\infty$ define the set of periods where $\theta_t$ takes on some particular value $\theta \in \Theta$ as
$$\Theta^\infty(\theta) \equiv \{t : \theta_t(\theta^\infty) = \theta\}.$$
Then since $\Theta$ is finite, we have that with probability one all values of $\theta$ occur infinitely often:
$$\Pr\left(\#\Theta^\infty(\theta) = \infty \text{ for all } \theta \in \Theta\right) = 1.$$
Hence there exists an event $\theta^\infty$ such that $C'(g^w(v^*, \theta^t(\theta^\infty)))$ converges to a positive and finite value, and $\#\Theta^\infty(\theta) = \infty$ for all $\theta \in \Theta$. Hence $g^w(v^*, \theta^t(\theta^\infty))$ converges to a finite value $w^*$. Since $g^w(v, \theta)$ is continuous in $v$, and $\#\Theta^\infty(\theta) = \infty$, this implies that $g^w(w^*, \theta) = w^*$ for all $\theta \in \Theta$. This implies that the incentive constraints are not binding at $w^*$, a contradiction.
Hence $\Pr(C'^{\infty} > 0) = 0$ for all $v$. Therefore for all $v$, $C'(g^w(v, \theta^t))$ converges almost surely to 0. This in turn implies that the stochastic process $C'(v_t)$ converges almost surely to 0, and hence in distribution to 0. Since $\psi^*$ is an invariant distribution, $C'(v_t)$ is distributed as $C'(v_0)$. This implies that the distribution of $C'(v_0)$ has full mass at zero, i.e. that $\psi^*$ has full mass at misery.
2.9 Proof of Proposition 22
We start with two lemmas, and then proceed to prove the proposition.
Lemma 25 The following inequalities hold
i
(1- k'(v)) +
for all 0 E
E,
and y =i (/0)
1 - k'(g'w(0, v)) •
-
where the constants are given by
(1 - k'(v)) +
= (3/3) max {(1 + On
1<n<N
1-
-
E[0 0 5 On ])/On }
min {1 + n-1 - E[9 0 _> On]/n-}
2<n<N
Proof. Consider the program
$$\max_{u, h, w} \sum_n p_n \left\{u_n - \hat\lambda c(u_n) + \hat\lambda y(h_n) - \theta_n h_n + \hat\beta k(w_n)\right\}$$
$$v = \sum_n p_n \left(u_n + \beta w_n - \theta_n h_n\right)$$
$$-\bar\theta_n h_n + u_n + \beta w_n \geq -\bar\theta_n h_{n+1} + u_{n+1} + \beta w_{n+1} \quad \text{for } n = 1, 2, \ldots, K-1.$$
This problem and its notation require some discussion. We do not incorporate the monotonicity constraint on $h$. But this notation allows us to consider bunching in the following way. If any set of neighboring agents is bunched, then we group these agents under a single index and let $p_n$ be the total probability of this group. Likewise let $\theta_n$ represent the conditional average of $\theta$ within this group, which is what is relevant for the promise-keeping constraint and the objective function. Let $\bar\theta_n$ be the shock of the highest agent in the group. The incentive constraint must rule out the highest agent in each group deviating and taking the allocation of the group above him.
Of course, every combination of bunched agents leads to a different program. The optimal allocation of our problem must solve one of these programs with a strictly monotone allocation, since bunching can be characterized by regrouping agents. Thus, below we characterize solutions to these programs with strict monotonicity of the solution.
The first-order conditions are
$$p_n \left\{\hat\lambda y'(h_n) - \theta_n + \lambda \theta_n\right\} - \bar\theta_n \mu_n + \bar\theta_{n-1} \mu_{n-1} \leq 0$$
$$p_n \left\{\hat\beta k'(w_n) - \beta \lambda\right\} + \beta \left(\mu_n - \mu_{n-1}\right) = 0$$
where, by the envelope condition, $\lambda = k'(v)$.
Summing the first-order conditions for $h_n$, we get
$$\hat\lambda E\left[y'(h(\theta))\right] = 1 - k'(v).$$
Summing up the first-order conditions for $w_n$, we get
$$E\left[k'(g^w(v, \theta))\right] = \frac{\beta}{\hat\beta}\, k'(v).$$
The first-order conditions for n = 1 imply
AE[y'(he)]
Ay'(hl)
90
(1--A) +-01 01
1
<
-
--
01-A
1i
01
This implies
_l-< 1<-A
1
01
Using
k'(wi) =OA
/3
/3 /LL~
j~
Pu'
we get
k(w1) - /Z
3
1-A
O81
+
(1
)-
/3
= -1 +
Similarly, writing the first-order conditions for n = K, we get
OK-1 /AK-1
(K
PK
Ay' (hK)
AE[c'(he)]
OK
OK
1-A
OK
[
011 01
01~
This implies
1-A
IK-1 >
PK -
-(1 - A)
K-1
OK
1
Using
k'(WK) = OA
+ ,K-1
PK
we get
k' (WK) •5
\ -
(1
OK-1 ++-t(1
1-A
1 -
)-OK =-11
+
+
9 1-K-1
OK
OK1
7
OK-1J
klK(v) +
/3iCK
01K-1
1i
OK-
OK-11
For any n, WK 5 Wn < WI,
1
k(v)+
+9-Or1
1]
-/
[O1
01 - 0
- k'(ws)
k1(n
['_1+
-
k'(v) +
OK-
1 +O-
OK-1
[OK
OK-1
1
OK-i ]
After rearranging, we obtain
+9
1
(i1
(-
k'(v)) +1
_
> 1 - k' (gw (0, v))
/3 [
1
ii +OK-1
/3L
-
OK ](1(1i - k'(v)) + 1
K-i1
OK-1
/3.
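By Lemma 20 we have that $\lim_{v \to \underline{v}} k'(v) = 1$; then using the bounds we obtain that
$$\lim_{v \to \underline{v}} k'(g^w(v, \theta)) = \frac{\beta}{\hat\beta} < 1 = \lim_{v \to \underline{v}} k'(v)$$
for all $\theta \in \Theta$. $\blacksquare$
The following lemma describes the behavior of the optimal allocation when $v$ goes to $\underline{v}$.
Lemma 26 We have $g^u(v, \theta) > \underline{u}$ and
$$\lim_{v \to \underline{v}} g^u(v, \theta) = \underline{u}, \qquad \lim_{v \to \underline{v}} g^h(v, \theta) = \infty.$$
Proof. Consider the program
$$\max_{u, h, w} \sum_n p_n \left\{u_n - \hat\lambda c(u_n) + \hat\lambda y(h_n) - \theta_n h_n + \hat\beta k(w_n)\right\}$$
$$v = \sum_n p_n \left(u_n + \beta w_n - \theta_n h_n\right)$$
$$-\bar\theta_n h_n + u_n + \beta w_n \geq -\bar\theta_n h_{n+1} + u_{n+1} + \beta w_{n+1} \quad \text{for } n = 1, 2, \ldots, K-1.$$
The first order condition for $u_n$ is $1 - \hat\lambda c'(u_n) = \frac{\hat\beta}{\beta} k'(w_n)$. Hence
$$1 - \hat\lambda c'(g^u(v, \theta)) = \frac{\hat\beta}{\beta}\, k'(g^w(v, \theta)).$$
Since $k'(g^w(v, \theta)) < \beta/\hat\beta$, we have $u_n > \underline{u}$. Moreover, since $\lim_{v \to \underline{v}} k'(g^w(v, \theta)) = \beta/\hat\beta$, we have
$$\lim_{v \to \underline{v}} c'(g^u(v, \theta)) = 0$$
or equivalently
$$\lim_{v \to \underline{v}} g^u(v, \theta) = \underline{u}.$$
That $\lim_{v \to \underline{v}} g^h(v, \theta) = \infty$ follows from
$$\hat\lambda E\left[y'(g^h(v, \theta))\right] = 1 - k'(v)$$
and $\lim_{v \to \underline{v}} k'(v) = 1$. $\blacksquare$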
Since the derivative $k'(v)$ is continuous and strictly decreasing, we can define the transition function
$$Q(x, \theta) = k'\left(g^w\left((k')^{-1}(x), \theta\right)\right)$$
for all $x < 1$ if utility is unbounded below. For any probability distribution $\mu$, let $T_Q(\mu)$ be the probability distribution defined by
$$T_Q(\mu)(A) = \int 1_{\{Q(x, \theta) \in A\}}\, d\mu(x)\, dp(\theta)$$
for any Borel set $A$. Define
$$T_{Q,n} \equiv \frac{T_Q + T_Q^2 + \cdots + T_Q^n}{n}.$$
For example, $T_{Q,n}(\delta_x)$ is the empirical average of $\{k'(v_t)\}_{t=1}^n$ over all histories of length $n$ starting with $k'(v_0) = x$. The following proposition establishes the existence of an invariant distribution by considering the limits of $\{T_{Q,n}\}$. It implies the first part of Proposition 22, and describes an algorithm to construct an invariant distribution.
Proposition 27 For each $x < 1$ there exists a subsequence $\{T_{Q,\phi(n)}(\delta_x)\}$ that converges weakly, i.e. in distribution, to an invariant distribution on $(-\infty, 1)$ under $Q$.
Proof. For all 0 e
e
lim
xT1
Q (x, 9)
=
lim k'(gw(0,v)) =-
v--+-oo
< 1.
Note that we have a continuous transition function Q(x, 0): (-oo, 1) x
E --+ (-oo, 1).
We next show that the sequence {Tý(bx)} is tight, in that for any e > 0 there exists a
compact set Ke such that T6(6x)(Ke) _ 1 - e, for all n. The expected value of the distribution T6(6x) is simply El[k'(vt(0t-1))] with x = k'(vo) < 1. Recall that El[k'(vt(9- 1 ))]
(f/I)tk'(vo) --+ 0. This implies that
min{0, k'(vo) } 5 El[k'(vt(Ot-1))]
<5 T (x)(-oo, -A)(-A) + (1 - T6(6x)(-oo, -A))1
for all A > 0. Rearranging,
T6 (6=)(-oo, -A) <
1 - min{0,
+ 1 x}
A
Hence we can find Ae > 0 such that
T6(Jx)(-oo, -Ae) <
Define a, by
1-a,=
sup Q(x,O)
xE[Aý,1)
OEE
100
Since for all 0 e
e, Vlim k'(v, 0) <
+-00
< 1, we have a, > 0. In addition, for all n > 1,
Tn(6J) = TQ(Tý-'(6x)), so that
T (Jx)(1 - aEl 1) _T< (Jx)(-oo, AE) <
Since we also have
TQ(6)(-00,-Ac) < 2
this implies
Tý(6x)[Ae, 1 - ael] > e
Taking Ke = [Ae, 1 - ae], this implies that {T0(6)}n>1l is tight, and therefore {Tý(6x)}n>0,
is tight.
Tightness implies that there exists a subsequence $T_{Q,\phi(n)}(\delta_x)$ that converges weakly, i.e. in
distribution, to some probability distribution $\pi$ on $(-\infty, 1)$. Since $Q(x, \theta)$ is continuous in $x$,
$T_Q(T_{Q,\phi(n)}(\delta_x))$ then converges weakly to $T_Q(\pi)$. But the linearity of $T_Q$ implies that
$$T_Q(T_{Q,\phi(n)}(\delta_x)) = T_{Q,\phi(n)}(\delta_x) + \frac{T_Q^{\phi(n)+1}(\delta_x) - T_Q(\delta_x)}{\phi(n)},$$
and since $\phi(n) \to \infty$ we must have $T_Q(\pi) = \pi$. ∎
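The averaging construction used in this proof can be illustrated on a finite state space. The sketch below is only an illustration under assumed inputs: the two-state transition matrix `periodic` and the helper names are hypothetical and do not come from the model, where the state space is $(-\infty, 1)$ and the transition is $Q(x, \theta)$. It shows why the proof works with averages of the iterates rather than the iterates themselves: for a periodic chain the iterates of a point mass never settle down, while their running average becomes invariant up to an error of order $1/n$.

```python
def push_forward(dist, transition):
    """Apply the Markov operator once: result[j] = sum_i dist[i] * transition[i][j]."""
    size = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(size)) for j in range(size)]


def cesaro_average(start, transition, n_terms):
    """Return (1/n) * sum_{i=1..n} T^i(delta_start), the finite-state analogue of T_{Q,n}."""
    size = len(transition)
    dist = [1.0 if i == start else 0.0 for i in range(size)]  # point mass at `start`
    total = [0.0] * size
    for _ in range(n_terms):
        dist = push_forward(dist, transition)
        total = [t + d for t, d in zip(total, dist)]
    return [t / n_terms for t in total]


def invariance_gap(dist, transition):
    """Largest absolute difference between T(dist) and dist."""
    image = push_forward(dist, transition)
    return max(abs(a - b) for a, b in zip(image, dist))


# Hypothetical periodic chain: T^n(delta_0) alternates between the two point masses,
# but the running average approaches the invariant distribution (1/2, 1/2).
periodic = [[0.0, 1.0],
            [1.0, 0.0]]
n_terms = 1001
average = cesaro_average(0, periodic, n_terms)
gap = invariance_gap(average, periodic)
```

By linearity, the gap equals the absolute value of $(T^{n+1}\delta - T\delta)/n$ coordinate by coordinate, so it is at most $1/n$, which mirrors the last display of the proof.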
Note that for any invariant distribution $\pi$, $T_Q(\pi) = \pi$ implies that the support of $\pi$ is
contained in (-oo, 4]. This proves the second part of Proposition 22. We finally prove a lemma
that implies the last part of Proposition ?? (existence).
Lemma 28 Suppose that $\limsup_{u \to \bar{u}} c''(u)/c'(u) < \infty$. Then any invariant distribution $\psi$
necessarily has a support bounded away from $\bar{u}$.
Proof. We will make use of two possible state variables. The first state variable is the
natural one: $v$, promised future utility. The other one is the utility attained by the previous
generation, $u_-$. Indeed, from the first order conditions, it is easy to see that these two state
variables are related by
$$1 - \lambda c'(u_-) = \frac{\hat{\beta}}{\beta} k'(v).$$
The existence of an invariant distribution $\psi^*(v)$ with no mass at misery is equivalent to the
existence of an invariant distribution $\varphi^*(u_-)$ with no mass at misery.
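Let $x_\theta = u_\theta + \beta w_\theta$. Then we can rewrite the Bellman equation (2.21) as
$$k(v) = \sup E\left[u_\theta - \lambda c(u_\theta) - \theta h_\theta + \lambda y(h_\theta) + \hat{\beta} k(w_\theta)\right]$$
subject to
$$v = E[x_\theta - \theta h_\theta],$$
$$x_\theta - \theta h_\theta \geq x_{\theta'} - \theta h_{\theta'},$$
$$u_\theta + \beta w_\theta = x_\theta.$$
Hence, given a value $x$ for $x_\theta$, $u_\theta$ and $w_\theta$ are given by the sub-program
$$\max_{u, w} \; u - \lambda c(u) + \hat{\beta} k(w)$$
subject to
$$u + \beta w = x.$$
The solution is given by the first order condition
$$1 - \lambda c'(u) - \frac{\hat{\beta}}{\beta} k'\left(\frac{x - u}{\beta}\right) = 0.$$
Using the implicit function theorem, we can then compute
$$\frac{du}{dx} = \frac{-\frac{\hat{\beta}}{\beta^2} k''\left(\frac{x - u}{\beta}\right)}{-\frac{\hat{\beta}}{\beta^2} k''\left(\frac{x - u}{\beta}\right) + \lambda c''(u)}.$$
Hence
$$0 \leq \frac{du}{dx} \leq 1.$$
This in turn implies that there exists $M > 0$ such that
$$\max_{\theta, \theta'} |u_{\theta'} - u_\theta| \leq M \max_\theta h_\theta.$$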
Consider the program (2.9). The first order condition for hn is
Pn{jAy'(hn) - On + A6n} - Onfn + On-lnn-1 < 0
where A = k'(v). This implies that
0
y'(h40) > Z(1 - k'()v)
This shows that
lim y'(ho(v)) = co
Slim h_(v) = 0
v -T-7
v--4V
and since ho has is decreasing in 8,
lirm ho(v)
= 0 for all 0 c
E
V-4-u
The first order condition(2.32) implies that
/3
~-1
c'(u_) > -C'(Uo)
-A
(~-1)
^
~~1(•)/3
-
> C'(U0_)
which can be rewritten as
C'(u)
13
This in turn implies that for all 0
(
EE
exp (M max he max
0
+J ý-1
c'(u)
uE[U-,ujm
c
0-c'(u_)
+ •-
+~
1--A2c/(uo)
~~
•
Since
lim ho(v) = 0 for all 0 E E
V--4V
we have
lim exp (M max he max
0
uE [uT,uj
_ --(
c"(u)
C, (U)
= 1 for all 0 E O
c0
This in turn proves that for $u_-$ high enough, all the policy functions $u_\theta$ are such that $u_\theta < u_-$.
Hence any invariant distribution $\varphi^*$ necessarily has a support bounded away from $\bar{u}$. This is
equivalent to saying that any invariant distribution $\psi$ necessarily has a support bounded away
from $\bar{v}$. ∎
Bibliography
[1] Albanesi, Stefania and Christopher Sleet, "Dynamic Optimal Taxation with Private Information," 2004. mimeo. 2, 4, 11, 12, 15
[2] Amador, Manuel, George-Marios Angeletos, and Iván Werning, "Corrective vs. Redistributive Taxation," 2005. work in progress. 11
[3] Atkeson, Andrew and Robert E. Lucas Jr., "On Efficient Distribution with Private Information," Review of Economic Studies, 1992, v59(3), 427-453. 3, 15
[4] Atkeson, Andrew and Robert Lucas, "Efficiency and Equality in a Simple Model of Efficient
Unemployment Insurance," Journal of Economic Theory, 1995, 66, 64-88. 2, 15
[5] Atkinson, A.B. and J.E. Stiglitz, "The Design of Tax Structure: Direct vs. Indirect Taxation," Journal of Public Economics, 1976, 6, 55-75. 2, 8
[6] Diamond, Peter A., "Optimal Income Taxation: An Example with a U-Shaped Pattern of
Optimal Marginal Tax Rates," American Economic Review, 1998, 88 (1), 83-95. 4
[7] Ebert, Udo, "A reexamination of the optimal nonlinear income tax," Journal of Public
Economics, 1992, 49 (1), 47-73. 4
[8] Farhi, Emmanuel and Ivan Werning, "Inequality, Social Discounting and Progressive Estate
Taxation," NBER Working Papers 11408 2005. 4, 12, 13, 16
[9] Farhi, Emmanuel, Narayana Kocherlakota, and Ivan Werning, 2005. work in progress. 4, 8,
16, 18
[10] Golosov, Mikhail, Aleh Tsyvinski, and Iván Werning, "New Dynamic Public Finance: A
User's Guide," forthcoming in NBER Macroeconomics Annual 2006, 2006. 4
[11] Golosov, Mikhail, Narayana Kocherlakota, and Aleh Tsyvinski, "Optimal Indirect and Capital Taxation," Review of Economic Studies, 2003, 70 (3), 569-587. 4
[12] Kocherlakota, Narayana, "Zero Expected Wealth Taxes: A Mirrlees Approach to Dynamic
Optimal Taxation," 2004. mimeo. 4, 16
[13] Mirrlees, James A., "An Exploration in the Theory of Optimum Income Taxation," Review
of Economic Studies, 1971, 38 (2), 175-208. 2, 4
[14] Saez, Emmanuel, "Using Elasticities to Derive Optimal Income Tax Rates," Review of
Economic Studies, 2001, 68 (1), 205-29. 4
[15] Seade, Jesus, "On the Sign of the Optimum Marginal Income Tax," Review of Economic
Studies, 1982, 49 (4), 637-43. 4
[16] Tuomala, Matti, Optimal Income Taxation and Redistribution, Oxford University Press,
Clarendon Press, 1990. 4
Chapter 3
Saving and Investing for Early
Retirement: A Theoretical Analysis 1
Introduction
Two years ago, when the stock market was soaring, 401(k)'s were swelling and
(...)
early retirement seemed an attainable goal. All you had to do was invest
that big job-hopping pay increase in a market that produced double-digit gains like
clockwork, and you could start taking leisurely strolls down easy street at the ripe
old age of, say, 55. (Business Week December 31, 2001)
The dramatic rise of the stock market between 1995 and 2000 significantly increased the
proportion of workers opting for early retirement (Gustman and Steinmeier, 2002). The above
quote from Business Week demonstrates the rationale behind the decision to retire early: A
booming stock market raises the amount of funds available for retirement and allows a larger
fraction of the population to exit the workforce prematurely.
1 This chapter is the product of joint work with Stavros Panageas. E. Farhi: MIT Department
of Economics, 50 Memorial Drive, Cambridge MA 02142. S. Panageas: The Wharton School,
Univ. of Pennsylvania, Finance Department, 2326 Steinberg Hall-Dietrich Hall, 3620 Locust Walk,
Philadelphia, PA 19104. Contact: panageas@wharton.upenn.edu. Tel: 215 746 0496. We would
like to thank Andy Abel, George Marios Angeletos, Olivier Blanchard, Ricardo Caballero, John
Campbell, George Constantinides (the discussant at Frontiers of Finance 2005), Domenico Cuoco,
Peter Diamond, Phil Dybvig, Jerry Hausman, Hong Liu, Anna Pavlova, Jim Poterba, John Rust,
Kent Smetters, Nick Souleles, Dimitri Vayanos, Luis Viceira, Ivan Werning, Mark Westerfield,
and seminar participants at MIT, Helsinki School of Economics and Swedish School of Economics
(Hanken), University of Houston, University of Southern California, Wharton, and participants of
the Frontiers of Finance and CRETE 2005 conferences for useful discussions and comments. We
would also like to thank Jianfeng Yu for excellent research assistance. All errors are ours.
Indeed, for most individuals, increasing one's retirement savings seems to be one of the
primary motivations behind investing in the stock market. Accordingly, there is an increased
need to understand the interactions among optimal retirement, portfolio choice, and savings,
especially in light of the growing popularity of 401(k) retirement plans. These plans give
individuals a great amount of freedom when determining how to save for retirement. However,
such increased flexibility also raises concerns about the extent to which agents' portfolio and
savings decisions are rational. Having a benchmark against which to determine the rationality
of people's choices is crucial for both policy design and in order to form the basis of sound
financial advice.
In this paper we develop a theoretical model with which we address some of the interactions
among savings, portfolio choice, and retirement in a utility maximizing framework. We assume
that agents face a constant investment opportunity set and a constant wage rate while still
in the workforce. Their utility exhibits constant relative risk aversion and is nonseparable in
leisure and consumption. The major point of departure from preexisting literature is that we
model the labor supply choice as an optimal stopping problem: An individual can work for a
fixed (nonadjustable) amount of time and earn a constant wage but is free to exit the workforce
(forever) at any time she chooses. In other words, we assume that workers can work either full
time or retire. As such, individuals face three decision problems: 1) how much to consume, 2)
how to invest their savings, and 3) when to retire. The incentive to quit work comes from a
discrete jump in their utility due to an increase in leisure once retired. When retired, individuals
cannot return to the workforce. 2 We also consider two extensions of the basic framework. In
the first extension we disallow the agent from choosing retirement past a pre-specified deadline.
In a second extension we disallow her from borrowing against the net present value (NPV) of
her human capital (i.e., we require that financial wealth be nonnegative).
The major results that we obtain can be summarized as follows:
2 This assumption can actually be easily relaxed. For instance, we could assume that retirees can return to
the workforce (at a lower wage rate) without affecting any of the major predictions of the model.
First, we show that the agent will enter retirement when she reaches a certain wealth
threshold, which we determine explicitly. In this sense, wealth plays a dual role in our model:
Not only does it determine the resources available for future consumption, but it also controls
the "distance" to retirement.
Second, the option to retire early strengthens the incentives to save compared to the case in
which early retirement is not allowed. The reason is that saving not only increases consumption
in the future but also brings retirement "closer." Moreover, this incentive is wealth dependent.
As the individual approaches the critical wealth threshold to enter retirement, the "option"
value of retiring early becomes progressively more important and the saving motive becomes
stronger.
Third, the marginal propensity to consume (MPC) out of wealth declines as wealth increases
and early retirement becomes more likely. The intuition is simple: An increase in wealth will
bring retirement closer, therefore decreasing the length of time the individual remains in the
workforce. Conversely, a decline in wealth will postpone retirement. Thus, variations in wealth
are somewhat counterbalanced by the behavior of the remaining NPV of income and in turn
the effect of a marginal change in wealth on consumption becomes attenuated. Once again this
attenuation is strongest for rich individuals who are closer to their goal of early retirement.
Fourth, the optimal portfolio is tilted more towards stocks compared to the case in which
early retirement is not allowed. An adverse shock in the stock market will be absorbed by
postponing retirement. Thus, the individual is more inclined to take risks as she can always
postpone her retirement instead of cutting back her consumption in the event of a declining
stock market. Moreover, in order to bring retirement closer, the most effective way is to invest
the extra savings in the stock market instead of the bond market.
Fifth, the choice of portfolio over time exhibits some new and interesting patterns. We show
that there exist cases in which an agent might optimally increase the percentage of financial
wealth that she invests in the stock market as she ages (in expectation), even though her income
and the investment opportunity set are constant. This result obtains, because wealth increases
over time and hence the option of early retirement becomes more relevant. Accordingly, the
tilting of the optimal portfolio towards stocks becomes stronger. Indeed, as we show in a calibration exercise, the model predicts that, prior to retirement, portfolio holdings could increase,
especially when extraordinary stock market returns, like those of the late 1990s (during which time many workers experienced rapid increases in wealth), allow the individual to
opt for an earlier retirement date. In fact our model suggests a possible partial rationalization
of the (apparently irrational) behavior of individuals who increased their portfolios as the stock
market was rising and then liquidated stock as the market collapsed. 3
This paper is related to a number of strands in the literature that are surveyed in Ameriks
and Zeldes (2001) and Jagannathan and Kocherlakota (1996). The paper closest to ours is that
of Bodie, Merton, and Samuelson (1992) (henceforth BMS). The major difference between BMS
and this paper is the different assumption we make about the ability of agents to adjust their
labor supply. In BMS, labor can be adjusted in a continuous fashion. However, a significant
amount of evidence suggests that labor supply is to a large extent indivisible. For example, in
many jobs workers work either full time or they are retired. Moreover, it appears that most
people do not return to work after they retire, or if they do, they return to less well-paying jobs
or they work only part time. As BMS claim in the conclusion of their paper,
Obviously, the opportunity to vary continuously one's labor without cost is a far
cry from the workings of actual labor markets. A more realistic model would allow
limited flexibility in varying labor and leisure. One current research objective is to
analyze the retirement problem as an optimal stopping problem and to evaluate the
accompanying portfolio effects.
This is precisely the direction we take here. There are at least three major directions in which
our results differ from BMS. First, we show that the optimal retirement decision introduces a
nonlinear option-type element in the decision of the individual that is entirely absent if labor
is adjusted continuously. Second, the horizon and wealth effects on portfolio and consumption
choice in our paper are fundamentally different than those in BMS. For instance, stock holdings
in BMS are a constant multiple of the sum of (financial) wealth and human capital. This
multiple is not constant in our setup, but instead depends on wealth. 4 Third, the model we
present here allows for a clear way to model retirement, which is difficult in the literature that
allows for a continuous labor-leisure choice. An important implication is that in our setup,
we can calibrate the parameters of the model to observed retirement decisions. In the BMS
framework, on the other hand, calibration to microeconomic data is harder because individuals
do not seem to adjust their labor supply continuously. 5
3 Some (indirect) evidence of this fact is given in the August 2004 Issue Brief of the Employee Benefit Research
Institute (Fig. 2 - based on the EBRI/ICI 401(k) Data).
4 If we impose a retirement deadline, this multiple also depends on the distance to this deadline.
The model is also related to a strand of the literature that studies retirement decisions. A
partial listing includes Stock and Wise (1990), Rust (1994), Lazear (1986), Rust and Phelan
(1997), and Diamond and Hausman (1984). Most of these models are structural estimations
that are solved numerically. Here our goal is different: Rather than include all the realistic
ramifications that are present in actual retirement systems, we isolate and very closely analyze
the new issues introduced by the indivisibility and irreversibility of the labor supply - retirement
decision on savings and portfolio choice. Naturally, there is a trade-off between adding realistic
considerations and the level of theoretical analysis that we can accomplish with a more complicated model. Other studies in this literature include Sundaresan and Zapatero (1997), who
study optimal retirement, but in a framework without disutility of labor, and Bodie, Detemple,
Otruba and Walter (2004), who investigate the effects of habit formation, but without optimal
retirement timing.
Some results of this paper share similarities with results that obtain in the literature on
consumption and savings in incomplete markets. A highly partial listing includes Viceira (2001),
Chan and Viceira (2000), Campbell, Cocco, Gomes and Maenhout (2001), Kogan and Uppal
(2001), Duffie, Fleming, Soner and Zariphopoulou (1997), Duffie and Zariphopoulou (1993), Koo
(1998), and Carroll and Kimball (1996) on the role of incomplete markets and He and Pages
(1993) and El Karoui and Jeanblanc-Pique (1998) on issues related to the inability of individuals
to borrow against the NPV of their future income. This literature provides insights on why
consumption (as a function of wealth) should be concave, and also offers some implications on
portfolio choice. However, while in the incomplete markets literature, the results are driven by
the inability of agents to effectively smooth their consumption due to missing markets, 6 in this
paper the results are driven by an option component in an agent's choices that is related to the
ability of agents to adjust their time of retirement.
5 Liu and Neiss (2002) study a framework similar to BMS, but force an important constraint on the maximal
amount of leisure. This, however, omits the issues related to indivisibility and irreversibility, which as we show
lead to fundamentally different implications for the resulting portfolios. In sum, the fact that labor supply
flexibility is modeled in a more realistic way allows a closer mapping of the results to real-world institutions than
is allowed for by a model that exhibits continuous choice between labor and leisure.
6 Chan and Viceira (2000) combine insights of both literatures. However, they assume labor-leisure choices
that can be adjusted continuously.
Throughout the paper we maintain the assumption that agents receive a constant wage.
This is done not only for simplicity, but more importantly because it makes the results more
surprising. It is well understood in the literature 7 that allowing for a (positive) correlation
between wages and the stock market can generate upward-sloping portfolio holdings over time.
What we show is that optimal retirement choice can induce observationally similar effects even
when labor income is perfectly riskless. Since the argument and the intuition for this outcome
are orthogonal to those in existing models, we prefer to use the simplest possible setup in every
other dimension, thereby isolating the effects of optimal early retirement.
Technically, our model extends methods proposed by Karatzas and Wang (2000) (who do
not allow for income) to solve optimal consumption problems with discretionary stopping. The
extension that we consider in Section 3.3 uses ideas proposed by Barone-Adesi and Whaley
(1987), and in Section 3.5, we extend the framework in He and Pages (1993) to allow for early
retirement.
Finally, three papers that present parallel and independent work on similar issues are
Lachance (2003), Choi and Shim (2004), and Dybvig and Liu (2005). Lachance (2003) and
Choi and Shim (2004) study a model with a utility function that is separable in leisure and
consumption, but that abstracts from a deadline for retirement and/or borrowing constraints. 8
The somewhat easier specification of separable utility does not allow consumption to fall upon
retirement as we observe in the data. Technically, these papers solve the problem using dynamic programming, which cannot be easily extended to
models with deadlines, borrowing constraints, etc., rather than convex duality methods. Our approach overcomes these difficulties.
Dybvig and Liu (2005) study a very similar model to that in Section 3.5 of this paper, with
similar techniques. However, they do not consider retirement prior to a deadline as we do. A
deadline makes the problem considerably harder (since the critical wealth thresholds become
time dependent). Nonetheless, we are able to provide a fairly accurate approximate closed-form
solution for this problem in Section 3.3. One can actually perform simple exercises that demonstrate
that in the absence of a retirement deadline, the model-implied distribution of retirement
times becomes implausible. Most importantly, compared to the papers above, the present paper
goes into significantly greater detail in terms of the economic analysis and implications of the
results. In particular, we provide applications (like the analysis of portfolios of agents saving for
early retirement in the late 1990s) that demonstrate quite clearly the real-world implications
of optimal portfolio choice in the presence of early retirement.
7 See, e.g., Jagannathan and Kocherlakota (1996) and BMS.
8 Another model that makes similar assumptions is that of Kingston (2000).
The structure of the paper is as follows: Section 3.1 contains the model setup. In Section
3.2 we describe the analytical results if one places no retirement deadline. Section 3.3 contains
an extension to the case in which retirement cannot take place past a deadline, Section 3.4
contains some calibration exercises, and Section 3.5 extends the model by imposing borrowing
constraints. Section 3.6 concludes. We present technical details and all proofs in the Appendix.
3.1 Model setup
3.1.1 Investment opportunity set
The consumer can invest in the money market, where she receives a fixed, strictly positive
interest rate r > 0. We place no limits on the positions that can be taken in the money market.
In addition, the consumer can invest in a risky security with a price per share that evolves
according to
$$\frac{dP_t}{P_t} = \mu \, dt + \sigma \, dB_t,$$
where $\mu > r$ and $\sigma > 0$ are known constants and $B_t$ is a one-dimensional Brownian motion on a
complete probability space $(\Omega, \mathcal{F}, P)$. 9 We define the state-price density process (or stochastic
discount factor) as
$$H(t) = \xi(t) Z^*(t), \quad H(0) = 1,$$
where $\xi(t)$ and $Z^*(t)$ are given by
$$\xi(t) = e^{-rt},$$
$$Z^*(t) = \exp\left\{-\int_0^t \kappa \, dB_s - \frac{1}{2}\kappa^2 t\right\}, \quad Z^*(0) = 1,$$
and $\kappa$ is the Sharpe ratio
$$\kappa = \frac{\mu - r}{\sigma}.$$
9 We shall denote by $F = \{\mathcal{F}_t\}$ the P-augmentation of the filtration generated by $B_t$.
As is standard, these assumptions imply a dynamically complete market (Karatzas and
Shreve, 1998, Chapter 1).
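As a quick numerical check of what the state-price density does, the sketch below verifies that discounting with $H$ prices both traded assets correctly: the expected value of $H(T)P(T)$ equals $P(0)$, and one dollar rolled over in the money market until $T$ is worth one dollar today. The parameter values, the function names, and the use of a trapezoid rule over the distribution of $B_T$ are assumptions made for illustration; they are not taken from the text.

```python
import math


def state_price_density(t, b, r, kappa):
    """H(t) = exp(-r t) * exp(-kappa * B_t - 0.5 * kappa^2 * t), evaluated at B_t = b."""
    return math.exp(-r * t) * math.exp(-kappa * b - 0.5 * kappa ** 2 * t)


def stock_price(t, b, p0, mu, sigma):
    """Solution of dP/P = mu dt + sigma dB with P_0 = p0, evaluated at B_t = b."""
    return p0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * b)


def expectation(f, t, n_grid=4001, z_max=10.0):
    """Approximate E[f(B_t)] for B_t ~ N(0, t) with the trapezoid rule on a grid of z-values."""
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -z_max + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * density * f(math.sqrt(t) * z)
    return total * step


# Illustrative (assumed) parameter values.
r, mu, sigma, p0, horizon = 0.03, 0.08, 0.20, 100.0, 5.0
kappa = (mu - r) / sigma  # Sharpe ratio

priced_stock = expectation(
    lambda b: state_price_density(horizon, b, r, kappa) * stock_price(horizon, b, p0, mu, sigma),
    horizon,
)
priced_bond = expectation(
    lambda b: state_price_density(horizon, b, r, kappa) * math.exp(r * horizon),
    horizon,
)
```

Under these assumptions `priced_stock` should agree with `p0` and `priced_bond` with 1 up to quadrature error, which is the sense in which $H$ acts as a stochastic discount factor.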
3.1.2 Portfolio and wealth processes
An agent chooses a portfolio process $\pi_t$ and a consumption process $c_t > 0$. These processes are
progressively measurable and they satisfy the standard integrability conditions given in Karatzas
and Shreve (1998), Chapters 1 and 3. The agent also receives a constant income stream $y_0$ while
she works and no income stream while in retirement. Retirement is an irreversible decision. We
assume until Section 3.3 that an agent can retire at any time she chooses.
The agent is endowed with an amount of financial wealth $W_0 > -y_0/r$. The process of stockholdings
$\pi_t$ is the dollar amount invested in the risky asset (the "stock market") at time $t$. The
amount $W_t - \pi_t$ is therefore invested in the money market. Short selling and borrowing are
both allowed. We place no extra restrictions on the (financial) wealth process $W_t$ until Section
3.5 of the paper, where we additionally impose the restriction $W_t \geq 0$. As long as
the agent is working, the wealth process evolves according to
$$dW_t = \pi_t \{\mu \, dt + \sigma \, dB_t\} + \{W_t - \pi_t\} r \, dt - (c_t - y_0) \, dt. \tag{3.1}$$
Applying Ito's Lemma to the product of $H(t)$ and $W(t)$, integrating, and taking expectations, we get for any stopping time $\tau$ that is finite almost surely
$$E\left(H(\tau)W(\tau) + \int_0^\tau H(s)[c(s) - y_0] \, ds\right) \leq W_0. \tag{3.2}$$
This is the well-known result that in dynamically complete markets one can reduce a dynamic
budget constraint of the type in Eq. (3.1) to a single intertemporal budget constraint
of the type in Eq. (3.2). If the agent is retired, the above two equations continue to hold with
$y_0 = 0$.
3.1.3 Leisure, income, and the optimization problem
To obtain closed-form solutions, we assume that the consumer has a utility function of the form
$$U(l_t, c_t) = \frac{1}{\alpha} \frac{\left(l_t^{1-\alpha} c_t^{\alpha}\right)^{1-\gamma^*}}{1-\gamma^*}, \quad \gamma^* > 0, \tag{3.3}$$
where $c_t$ is per-period consumption, $l_t$ is leisure, and $0 < \alpha < 1$. We assume that the consumer
is endowed with $\bar{l}$ units of leisure. Leisure can only take two values, $l_1$ or $\bar{l}$: If the consumer is
working, $l_t = l_1$; if the consumer is retired, $l_t = \bar{l}$. We assume that the wage rate $w$ is constant,
so that the income stream is $y_0 = w(\bar{l} - l_1) > 0$. We normalize $l_1 = 1$. Note that this utility
is general enough so as to allow consumption and leisure to be either complements ($\gamma^* < 1$) or
substitutes ($\gamma^* > 1$). The consumer maximizes expected utility
$$\max_{c_t, \pi_t, \tau} E\left[\int_0^\tau e^{-\beta t} U(l_1, c_t) \, dt + e^{-\beta\tau} \int_\tau^\infty e^{-\beta(t-\tau)} U(\bar{l}, c_t) \, dt\right], \tag{3.4}$$
where $\beta > 0$ is the agent's discount rate. 10 The easiest way to proceed is to start backwards
by solving the problem
$$U_2(W_\tau) = \max_{c_t, \pi_t} E\left[\int_\tau^\infty e^{-\beta(t-\tau)} U(\bar{l}, c_t) \, dt\right],$$
where $U_2(W_\tau)$ is the value function once the consumer decides to retire and $W_\tau$ is the wealth
at retirement. By the principle of dynamic programming we can rewrite (3.4) as
$$\max_{c_t, W_\tau, \tau} E\left[\int_0^\tau e^{-\beta t} U(l_1, c_t) \, dt + e^{-\beta\tau} U_2(W_\tau)\right]. \tag{3.5}$$
10 By standard arguments the constant discount rate $\beta$ could also incorporate a constant hazard rate of death,
$\lambda$.
It will be convenient to define the parameter $\gamma$ as
$$\gamma = 1 - \alpha(1 - \gamma^*),$$
so we can then reexpress the per-period utility function as
$$U(l, c) = l^{(1-\alpha)(1-\gamma^*)} \frac{c^{1-\gamma}}{1-\gamma}.$$
Since we have normalized $l_1 = 1$ prior to retirement, the per-period utility prior to retirement
is given by
$$U_1(c) = U(1, c) = \frac{c^{1-\gamma}}{1-\gamma}. \tag{3.6}$$
Notice that $\gamma > 1$ if and only if $\gamma^* > 1$, and $\gamma < 1$ if and only if $\gamma^* < 1$. Under these assumptions,
it follows from standard results (see, e.g., Karatzas and Shreve, 1998, Chapter 3) that once in
retirement, the value function becomes
$$U_2(W_\tau) = \left(\bar{l}^{1-\alpha}\right)^{1-\gamma^*} \left(\frac{1}{\theta}\right)^{\gamma} \frac{W_\tau^{1-\gamma}}{1-\gamma}, \tag{3.7}$$
where
$$\theta = \frac{\gamma - 1}{\gamma}\left(r + \frac{\kappa^2}{2\gamma}\right) + \frac{\beta}{\gamma}.$$
In order to guarantee that the value function is well defined, we assume throughout that 11
$\theta > 0$ and $\beta - r < \kappa^2/2$. 12 It will be convenient to redefine the continuation value function as
$$U_2(W_\tau) = K \frac{W_\tau^{1-\gamma}}{1-\gamma},$$
where
$$K = \left(\bar{l}^{1-\alpha}\right)^{1-\gamma^*} \left(\frac{1}{\theta}\right)^{\gamma}. \tag{3.8}$$
11 Observe that this is guaranteed if $\gamma > 1$.
12 As we show in the Appendix, this will guarantee that retirement takes place with probability one in this
stochastic setup.
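To make the roles of $\gamma$, $\theta$, and $K$ concrete, the following sketch evaluates them for two sets of parameter values and checks the two parameter restrictions assumed above. The numbers and function names are hypothetical choices for illustration only; the formulas are the ones stated in the text. The comparison of $K^{1/\gamma}$ with $1/\theta$ at the end depends only on whether $\gamma$ is above or below one, because retirement leisure exceeds the normalized working leisure of 1.

```python
def preference_constants(alpha, gamma_star, beta, r, kappa, l_bar):
    """Return (gamma, theta, K) as defined in the text."""
    gamma = 1.0 - alpha * (1.0 - gamma_star)
    theta = (gamma - 1.0) / gamma * (r + kappa ** 2 / (2.0 * gamma)) + beta / gamma
    big_k = l_bar ** ((1.0 - alpha) * (1.0 - gamma_star)) * theta ** (-gamma)
    return gamma, theta, big_k


def assumptions_hold(theta, beta, r, kappa):
    """Check theta > 0 and beta - r < kappa^2 / 2."""
    return theta > 0.0 and beta - r < 0.5 * kappa ** 2


# Illustrative (assumed) parameter values: one case with gamma > 1, one with gamma < 1.
common = dict(alpha=0.5, beta=0.05, r=0.03, kappa=0.25, l_bar=2.0)

gamma_hi, theta_hi, k_hi = preference_constants(gamma_star=5.0, **common)
gamma_lo, theta_lo, k_lo = preference_constants(gamma_star=0.5, **common)

# K^(1/gamma) lies below 1/theta when gamma > 1 and above it when gamma < 1.
below_when_gamma_above_one = k_hi ** (1.0 / gamma_hi) < 1.0 / theta_hi
above_when_gamma_below_one = k_lo ** (1.0 / gamma_lo) > 1.0 / theta_lo
```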
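Since $\bar{l} > l_1 = 1$, it follows that
$$K^{1/\gamma} > \frac{1}{\theta} \quad \text{if } \gamma < 1, \tag{3.9}$$
$$K^{1/\gamma} < \frac{1}{\theta} \quad \text{if } \gamma > 1. \tag{3.10}$$
3.2 Properties of the solution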
Theorem 31 in the Appendix presents a formal solution to the problem. The nature of the
solution is intuitive: The agent enters retirement if and only if the level of her assets exceeds
a critical level $\bar{W}$, which we analyze more closely in Subsection 3.2.1. As might be expected,
another feature of the solution is that the agent's marginal utility of consumption equals the
stochastic discount factor, both pre- and post-retirement (up to a constant $\lambda^*$, which depends
on the wealth of the agent at time 0 and is chosen so that the intertemporal budget constraint
is satisfied):
$$e^{-\beta t} U_c(l_t, c_t) = e^{-\beta t} l_t^{(1-\alpha)(1-\gamma^*)} c_t^{-\gamma} = \lambda^* H(t). \tag{3.11}$$
This is just a manifestation of the fact that the market is dynamically complete. 13 Importantly,
the marginal utility of consumption is continuous when the agent enters retirement. This is
a consequence of a principle in optimal stopping that is known as "smooth pasting," which
implies that the derivative of the value function $J_W$ is continuous. When smooth pasting is
combined with the standard envelope theorem, that is,
$$U_c(l_t, c_t) = J_W, \tag{3.12}$$
it follows that the marginal utility of consumption (Uc) is continuous. A consequence of the
continuity of the marginal utility of consumption is that consumption itself will jump when the
agent enters retirement. This is simply because consumption needs to "counteract" the discrete
change in leisure, which enters the marginal utility of consumption in a nonseparable way. The
jump is given by
$$\frac{c_{\tau^+}}{c_{\tau^-}} = \left(\bar{l}^{1-\alpha}\right)^{\frac{1-\gamma^*}{\gamma}} = K^{1/\gamma}\theta. \tag{3.13}$$
Notice that $\gamma^* > 1$ will imply a downward jump and $\gamma^* < 1$ an upward jump (since $\bar{l} > 1$).
For the empirically relevant case ($\gamma^* > 1$), the model predicts a downward jump in consumption,
consistent with the data.
13 See the monograph of Karatzas and Shreve (1998), Chapters 3 and 4.
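The size and direction of the consumption jump can be checked directly from the continuity of marginal utility. The sketch below, with assumed parameter values and hypothetical function names, computes the ratio of post- to pre-retirement consumption and confirms that marginal utility is the same on both sides of the retirement date.

```python
import math


def gamma_from(alpha, gamma_star):
    """gamma = 1 - alpha * (1 - gamma_star)."""
    return 1.0 - alpha * (1.0 - gamma_star)


def consumption_jump(alpha, gamma_star, l_bar):
    """Ratio of consumption just after retirement to consumption just before it."""
    gamma = gamma_from(alpha, gamma_star)
    return l_bar ** ((1.0 - alpha) * (1.0 - gamma_star) / gamma)


def marginal_utility(leisure, consumption, alpha, gamma_star):
    """U_c(l, c) = l^((1-alpha)(1-gamma_star)) * c^(-gamma)."""
    gamma = gamma_from(alpha, gamma_star)
    return leisure ** ((1.0 - alpha) * (1.0 - gamma_star)) * consumption ** (-gamma)


# Illustrative (assumed) values: leisure is 1 while working and l_bar = 2 in retirement.
alpha, l_bar, c_before = 0.5, 2.0, 1.7

jump_down = consumption_jump(alpha, 5.0, l_bar)  # gamma_star > 1
jump_up = consumption_jump(alpha, 0.5, l_bar)    # gamma_star < 1

mu_before = marginal_utility(1.0, c_before, alpha, 5.0)
mu_after = marginal_utility(l_bar, c_before * jump_down, alpha, 5.0)
```

With these inputs the ratio is below one when $\gamma^* > 1$ and above one when $\gamma^* < 1$, in line with the discussion of Eq. (3.13).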
In the next three subsections we explore some properties of the solution in more detail.
The benchmark model against which we compare our results is a model in which there is a
constant labor income stream and no retirement (the worker works forever). This is the natural
benchmark for this section, since it keeps all else equal except for the option to retire. The
results we obtain in this section allow us to isolate insights related to optimal retirement in
a framework in which solutions are not time dependent and therefore are easier to analyze.
Fortunately, all of the results continue to hold when we introduce a retirement deadline in
Section 3.3, in which case the natural benchmark model will be one in which the agent is forced
to work for a fixed amount of time, which is more natural.
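3.2.1 Wealth at retirement
For a constant $\xi_2$, where
$$\xi_2 = \frac{1 - 2\frac{\beta - r}{\kappa^2} - \sqrt{\left(1 - 2\frac{\beta - r}{\kappa^2}\right)^2 + 8\frac{\beta}{\kappa^2}}}{2},$$
Theorem 31 gives wealth at retirement as
$$\bar{W} = \frac{(\xi_2 - 1) K^{1/\gamma}\theta}{\left(1 + \xi_2 \frac{\gamma}{1-\gamma}\right)\left(K^{1/\gamma}\theta - 1\right)} \frac{y_0}{r}. \tag{3.14}$$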
As Theorem 31 asserts, for wealth levels higher than $\bar{W}$, it is optimal to enter retirement,
whereas for lower wealth levels, it is optimal to remain in the workforce. In the Appendix
we show that $\bar{W}$ is strictly positive, i.e., a consumer will never enter retirement with negative
wealth since there is no more income to support post-retirement consumption. It is clear
that the critical wealth level $\bar{W}$ does not depend on the initial wealth of the consumer. The
discrete decision between work and retirement is a choice variable that by standard dynamic
programming depends only on the current state variable of the system, namely, the current
level of wealth. If the current level of wealth is above $\bar{W}$, retirement is triggered, otherwise it
is not. However, it is still true that agents who start life with a higher level of wealth are more
likely to hit the retirement threshold in a shorter amount of time. Potentially, an agent might
be born with a wealth level sufficiently above $\bar{W}$ that she can retire immediately.
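A short numerical sketch of the retirement threshold may help fix ideas. It evaluates the constant $\xi_2$ and the critical wealth level from Eq. (3.14) for assumed parameter values; the function names and numbers are hypothetical. The code uses the fact that $K^{1/\gamma}\theta$ simplifies to retirement leisure raised to the power $(1-\alpha)(1-\gamma^*)/\gamma$, which follows from the definition of $K$.

```python
import math


def xi_2(beta, r, kappa):
    """Negative root of 0.5*kappa^2*xi*(xi - 1) + (beta - r)*xi - beta = 0."""
    shift = 1.0 - 2.0 * (beta - r) / kappa ** 2
    return 0.5 * (shift - math.sqrt(shift ** 2 + 8.0 * beta / kappa ** 2))


def retirement_wealth(alpha, gamma_star, beta, r, kappa, l_bar, y0):
    """Critical wealth level from Eq. (3.14)."""
    gamma = 1.0 - alpha * (1.0 - gamma_star)
    k_theta = l_bar ** ((1.0 - alpha) * (1.0 - gamma_star) / gamma)  # K^(1/gamma) * theta
    xi = xi_2(beta, r, kappa)
    numerator = (xi - 1.0) * k_theta
    denominator = (1.0 + xi * gamma / (1.0 - gamma)) * (k_theta - 1.0)
    return numerator / denominator * y0 / r


# Illustrative (assumed) parameter values.
params = dict(alpha=0.5, gamma_star=5.0, beta=0.05, r=0.03, kappa=0.25, l_bar=2.0)

xi = xi_2(params["beta"], params["r"], params["kappa"])
quadratic_residual = (0.5 * params["kappa"] ** 2 * xi * (xi - 1.0)
                      + (params["beta"] - params["r"]) * xi - params["beta"])

w_bar = retirement_wealth(y0=1.0, **params)
w_bar_double_income = retirement_wealth(y0=2.0, **params)
```

Doubling income doubles the threshold, so it is natural to quote the threshold as a multiple of income, here `w_bar` per unit of $y_0$.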
To understand the forces behind the determination of $\bar{W}$, we sketch the basic idea behind the
derivation of (3.14). The Appendix demonstrates that one can reduce the entire consumption-portfolio-retirement timing problem to a standard optimal stopping problem. After reducing
the problem to an optimal stopping problem, one can use well-known intuitions from option
valuation. Specifically, retirement can be viewed as an American-type option that allows one
to exchange the value of future income with the extra leisure that is brought about by retiring.
In this section the option has infinite maturity; in Section 3.3 its maturity is finite. The key
idea behind transforming the problem into a standard optimal timing problem is not to use
the level of wealth as the state variable, but instead its marginal value, $J_W$. We can see the
advantage of doing so by combining (3.11) and (3.12) to obtain
$$J_W = \lambda^* e^{\beta t} H(t).$$
The right-hand side of this equation is exogenous (up to a constant that is chosen so as to satisfy
the intertemporal budget constraint). By contrast, the evolution of wealth itself depends on
both optimal consumption and portfolio choice, and thus both the drift and the volatility of
the wealth process are endogenous.
The next step is to define Zt = Jw
-
A*eItH(t) as a state variable and pose the entire
problem as a standard optimal stopping problem in terms of Zt. It is straightforward to show
that
dZt
dt = ( - r) dt. -
zt
dBt,
and thus that Zt follows a standard geometric Brownian motion with volatility equal to the
Sharpe ratio (.) and drift equal to the difference between the discount rate and the interest rate
(0- r). To complete the analogy with option pricing, it remains to determine the net payoff
from exercising the option of early retirement as a function of Zt. The key difficulty in achieving
this is that the cost of forgone income is expressed in monetary terms, while the benefit of extra
leisure is in utility terms.
The Appendix shows that the correct notion of net benefit is the
difference in "consumer surplus" enjoyed by an agent who is not working versus someone who
is, assuming that the marginal value of wealth is the same for both. In the Appendix we show
that this computation leads to the following net payoff upon retirement:
$$Z_\tau \left[ \frac{\gamma}{1-\gamma}\,\frac{1}{\theta}\left(K^{1/\gamma} - 1\right) Z_\tau^{-1/\gamma} - \frac{y_0}{r} \right]. \qquad (3.15)$$
The term $Z_\tau$ outside the square brackets is equal to the marginal value of wealth upon retirement
and hence transforms monetary units into marginal utility units. The second term inside the
square brackets ($-y_0/r$) is negative, capturing the permanent loss of the net present value of
income, and can be thought of as the "strike" price of the option.
The first term inside square brackets is always positive and captures the payoff of the option.
To analyze this term, it is probably easiest to use the following relation, which we show in the
Appendix,
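$$W_\tau = \frac{K^{1/\gamma} Z_\tau^{-1/\gamma}}{\theta},$$
so that we can rewrite the payoff once the option is exercised as
$$Z_\tau \left[ \frac{\gamma}{1-\gamma}\,\frac{K^{1/\gamma} - 1}{K^{1/\gamma}}\,W_\tau - \frac{y_0}{r} \right].$$
The first term inside the square brackets now has an intuitive interpretation: It is the monetary equivalent of obtaining the leisure level $\bar{l}$. Schematically speaking, going into retirement
is "as if" the wealth of the agent is increased by $\frac{\gamma}{1-\gamma}\,\frac{K^{1/\gamma} - 1}{K^{1/\gamma}}\,W_\tau$, at the fixed cost $y_0/r$. Summarizing, in order to obtain the optimal retirement time, it suffices to solve the optimal stopping
problem
$$\sup_\tau E\left\{ e^{-\beta\tau} Z_\tau \left[ \frac{\gamma}{1-\gamma}\,\frac{1}{\theta}\left(K^{1/\gamma} - 1\right) Z_\tau^{-1/\gamma} - \frac{y_0}{r} \right] \right\}.$$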
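As a rough numerical illustration of the state variable that drives this stopping problem, the sketch below simulates one path of $Z_t$ as a geometric Brownian motion with drift equal to the discount rate minus the interest rate and volatility equal to the Sharpe ratio, using the exact log-normal update. The parameter values (an interest rate of 0.03, a stock drift of 0.10, a stock volatility of 0.20, and a discount rate of 0.07) are assumed to match the calibration of Section 3.4; the function name, the monthly time step, and the starting value $Z_0 = 1$ are hypothetical choices made only for this example.

```python
import math
import random


def simulate_z_path(z0, beta, r, kappa, horizon, n_steps, rng):
    """Simulate one path of Z_t with dZ/Z = (beta - r) dt - kappa dB.

    Uses the exact solution of geometric Brownian motion on each step,
    so the discretization introduces no bias in the distribution of Z.
    """
    dt = horizon / n_steps
    path = [z0]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, 1.0)  # standard normal increment
        log_step = (beta - r - 0.5 * kappa ** 2) * dt - kappa * math.sqrt(dt) * shock
        path.append(path[-1] * math.exp(log_step))
    return path


# Assumed calibration (see the lead-in): interest rate, stock drift,
# stock volatility, and discount rate.
R, MU, SIGMA, BETA = 0.03, 0.10, 0.20, 0.07
KAPPA = (MU - R) / SIGMA  # Sharpe ratio

rng = random.Random(0)  # fixed seed so the run is reproducible
path = simulate_z_path(1.0, BETA, R, KAPPA, horizon=10.0, n_steps=120, rng=rng)
print(f"Z after 10 years on this path: {path[-1]:.4f}")
```

Setting `kappa=0` removes the randomness, in which case the path grows deterministically at rate `beta - r`; this is a convenient sanity check on the drift term.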
There are many direct consequences of this option interpretation. For instance, a standard
intuition in optimal stopping is that an increase in the "payoff" of the option upon exercise will
increase the opportunity cost of waiting and will make the agent exercise the option earlier. A
consequence of Eq. (3.15) is that an increase in $y_0$ will reduce the payoff of "immediate exercise"
and hence will push the critical retirement wealth upward. Indeed, Eq. (3.14) demonstrates a
linear relation between $\bar{W}$ and $y_0$. This homogeneity of degree one shows that one can express
the target wealth at retirement in terms of multiples of current income, and suggests the
normalization $y_0 = 1$, which we adopt in all quantitative exercises.
Furthermore, Eq. (3.15) allows us to perform comparative statics with respect to an agent's
disutility of labor. Assume that $\gamma^*$ is kept fixed and is larger than one for simplicity.14 Assume
now that we increase $\bar{l}$ so that $K^{1/\gamma}$ decreases (by Eq. (3.13) and the fact that $\gamma > 1$). The
value of immediate exercise in Eq. (3.15) will then increase (since $\gamma > 1$) and the agent will
decide to enter retirement sooner. This is intuitive: For agents who work longer hours for the
same pay, the relative increase in leisure upon retirement ($\bar{l}$) is larger and hence retirement is
more attractive, all else equal. Similar comparative statics follow for variations in the relative
importance of leisure in the utility function ($\alpha$). Another standard intuition from option pricing
is that increases in the volatility of the underlying state variable (the Sharpe ratio $\kappa$ in our case)
will lead to postponement of exercise (retirement in our case).
Interestingly, there is a direct link between the change in consumption upon retirement
and the critical level of wealth, W. Note that by combining (3.13) and (3.14), we obtain the
following relation:
W =
(-2
-
1+ J'Y2
1)c
,
YO(
(3.16)
Thus, assuming $\gamma > 1$, lower threshold levels of the critical wealth will be associated with
larger decreases in consumption. In the empirical literature this correlation between low levels
of wealth at retirement and large decreases in consumption is seen as evidence that workers do
not save enough for retirement. Our rational framework suggests the alternative explanation
that this correlation is simply the result of preference heterogeneity: Agents who value leisure
a lot will be willing to absorb larger decreases in their consumption upon retirement (since
leisure and consumption enter nonseparably in the utility function) and will have lower levels of
retirement-triggering wealth. In option jargon, the payoff of immediate exercise will be too large,
as will the opportunity cost of waiting. Hence, low levels of wealth upon entering retirement
and large decreases in consumption are merely two manifestations of the same economic force,
namely, a stronger preference for leisure.
14 Similar arguments can be given when $\gamma^* < 1$.
3.2.2 Optimal consumption
We concentrate on a consumer with wealth lower than W, that is, a consumer who has an
incentive to continue working. The following proposition characterizes the optimal consumption
behavior of the consumer.
Proposition 29 Assume that $W_t < \bar{W}$, so that the agent has not retired yet. Let $c_t$ be the
optimal consumption process, and let $c_t^B$ denote optimal consumption in the benchmark model,
in which the consumer has no option of retirement. Then:
i) Consumption prior to retirement is lower compared to the benchmark case: $c_t < c_t^B$.
ii) The marginal propensity to consume out of wealth, $\partial c_t / \partial W_t$, is a declining function of $W_t$.
By contrast the marginal propensity to consume out of wealth is constant and equal to $\theta$ in the
benchmark case.
The intuition for the first assertion is straightforward: The desire to attain retirement
incentivizes the agent to save and accumulate assets compared to the benchmark case. This
explains part i) of the proposition.
Fig. 1 illustrates Part ii) of the proposition. In the standard Merton (1971) framework (with
or without an income stream), the marginal propensity to consume out of wealth is fixed at $\theta$.
However, in the present model the marginal propensity to consume approaches $\theta$ asymptotically
as wealth goes to the lowest allowable level, namely, $-y_0/r$.
It declines between $-y_0/r$ and $\bar{W}$ and
then jumps to $\theta$ when wealth exceeds the retirement threshold, $\bar{W}$, so that the agent enters
retirement.
The best way to understand why the marginal propensity to consume is not constant, but
rather declining, is to consider the following thought experiment. Suppose that we decrease
the wealth of an agent by an amount x prior to retirement, due, e.g., to an unexpected stock
market crash. In our framework this will have two effects. First, it will reduce the agent's total
resources and hence will lead to a consumption cutback, as in the standard Merton framework.
Second, it will move the agent further from the threshold level of wealth that is required to attain
retirement. As a result, the agent can now expect to remain in the workforce longer and thus
she will have the opportunity to partially recoup the loss of x units of wealth by the net present
value of the additional income. In short, part of the wealth shock is absorbed by postponing
retirement and thus the effect on consumption is moderate.
[Figure 3-1: Marginal propensity to consume out of wealth (MPC) as a function of wealth, compared with Merton's solution; the retirement threshold is marked on the wealth axis.]
Naturally, one would expect this effect to be strongest when the distance to the retirement
threshold is small (i.e., for wealth levels close to W). By contrast, when the option of early
retirement is completely "out of the money" (for instance, when wealth is close to $-y_0/r$), then
marginal changes in wealth will have almost no impact on the net present value of future income
and thus the marginal propensity to consume will asymptote to $\theta$ as $W_t \to -y_0/r$ in Fig. 1.
Of course, if $W_t > \bar{W}$, the agent enters retirement and the usual affine relation between
consumption and wealth prevails, as is common in Merton-type setups. The marginal propensity to consume is constant at $\theta$ since all of the adjustment to wealth shocks goes through
consumption.
It is important to note that the key to these results is not the presence of labor supply
flexibility per se, but the irreversibility of the retirement decision along with the indivisibility
of labor supply. To substantiate this claim, assume that the agent never retires and that her
leisure choice is determined optimally on a continuum at each point in time, so that $l_t + h_t = \bar{l}$,
where $h_t$ are the hours devoted to work, and the instantaneous income is $w h_t$, with w defined as
in Section 3.1.3.15 The solution for optimal consumption that one obtains in such a framework
15 See BMS for a proof. Liu and Neis (2002) impose the constraint $h_t > 0$ and obtain different results. It
is interesting to note that in the framework of Liu and Neis (2002), an individual starts losing labor supply
flexibility as she approaches the constraint $h_t = 0$. Hence, she effectively becomes more risk averse. In our
with perfect labor supply flexibility is
(
or
for two appropriate constants C1 and C2 . Notice the simple affine relation between wealth and
consumption. These results show an important direction in which the present model sheds some
new insights, beyond existing frameworks, into the relations among retirement, consumption,
and portfolio choice. In particular, under endogenous retirement, wealth has a dual role. First,
as in all consumption and portfolio problems, it controls the amount of resources that are
available for future consumption. Second, it controls the distance to the threshold at which
retirement is optimal. It is this second channel that is behind the behavior of the marginal
propensity to consume that we analyze above.16
3.2.3 Optimal portfolio
The following proposition gives an expression for the holdings of stock.
Proposition 30 Prior to retirement, the holdings of stock are given by
1__
a-
r
-1
02
C22
JW (W)
ar
7
1 71
1rt=
(3.17)
The second term in (3.17) is always
i) positive, and
ii) increasing in Wt.
framework this is true only post-retirement. Pre-retirement, the individual exposes herself to more risk because
this is the only way in which she can accelerate retirement. This shows that taking indivisibility and irreversibility
into account, the properties of the solution are fundamentally different.
16 The concavity of the consumption function is also a common result in models that combine non-spanned
income and/or borrowing constraints of the form $W_t > 0$ (e.g., Carroll and Kimball, 1996). A quite important
difference between these models and the one we consider here is that in the present model, the effects of concavity
are most noticeable for high levels of wealth and not for low wealth levels. In our model the MPC asymptotes to
$\theta$ as $W_t \to -y_0/r$, and declines from there to the point where $W_t = \bar{W}$. It then jumps back up to $\theta$, reflecting the
loss of the real option associated with remaining in the workforce. By contrast, in models such as Koo (1998) or
Duffie, Fleming, Soner and Zariphopoulou (1997) the MPC is above $\theta$ for low levels of wealth and asymptotes
to $\theta$ as $W \to \infty$. We discuss this issue further in Section 3.5, where we introduce borrowing constraints.
[Figure 3-2: figure not reproduced; the horizontal axis is wealth W.]
Post-retirement, the optimal holdings of stock are given by the familiar Merton formula:
$$\pi_t = \frac{\kappa}{\sigma}\,\frac{1}{\gamma}\,W_t.$$
In simple terms, assertion i) in Proposition 30 implies that the possibility of retirement
raises an agent's appetite to take risk (compared to an infinitely lived Merton investor without
the option to retire). The intuition is straightforward: First, the ability to adjust the duration
of work effectively hedges the agent against stock market variations. Second, the possibility
to attain the extra utility associated with more leisure raises the agent's willingness to accept
more risk.
Fig. 2 provides a graphical illustration of the first assertion, sketching the following three
value functions: a) the value function of a Merton problem wherein the agent has to work
forever, b) the value function of an agent who is already retired, and c) the value function of the
problem involving optimal retirement choice. As we can see, the third value function looks like
an "envelope" of the other two functions which are "more" concave. This implies that relative
risk aversion will be lower and hence holdings of stock will be higher in the presence of optimal
retirement choice. The value function of the problem involving optimal choice of retirement
asymptotes to the first value function as $W_t \to -y_0/r$ (i.e., the option of early retirement becomes
completely worthless) and it coincides with the second value function when $W_t > \bar{W}$ (i.e., when
the agent enters retirement).
Assertion ii) in Proposition 30 is driven by a separate intuition. To see why it holds, it
is easiest to think of a fictitious asset, namely, a barrier option that could be used to finance
retirement. This option pays off when the agent enters retirement. Its payoff is given by
$$\bar{W} - W^{NR},$$
where $\bar{W}$ is the target wealth given by Eq. (3.14) and $W^{NR}$ is given by the wealth of an agent
without the option to retire, keeping the marginal value of a dollar ($J_W$) the same across the
two agents. It can be shown that $\bar{W} - W^{NR}$ is always positive and can be expressed as
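$$\bar{W} - W^{NR} = E\left[\int_\tau^\infty \frac{H_s}{H_\tau}\left(c_s^R - c_s^{NR} + y_0\right) ds\right] = E\left[\int_\tau^\infty \frac{H_s}{H_\tau}\left(c_s^R - c_s^{NR}\right) ds\right] + \frac{y_0}{r},$$
where $c_s^R$ is the consumption of a retiree and $c_s^{NR}$ is the consumption of a worker. In other words,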
this expression is equal to the difference in the NPV of the consumption streams of a retired
versus a nonretired person plus the net present value of forgone income. Hence, W - WNR is
the extra wealth that is needed to finance retirement.
The second term in (3.17) is just the replicating portfolio of such an option. As for most
barrier options, the replicating portfolio becomes largest when the option gets closer to becoming
exercised, that is, as $W_t \to \bar{W}$; this is why assertion ii) holds.
It is interesting to relate the above results to BMS. To do so, we start by normalizing the
nominal stock holdings by Wt, so as to obtain nominal holdings of stock as a function of financial
wealth. This gives the "portfolio" fraction of stocks $\phi_t = \pi_t / W_t$, or using (3.17),
Ot
=
0,7
1+
Wt . r) +
(JwJw(Wt)
(W)
C2 - 1 rK
yo 1
o-Wr2
1)]
(2
+ 7
7
1-
(
2 - 1)
1+ 2
It is interesting to note the dependence of these terms on Wt. By fixing
)
Y
and increasing
Wt, one can observe that the first term actually decreases. This is the standard BMS effect:
According to BMS, the allocation to stocks depends on the relative ratio of financial wealth
to human capital. If an individual is "endowed" with a lot of human capital compared to her
financial wealth, it is as if she is endowed with a bond (since labor income is not risky). Hence,
she will invest heavily in stocks in order to make sure that a constant fraction of her total
resources (financial wealth + human capital) is invested in risky assets. This key intuition of
BMS explains why the first term declines as Wt increases.
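The BMS intuition in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that the first term takes the standard Merton form with riskless labor income, i.e., that a constant fraction $\kappa/(\gamma\sigma)$ of total resources $W_t + y_0/r$ is held in stocks; this form, the function name, and the parameter values (taken to match the calibration of Section 3.4) are assumptions made for illustration only.

```python
def bms_stock_share(wealth, gamma, kappa, sigma, income=1.0, r=0.03):
    """Stocks as a fraction of financial wealth in the no-retirement-option case.

    Assumes a constant fraction kappa / (gamma * sigma) of total resources
    (financial wealth plus human capital) is invested in stocks.
    """
    if wealth <= 0:
        raise ValueError("financial wealth must be positive to form a share")
    human_capital = income / r  # NPV of a riskless perpetual income stream
    target_fraction = kappa / (gamma * sigma)
    return target_fraction * (wealth + human_capital) / wealth


KAPPA = (0.10 - 0.03) / 0.20  # assumed Sharpe ratio
for w in (5.0, 10.0, 20.0, 40.0):
    share = bms_stock_share(w, gamma=4, kappa=KAPPA, sigma=0.20)
    print(f"financial wealth {w:5.1f}: stock share of financial wealth {share:.3f}")
```

The share of financial wealth in stocks falls as wealth rises, while the share of total resources stays fixed; shares above one at low wealth simply reflect borrowing against human capital in this frictionless sketch.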
However, in the presence of a retirement option this conclusion is not necessarily true, due
to the second term. To see why, compute
$\phi_W$, the derivative of $\phi_t$ with respect to $W_t$, and evaluate it around $\bar{W}$ to obtain, after some
simplifications,
(W)=
1
W
(w)
-
1
aaw(
-+
1
1
()2
)
(()2(-
-W
0(W)
KO- 1
K1 0
1
1)K+
2
1-7
The first term is clearly negative and captures the increase in the denominator of $\phi_t = \pi_t / W_t$.
The second term is positive and potentially larger than the first term, depending on parameters,
and captures the increase in the likelihood that the option of retirement will be exercised. Hence,
for values of $W_t$ close to $\bar{W}$, it is possible that $\phi_W > 0$. One can easily construct numerical
examples whereby this is indeed the case.
It is noteworthy that this result is driven by the option elements introduced by the irreversibility of the retirement decision and not by labor supply flexibility per se. Indeed, one can
show (using the methods in BMS) that allowing an agent to choose labor and leisure freely on
a continuum would result in
Irt =Wt + r (a- - 1 ) 7*
This implies that $\phi_t$ would have to be decreasing in $W_t$, despite the presence of labor supply
flexibility. The reason for these differences is that in BMS, the amount allocated to stocks as
a fraction of total resources (financial wealth + human capital) is a constant. In our framework this fraction depends on wealth. Wealth controls both the resources available for future
consumption and the likelihood of "exercising" the real option of retirement.
In summary, not only does the possibility of early retirement increase the incentive to save
more, it also increases the incentive of the agent to invest in the stock market because this is
the most effective way to attain this goal. Furthermore, this incentive is strengthened as an
individual's wealth approaches the target wealth level that triggers retirement.
3.3 Retirement before a deadline
None of the claims made so far rely on restricting the time of retirement to lie in a particular
interval. The exposition above is facilitated by the infinite horizon setup, which allows for
explicit solutions to the associated optimal stopping problem. However, the trade-off is that
in the infinite horizon case, there is no notion of aging, since time plays no explicit role in
the solution. Moreover, the "natural" theoretical benchmark for the model in the previous
section is one without retirement at all. In this section we are able to extend all the insights
of the previous section by comparing the early retirement model to a benchmark model with
mandatory retirement at time T, which is more natural.
Formally, the only modification that we introduce in this section compared to Section 3.1
is that Eq. (3.5) becomes
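$$\max E\left[ \int_t^{\tau \wedge T} e^{-\beta(s-t)} U(l_1, c_s)\,ds + e^{-\beta(\tau \wedge T - t)} U_2(W_{\tau \wedge T}) \right], \qquad (3.18)$$
where T is the retirement deadline and $\tau \wedge T$ is shorthand notation for $\min\{\tau, T\}$.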
The Appendix presents the solution to the above problem in Theorem 35. As might be
expected, one needs to use some approximate method to obtain analytical solutions, because
now the optimal stopping problem is on a finite horizon.17 The extended appendix discusses the
nature of the approximation and examines its performance against consistent numerical methods
to solve the problem.18
One can easily verify that the formulas for optimal consumption,
portfolio, etc. are identical to the respective formulas of Theorem 31 (the sole exception being
that the constants are modified by terms that depend on T - t). As a result, all of the analysis
in Section 3.2 carries through to this section. This is particularly true for the dependence of
consumption, portfolio, etc. on wealth. Here we focus only on the implications of the model
17 An important remark on terminology: The term "finite horizon" refers to the fact that the optimal stopping
region becomes a function of the deadline to mandatory retirement. The individual continues to be infinitely
lived.
18 The basic idea behind the approximation is to reduce the problem to a standard optimal stopping problem and
use the same approximation technique as in Barone-Adesi and Whaley (1987). The most important advantage
of this approximation is that it leads to very tractable solutions for all quantities involved.
for portfolio choice as a function of age. The results for consumption are similar.
By Theorem 35 in the Appendix, the optimal holdings of stock as a fraction of financial
wealth are given as
t=
a- 7
( + yo 1 - e-r(T-t)
Wt
or
X2(T-t)
+ (2(T-t)
-
+
K Yo
(1 -
Jw (We, T - t) \
Jw (WT-t, T - t)
er(T-t))
Wt
r
(•
1))
-
2 (T-t)
1
(319)
-
where $\bar{W}_{T-t}$ is the critical threshold that leads to retirement when an agent has T - t years
to mandatory retirement and the constants $\xi_{2(T-t)}$ are given in the Appendix. As in Section
3.2.3, the first term is the standard BMS term for an investor with T - t years to mandatory
retirement. Whether financial wealth increases or the time to mandatory retirement decreases,
the first term becomes smaller, which is the standard BMS intuition. Hence, the first term
decreases over time (in expectation) because T - t falls, while Wt increases over time (in
expectation). The second term captures the replicating portfolio of the early retirement option
and is strictly positive. Its relevance is larger a) the closer the option is to being exercised
(in/out of the money) and b) the more time is left until its expiration (T - t). Accordingly, the
importance of the second term in (3.19) should be expected to decrease when T - t is small.
However, it should be expected to increase when Wt increases.
This now opens up the possibility of rich interactions between "pure" horizon effects (variations in T - t, keeping Wt constant) and "wealth" effects. As an agent ages but is not yet
retired, the "pure" horizon effects will tend to decrease the allocation to stocks. However, in
expectation wealth increases as well and thus the option to retire early becomes more and more
relevant, counteracting the first effect.
We quantitatively illustrate the interplay of these effects in the next section.
3.4 Quantitative implications
To quantitatively assess the magnitude of the effects described in Section 3.3 we proceed as
follows. First, we fix the values of the variables related to the investment opportunity set to
r = 0.03, $\mu = 0.1$, and $\sigma = 0.2$. For $\beta$ we choose 0.07 in order to account for both discounting
and a constant probability of death. For $\gamma$ we consider a range of values (typically $\gamma = 2, 3, 4$).
This leaves one more parameter to be determined, namely K. The parameter K controls the
shift in the marginal utility of consumption upon entering retirement. It is a well-documented
empirical fact that consumption drops considerably upon entering retirement. As such, the
most natural way to determine the value of K is to match the agent's declining consumption
upon entering retirement. Aguiar and Hurst (2004) report expenditure drops of 17%, whereas
Banks, Blundell and Tanner (1998) report changes in log consumption expenditures of almost
0.3 in the five years prior to retirement and thereafter. Since these decreases mainly pertain
to food expenditures, which are likely to be inelastic, we also calibrate the model to somewhat
larger decreases in consumption.19
In light of (3.13), we have
$$\frac{c_{\tau^+}}{c_{\tau^-}} = K^{1/\gamma} \;\Longrightarrow\; K = \left(\frac{c_{\tau^+}}{c_{\tau^-}}\right)^{\gamma},$$
where $c_{\tau^-}$ is the consumption immediately prior to retirement and $c_{\tau^+}$ is the consumption
immediately thereafter. By substituting a post-/pre-retirement ratio of $c_{\tau^+}/c_{\tau^-} \in \{0.5, 0.6, 0.7\}$
in the above formula we can determine the respective values of K that will ensure that the
ratio of post- to pre-retirement consumption is equal to {0.5, 0.6, 0.7}, respectively. We fix the mandatory retirement
age to be T = 65 throughout and normalize $y_0$ to be one. The abbreviation "Ret" indicates the
solution implied by a model with optimal early retirement (up to time T) and "BMS" denotes
the solution of a model with mandatory retirement at time T, with no option to retire earlier
or later.
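The calibration step described above amounts to inverting the consumption ratio for K. The sketch below tabulates the implied K for the ratios and risk-aversion values mentioned in the text; the function names are hypothetical, and the helper that converts a change in log consumption into a ratio is included only to relate the reported log change of about 0.3 to the ratios used here.

```python
import math


def implied_k(consumption_ratio, gamma):
    """Return K such that K**(1/gamma) equals the post-/pre-retirement ratio."""
    if consumption_ratio <= 0:
        raise ValueError("the consumption ratio must be positive")
    return consumption_ratio ** gamma


def ratio_from_log_change(log_change):
    """Convert a change in log consumption into a post-/pre-retirement ratio."""
    return math.exp(log_change)


for gamma in (2, 3, 4):
    for ratio in (0.5, 0.6, 0.7):
        print(f"gamma={gamma}, ratio={ratio}: K={implied_k(ratio, gamma):.4f}")

# A drop of 0.3 in log consumption corresponds to a ratio of exp(-0.3).
print(f"ratio implied by a log change of -0.3: {ratio_from_log_change(-0.3):.3f}")
```

Because K enters only through $K^{1/\gamma}$ in this step, higher risk aversion requires a smaller K to deliver the same consumption drop.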
19 Admittedly, not all of these effects are purely due to nonseparability between leisure and consumption. Home
production is undoubtedly a key determinant behind these decreases. It is important to note, however, that our
model is not incompatible with such an explanation. As long as a) the agent can leverage consumption utility
with her increased leisure, and b) time spent on home production is not as painful as work, the present model can
be seen as a good reduced-form approximation to a more complicated model that would model home production
explicitly.
[Figure 3-3: Wealth threshold as a function of age (two panels; horizontal axis: Age).]
Fig. 3 plots the target wealth that is implied by the model, i.e., the level of wealth required
to enter retirement. This figure demonstrates two patterns. First, threshold wealth declines
as an agent nears mandatory retirement. This is intuitive, because the option to work is more
valuable the longer its "maturity": As a (working) agent ages, the incentive to keep the option
"alive" is reduced and hence the wealth threshold declines. Second, the critical wealth implied
by this model varies with the assumptions made about risk aversion, and the disutility of work
as implied by a lower $K^{1/\gamma}$. Risk aversion tends to shift the threshold upwards, whereas lower
levels of $K^{1/\gamma}$ (implying more disutility of labor) bring the threshold down. These are intuitive
predictions. An agent who is risk averse wants to avoid the risk of losing the option to work,
whereas an agent who cares a lot about leisure will want to enter retirement earlier.
Fig. 4 addresses the importance of the real option to retire for portfolio choice. The figure
plots the second term in Eq. (3.19) as a fraction of total stockholdings, $\pi_t$. In other words, it
plots the relative importance of stockholdings due to the real option component as a percentage
of total stockholdings. This percentage is plotted as a function of two variables, age and wealth.
Age varies between 45 and 64 and wealth varies between zero and x, where x corresponds to
the level of wealth that would make an agent retire (voluntarily) at 64. We normalize wealth
levels by x so that the (normalized) wealth levels vary between zero and one. We then plot
a panel of figures for different levels of $\gamma$ and $K^{1/\gamma}$. Fig. 4 demonstrates the joint presence
of "time to maturity" and "moneyness" effects in the real option to retire. Keeping wealth
fixed and varying the time to maturity (i.e., increasing age), the relative importance of the real
option to retire declines. Similarly, increasing wealth makes the real option component more
relevant, because the real option is more in the money. It is interesting to note that the real
option component is large, reaching values as large as 40% for some parameter combinations.
[Figure 3-4: Stockholdings due to the real option component as a percentage of total stockholdings, as a function of age and normalized wealth; panels for different values of $\gamma$ and $K^{1/\gamma}$.]
In Fig. 5 we consider the implications of the model for portfolio choice as a function of
age. We fix a path of returns that corresponds to the realized returns on the CRSP value-weighted index between 1989 and 1999. We then plot the portfolio holdings (defined as total
stockholdings normalized by financial wealth) over time for an individual whose wealth in 1989
was just enough to allow her to retire at the end of 1999 at the age of 58. This is achieved as
follows. Assume that in 1989 the investor is 48 years old and has wealth $W_0$. We treat $W_0$ as
the unknown variable that we need to solve for. For any given $W_0$, and using both the optimal
consumption and portfolio policies and the path of the realized returns between 1989 and 1999,
we can determine how much wealth the investor has in 1999, when she is 58 years old. In order
to ensure that she retires at that point we know that her wealth must be $\bar{W}$. Hence, we choose
$W_0$ so as to make sure that ten years thereafter (given the optimal policies and the realized
path of returns) the wealth has grown to exactly $\bar{W}$. We repeat the same exercise assuming
various combinations of $K^{1/\gamma}$
and $\gamma$. In order to be able to compare the results, we also plot
the portfolio that would be implied if the individual had no option of retiring early and we
label this latter case as "BMS." Fig. 5 shows that the portfolio of the agent is initially declining
and then flat or even increasing over time after 1995. This is in contrast to what would be
predicted by ignoring the option to retire early (the "BMS" case). This fundamentally different
behavior of the agent's portfolio over time is due to the extraordinary returns during the latter
half of the 1990s, which makes wealth grow faster and hence the real option to retire very
important towards the end of the sample. By contrast, if one assumed away the possibility
of early retirement, the natural conclusion would be that a run-up in prices would change the
composition of the agent's total resources (financial wealth + human capital) towards financial
wealth. For a constant income stream this would therefore mean a decrease in the portfolio
chosen.
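The search for the initial wealth described in this exercise is a one-dimensional root-finding problem, which can be sketched with bisection. Everything specific in the block below is hypothetical: the wealth recursion uses a fixed stock share and a fixed consumption rate as stand-ins for the model's wealth-dependent optimal policies, the return path is a flat placeholder rather than the CRSP data, and the target of 20 is an arbitrary stand-in for the retirement threshold.

```python
def terminal_wealth(w0, returns, stock_share=0.6, consumption_rate=0.05,
                    income=1.0, r=0.03):
    """Roll wealth forward one year at a time under fixed placeholder policies.

    Each year the agent consumes a fixed fraction of beginning-of-year wealth,
    earns the portfolio return on that wealth, and receives labor income
    at the end of the year.
    """
    wealth = w0
    for stock_return in returns:
        portfolio_return = stock_share * stock_return + (1.0 - stock_share) * r
        wealth = wealth * (1.0 + portfolio_return - consumption_rate) + income
    return wealth


def solve_initial_wealth(target, returns, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection on initial wealth so that terminal wealth hits the target.

    Relies on terminal wealth being increasing in initial wealth, which holds
    for the placeholder recursion above.
    """
    if terminal_wealth(lo, returns) > target or terminal_wealth(hi, returns) < target:
        raise ValueError("target wealth is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if terminal_wealth(mid, returns) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


PLACEHOLDER_RETURNS = [0.10] * 10  # hypothetical flat path, not the CRSP data
TARGET_WEALTH = 20.0               # hypothetical stand-in for the threshold

w0 = solve_initial_wealth(TARGET_WEALTH, PLACEHOLDER_RETURNS)
print(f"initial wealth: {w0:.4f}")
print(f"terminal wealth: {terminal_wealth(w0, PLACEHOLDER_RETURNS):.4f}")
```

In the actual exercise the policies depend on wealth and age, so the recursion would call the model's consumption and portfolio rules at each step; the bisection logic itself would be unchanged as long as terminal wealth remains monotone in initial wealth.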
Fig. 6 demonstrates the above effect more clearly. In this figure we normalize total stockholdings by total resources (human capital + financial wealth). Consistent with our results
above, in the BMS case we get a constant equal to $\kappa/(\gamma\sigma)$. When we allow for an early retirement
option, we observe that the fraction of total resources invested in stock exhibits a stark increase
towards the latter half of the 1990s, as the option of early retirement becomes more relevant.
The increase in this fraction is small for the first half of the 1990s and large for the latter part
of the decade.
[Figure 3-5: Portfolio holdings (total stockholdings normalized by financial wealth) as a function of age, from 48 to 58, for the "Ret" and "BMS" cases; panels for different combinations of $K^{1/\gamma}$ and $\gamma$.]
[Figure 3-6: Total stockholdings normalized by total resources (human capital + financial wealth) as a function of age, from 48 to 58, for the "Ret" and "BMS" cases; panels for different combinations of $K^{1/\gamma}$ and $\gamma$.]
Fig. 6 is useful in understanding the behavior of the portfolio holdings in Fig. 5. In the first
half of the sample the standard BMS intuition applies. The fraction of total resources invested
in the stock market is roughly constant even after taking the option of early retirement into
account. Hence, by the standard intuition behind the BMS results, the portfolio of the agent
(total stockholdings normalized by financial wealth) declines over time. However, in the latter
half of the sample, the increase in the real option to retire is strong enough to counteract the
decline in the portfolio implied by standard BMS intuitions.
Figs. 7 and 8 repeat the same exercise as in Figs. 5 and 6, only now for an agent who came
close to retirement in 1999, that is, we now assume that her wealth in 1999 was slightly less
than sufficient for her to actually retire. To achieve this we just assume that in 1989 she
started with slightly less initial wealth than necessary to retire by 1999. It is interesting to
note what happens after the stock market crash of 2000. Now, the option of early retirement
starts to become irrelevant and the agent's portfolio declines. The effect of a disappearing
option magnifies the decrease in the portfolio. By contrast, in the BMS case the abrupt
decrease in the stock market (and hence wealth) would be counterbalanced by a change in the
composition between financial wealth and human capital towards human capital. This effect
tends to somewhat counteract the effects of aging and produces a much more moderate
decrease in portfolio holdings.
[Figure 3-7: Portfolio holdings as a function of age, from 45 to 65, for the "Ret" and "BMS" cases; panels for different combinations of $K^{1/\gamma}$ and $\gamma$.]
These figures are meant to demonstrate the fundamentally different economic implications
that can result once one takes into account the real option to retire. As such they should be
seen as merely an illustrative application. Note, however, that a stronger result can be shown
in the context of this exercise. For wealth levels close to the retirement threshold and for our
preferred base scenario of K1'1h
= 0.7 and - = 4, the portfolio would increase with age in
expectation as the agent approaches early retirement. Fig. 9 illustrates this effect. The only
136
11
K
0.6, 13
0.7
0.54
0.68
0.52
0.66
2
0.5
0.64
0.48
0 0.62
0.46
0.6
0.58
0.44
BMS
•]
45
50
55
Age
11
K
A
II
0.42
45
60
50
55
Age
1
K /•'0.7,
0.7, Yj3
0.68
0.49
0.66
0.48
60
65
60
65
4
0.47
0.64
.Q
0.62
0.46
0
0.45
0.6
BMS
0.58
0.44
BMS
-
45
50
55
Age
60
65
45
Figure 3-8:
137
50
55
Age
difference from Fig. 5 is that in Fig. 9 we perform the counterfactual exercise of assuming
that the stock market moved along an "expected path" between 1989 and 1999. To do so,
we assume that the increments of the Brownian motion driving returns are zero, so that the
expected returns and the realized returns in the stock market coincide. As can be seen, the
qualitative features of Fig. 5 are preserved. For the base scenario K1/"0 = 0.7 and - = 4, the
fraction invested in stocks is initially decreasing with respect to age, then flat and even slightly
increasing (between ages 57 and 58) along the expected path. This increase of the portfolio
with age (in expectation) would be impossible in the absence of an early retirement option.
Interestingly, the decline in portfolios between ages 48 and 57 is much smaller than what a
BMS model would imply. The increase in the importance of the option of early retirement
counteracts the pure horizon effect, so that the allocation to stock is almost constant for agents
between ages 48 and 57. This may help explain the relatively constant allocations to stock that
Ameriks and Zeldes (2001) document empirically.
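
As a minimal sketch of this counterfactual, assume the stock index follows $dS/S = \mu\,dt + \sigma\,dB$, the return process that appears in the budget constraint. Setting every Brownian increment to zero leaves only the drift, so realized and expected returns coincide. The function name and the numerical values below are hypothetical and serve only as an illustration.

```python
def expected_path(s0, mu, years, steps_per_year=12):
    """Stock index path when every Brownian increment dB is set to zero.

    Under dS/S = mu*dt + sigma*dB with dB = 0 the volatility term drops out,
    so the path grows deterministically at the expected rate mu.
    """
    dt = 1.0 / steps_per_year
    path = [s0]
    for _ in range(years * steps_per_year):
        # Euler step with dB = 0: the realized return equals mu*dt.
        path.append(path[-1] * (1.0 + mu * dt))
    return path


# Hypothetical inputs: a 10% expected return over a ten-year window.
path = expected_path(s0=100.0, mu=0.10, years=10)
```

Feeding such a path, rather than the realized index, into the optimal portfolio rule isolates the effect of aging and of the retirement option from the effect of return surprises.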
The present paper is theoretical in nature, and we don't claim to have modeled even a
small fraction of all the issues that influence real life retirement, consumption, and portfolio
decisions (e.g., shorting and leverage constraints, transaction costs, undiversifiable income and
health shocks, etc.). However, note that the model does produce "sensible" portfolios (for the
combination $\gamma = 4$ and $K^{1/\gamma} = 0.7$) as well as variations in portfolio shares between 1995
and 2003. In the bottom right plots of Fig. 7, for instance, the portfolio of the agent grows
from 0.58 to 0.62 between 1995 and 1999 and then declines to roughly 0.5 by the beginning
of 2003. In comparison, the Employee Benefits Research Institute (EBRI)$^{20}$ reports that the
average equity share in a sample of 401(k)s grew steadily from 0.46 to 0.53 between 1995 and
1999 only to fall to 0.4 by the beginning of 2003. The reason the model performs well is that
we are considering an agent close to retirement, that is, at a time when the remaining NPV
of her income is not a large component of her total wealth in the first place. For young agents
the model has similar problems matching the data as BMS, which is to be expected.
$^{20}$ Employee Benefits Research Institute, Issue Brief 272 (Aug 2004), especially Figure 2.
Figure 3-9: [plot panels indexed by $K^{1/\gamma}$ (0.6, 0.7) and $\gamma$ (3, 4); horizontal axis: Age; series: Ret, BMS]
3.5 Borrowing constraint
Thus far we have assumed that the agent is able to borrow against the value of her future labor
income. In this section we impose the additional restriction that it is impossible for the agent
to borrow against the value of future income. Formally, we add the requirement that $W_t \geq 0$
for all $t \geq 0$. To preserve tractability, we assume in this section that the agent is able to go into
retirement at any time that she chooses without a deadline. This makes the problem stationary
and as a result the optimal consumption and portfolio policies will be given by functions of Wt
alone.
The borrowing constraint is never binding post-retirement because the agent receives no
income and has constant relative risk aversion. This implies that once the agent is retired, her
consumption, her portfolio, and her value function are the same with or without borrowing
constraints. In particular, if she enters retirement at time $\tau$ with wealth $W_\tau$, her expected
utility is still $U_2(W_\tau)$.
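The problem the agent now faces is
$$\max_{c_t, W_\tau, \tau} E\left[\int_0^{\tau} e^{-\beta t} U(l_1, c_t)\,dt + e^{-\beta\tau} U_2(W_\tau)\right] \qquad (3.20)$$
subject to the borrowing constraint
$$W_t \geq 0, \quad \forall t \geq 0, \qquad (3.21)$$
and the budget constraint
$$dW_t = \pi_t\{\mu\,dt + \sigma\,dB_t\} + \{W_t - \pi_t\}\,r\,dt - (c_t - y_0 1\{t < \tau\})\,dt. \qquad (3.22)$$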
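
The budget constraint (3.22) and the borrowing constraint (3.21) can be sketched as a discretized (Euler) update. This is an illustration under assumed parameter values, not the solution method of the paper; the function names are hypothetical.

```python
def wealth_step(w, pi, c, y0, retired, mu, sigma, r, dt, dB):
    """One Euler step of the budget constraint (3.22).

    pi is the dollar amount held in stocks, c the consumption rate, and
    labor income y0 is received only before retirement.
    """
    income = 0.0 if retired else y0
    dw = pi * (mu * dt + sigma * dB) + (w - pi) * r * dt - (c - income) * dt
    return w + dw


def satisfies_borrowing_constraint(wealth_path):
    """Constraint (3.21): wealth may never become negative."""
    return all(w >= 0.0 for w in wealth_path)


# Hypothetical inputs; dB = 0 switches off the return shock for this step.
params = dict(mu=0.08, sigma=0.2, r=0.03, dt=0.5, dB=0.0)
w_working = wealth_step(10.0, pi=4.0, c=1.0, y0=1.5, retired=False, **params)
w_retired = wealth_step(10.0, pi=4.0, c=1.0, y0=1.5, retired=True, **params)
```

With the same consumption and portfolio, the working agent accumulates wealth over the step while the retired agent draws it down, because the income term $y_0 1\{t < \tau\}$ vanishes after retirement.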
We present the solution in Theorem 36 in the Appendix. We devote the remainder of this
section to a comparison of results we obtain in Section 3.2 with the resulting optimal policies
we obtain in Theorem 36.
The Appendix gives a simple proof as to why wealth at retirement is smaller with borrowing
constraints than without (even though quantitatively the effect is negligible). In terms of
optimal stockholdings, the presence of borrowing constraints moderates holdings of stock, and
decreasingly so as the wealth of the agent increases.

Figure 3-10: [plot; horizontal axis: Wealth]
Fig. 10 compares optimal portfolios for four cases formed by those with and without the
early retirement option, and those with and without the imposition of borrowing constraints.
For the cases in which we allow retirement, we take wealth levels close to retirement but lower
than the threshold that would imply retirement. The figure demonstrates that for levels of
wealth close to retirement there are only (minor) quantitative differences between agents with
borrowing constraints and agents without. The qualitative properties are the same. Holdings of
stock increase with wealth (more than linearly). One can observe that the optimal stockholdings
in the presence of early retirement are tilted more towards stocks whether we impose borrowing
constraints or not. Similarly, the optimal holdings of stock are smaller when one imposes
borrowing constraints (whether one allows for a retirement option or not).
We conclude by summarizing the key insights of this section. Borrowing constraints are
relevant for levels of wealth close to zero, where optimal retirement is not an issue. Similarly, the
effects of optimal retirement are relevant for high levels of wealth, where borrowing constraints
are unlikely to bind in the future. Hence, as long as one examines the effects of the option
to retire close to the threshold levels of wealth, borrowing constraints can be safely ignored.
However, it is important to note that borrowing constraints can fundamentally affect quantities
related to, e.g., the expected time to retirement for a person who starts with wealth close to
zero because they will typically imply lower levels of stockholdings and hence a more prolonged
time (in expectation) to reach the retirement threshold.
3.6 Conclusion
In this paper we propose a simple partial equilibrium model of consumer behavior that allows for
the joint determination of a consumer's optimal consumption, portfolio, and time to retirement.
The Appendix provides essentially closed-form solutions for virtually all quantities of interest.
The results can be summarized as follows. The ability to time one's retirement introduces
an option-type character to the optimal retirement decision. This option is most relevant for
individuals with a high likelihood of early retirement, that is, individuals with high wealth levels.
This option in turn affects both an agent's incentive to consume out of current wealth and her
investment decisions. In general, the possibility of early retirement will lead to portfolios that
are more exposed to stock market risk. The marginal propensity to consume out of wealth will
be lower as one approaches early retirement, reflecting the increased incentives to reinvest gains
in the stock market in order to bring retirement "closer."
The model makes some intuitive predictions. Here we single out some of the predictions
that seem to be particularly interesting. First, the model suggests that during stock market
booms, there should be an increase in the number of people that opt for retirement as a larger
percentage of the population hits the retirement threshold (some evidence for this may be found
in Gustman and Steinmeier (2002) and references therein). Second, the model shows that it
is possible that portfolios of aging individuals could exhibit increasing holdings of stock over
time, even if there isn't variation in the investment opportunity set and the income stream
exhibits no correlation with the stock market (or any risk whatsoever). This is interesting in
light of the evidence in Ameriks and Zeldes (2001) that portfolios tend to be increasing or
hump-shaped with age for the data sets that they consider. Third, according to the model,
there should be a discontinuity in the holdings of stock and in consumption upon entering
retirement. Ample empirical evidence shows that this is indeed the case for consumption (see, e.g., Aguiar
and Hurst (2004) and references therein). The discontinuity in stockholdings seems to be
a less tested hypothesis. Fourth, the model predicts that all else equal, switching to a
more flexible retirement system that links portfolio choice with retirement timing should lead
to increased stock market allocations. This is consistent with the empirical fact that stock
market participation increased in the U.S. as 401(k)s were gaining popularity. Fifth, increasing
levels of stockholdings during a stock market run-up and liquidations during a stock market fall
might not be due to irrational herding; instead, both effects might be due to the behavior of
the real option to retire that emerges during the run-up and becomes irrelevant after the fall.
In this paper we try to outline the basic new insights that follow from allowing the agent to time
her retirement decision. By no means do we claim that we address all the issues that are likely
to be relevant for actual retirement decisions (e.g., health shocks, unspanned income, etc.).
Rather, we view the theory developed in this paper as a complement to our understanding of
richer, typically numerically solved, models of retirement. Many interesting extensions to this
model should be relatively tractable.
A first important extension would be to include features that are realistically present in actual 401(k)-type plans such as tax deferral, employee matching contributions, and tax provisions
related to withdrawals. The solutions developed in such a model could be used to determine
the optimal saving, retirement, and portfolio decisions of consumers that are contemplating
retirement and taking into account tax considerations.
A second extension would be to allow the agent to reenter the workforce (at a lower income
rate) once retired. We doubt this would alter the qualitative features of the model, but it is
very likely that it would alter the quantitative predictions. It can be reasonably conjectured
that the wealth thresholds would be significantly lower in that case, and the portfolios tilted
even more towards stocks because of the added flexibility.
A third extension of the model would be to introduce predictability and more elaborate
preferences. If one were to introduce predictability, while keeping the market complete (like
Wachter, 2002), the methods of this paper can be easily extended. It is also very likely that
the model would not lose its tractability if one uses Epstein-Zin utilities in conjunction with
the methods recently developed by Schroder and Skiadas (1999).
A fourth extension of the model that we are currently pursuing is to study its general
equilibrium implications.$^{21}$ This is of particular interest as it would enable one to make some
predictions about how the properties of returns are likely to change as worldwide retirement
systems begin to offer more freedom to agents in making investment and retirement decisions.
$^{21}$ The role of labor supply flexibility in a general equilibrium model with continuous labor-leisure choice is
considered in Basak (1999). It is very likely that the results we present in this paper could form the basis for
a general equilibrium extension. It is well known in the macroeconomics literature that allowing for indivisible
labor is quite important if one is to explain the volatility of employment relative to wages. See, for example,
Hansen (1985) and Rogerson (1988).
.1 Appendix
This version of the Appendix contains the statements of the theorems and a sketch of the proof
of the main theorem. An extended appendix containing all the proofs is available online.
.1.1 Theorems and proofs for Section 3.2
Theorem 31 To obtain the solution to the problem we describe in Section 3.1, define the
constants
62
1- 2,r - /(1 - 20-)2+ 84
2
=--
2
(21)0
(23)
y0
(24)
A
and
S
--
•t•1
)
C2
-1
2ý2-i
assume that
r(
+
0 62-
62) <
1,
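and let $\lambda^*$ be the (unique) solution of
$$\gamma_2 C_2 (\lambda^*)^{\gamma_2 - 1} - \frac{1}{\theta}(\lambda^*)^{-\frac{1}{\gamma}} + \frac{y_0}{r} + W_t = 0.$$
Then $C_2 > 0$, $\gamma_2 < 0$, and the optimal policy is given as follows: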
1
a) If Wt
< W-
(C2-1)K-6
(1
2 )1 K
10
1Y)-
145
(25)
consumption follows the process
1
A*e(s
cs
cs=
-
t)H( s ))
H (t)
1{t < s <
H
A*e(s-t) H()
H(t))
(1-)
*}
(26)
1{s > T*},
(27)
A
(28)
1
the optimal retirement time is
T*
=inf{s : W = W} =
inf
s : A*e
(s - t ) H ( )
=
and the optimal consumption and stockholdings as a function of Wt are given by
ct
=
c(Wt) = (A*(WO))
7rt
=
7r(Wt) =
2(2
( 2 -1)KV 0
b) If Wt
( '+ý2
11
_(
1)C 2 A*(Wt) 21
-
,
(29)
,)
--lA*(Wt) - - )
.
(30)
the optimal solution is to enter retirement immedi-
' YK^,0-1)r
ately (r* = t) and the optimal consumption /portfolio policy is given as in Karatzas and Shreve
(1998), Chapter 3.
The remainder of this section provides a sketch of the proof of Theorem 31. Throughout
this section we fix $t = 0$ without loss of generality. For a concave, strictly increasing, and
continuously differentiable function $U : (0, \infty) \to \mathbb{R}$, we can define the inverse $I(\cdot)$ of $U'(\cdot)$. Let
$\tilde{U}$ be given by
$$\tilde{U}(y) = \max_{x > 0}\,[U(x) - xy] = U(I(y)) - yI(y), \quad 0 < y < \infty.$$
It is easy to verify that $\tilde{U}(\cdot)$ is strictly decreasing and convex, and satisfies
$$\tilde{U}'(y) = -I(y), \quad 0 < y < \infty,$$
$$U(x) = \min_{y > 0}\,[\tilde{U}(y) + xy] = \tilde{U}(U'(x)) + xU'(x), \quad 0 < x < \infty. \qquad (31)$$
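
As an illustration of these duality relations, the sketch below assumes the CRRA form $U(x) = x^{1-\gamma}/(1-\gamma)$ (an assumption made here for concreteness), for which $I(y) = y^{-1/\gamma}$, and compares the closed-form dual $U(I(y)) - yI(y)$ with a brute-force maximization of $U(x) - xy$. Function names and numerical values are hypothetical.

```python
def crra_utility(x, gamma):
    """CRRA utility U(x) = x**(1 - gamma) / (1 - gamma), gamma != 1."""
    return x ** (1.0 - gamma) / (1.0 - gamma)


def inverse_marginal_utility(y, gamma):
    """I(y): since U'(x) = x**(-gamma), the inverse is y**(-1/gamma)."""
    return y ** (-1.0 / gamma)


def dual_closed_form(y, gamma):
    """Dual value U(I(y)) - y*I(y)."""
    x_star = inverse_marginal_utility(y, gamma)
    return crra_utility(x_star, gamma) - x_star * y


def dual_by_grid_search(y, gamma, x_max=10.0, n=100_000):
    """Brute-force maximum of U(x) - x*y over a grid of x in (0, x_max]."""
    best = float("-inf")
    for i in range(1, n + 1):
        x = x_max * i / n
        best = max(best, crra_utility(x, gamma) - x * y)
    return best


# Hypothetical inputs: risk aversion 4 and multiplier value 0.5.
closed = dual_closed_form(0.5, 4.0)
numeric = dual_by_grid_search(0.5, 4.0)
```

Under this assumed utility the dual simplifies to $\frac{\gamma}{1-\gamma}\,y^{(\gamma-1)/\gamma}$, which is negative, decreasing, and convex in $y$ for $\gamma > 1$.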
To start, we fix a stopping time $\tau$ and define
$$V_\tau(W_0) = \max_{c_t, W_\tau} E\left[\int_0^{\tau} e^{-\beta t} U_1(c_t)\,dt + e^{-\beta\tau} U_2(W_\tau)\right], \qquad (32)$$
where U1 and U2 are as defined in Eqs. (3.6) and (3.7). The following result is a generalization
of the equivalent result in Karatzas and Wang (2000) to allow for income.
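Lemma 32 Let
$$\tilde{J}(\lambda; \tau) = E\left[\int_0^{\tau}\left[e^{-\beta t}\tilde{U}_1(\lambda e^{\beta t}H(t)) + \lambda H(t) y_0\right]dt + e^{-\beta\tau}\tilde{U}_2(\lambda e^{\beta\tau}H(\tau))\right].$$
For any $\tau$ that is finite almost surely, there exists $\lambda^*$ such that
$$V_\tau(W_0) = \inf_{\lambda > 0}\left[\tilde{J}(\lambda; \tau) + \lambda W_0\right] = \tilde{J}(\lambda^*; \tau) + \lambda^* W_0$$
and the optimal solution to (32) entails
$$W_\tau = I_2(\lambda^* e^{\beta\tau}H(\tau)), \qquad (33)$$
$$c_t = I_1(\lambda^* e^{\beta t}H(t))\,1\{t < \tau\}, \qquad (34)$$
with $I_1$ and $I_2$ defined similarly to Eq. (31). Moreover, the value function $V(W_0)$ of the
problem outlined in Section 2 satisfies
$$V(W_0) = \sup_{\tau} V_\tau(W_0) = \sup_{\tau}\inf_{\lambda > 0}\left[\tilde{J}(\lambda; \tau) + \lambda W_0\right] = \sup_{\tau}\left[\tilde{J}(\lambda^*; \tau) + \lambda^* W_0\right].$$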
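Karatzas and Wang (2000) show that one can reduce the entire joint portfolio-consumption-stopping problem into a pure optimal stopping problem by examining whether the inequality
$$V(W_0) = \sup_{\tau}\inf_{\lambda > 0}\left[\tilde{J}(\lambda; \tau) + \lambda W_0\right] \leq \inf_{\lambda > 0}\sup_{\tau}\left[\tilde{J}(\lambda; \tau) + \lambda W_0\right] = \inf_{\lambda > 0}\left[\tilde{V}(\lambda) + \lambda W_0\right] \qquad (35)$$
becomes an equality, with $\tilde{V}(\lambda)$ defined as
$$\tilde{V}(\lambda) = \sup_{\tau}\tilde{J}(\lambda; \tau) = \sup_{\tau} E\left[\int_0^{\tau}\left[e^{-\beta t}\tilde{U}_1(\lambda e^{\beta t}H(t)) + \lambda H(t) y_0\right]dt + e^{-\beta\tau}\tilde{U}_2(\lambda e^{\beta\tau}H(\tau))\right]. \qquad (36)$$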
The inequality (35) follows from a standard result in convex duality (see, e.g., Rockafellar,
1997).
Reversing the order of maximization and minimization in (35) makes the problem
significantly more tractable, since V(A) is the value of a standard optimal stopping problem,
for which one can apply well known results. In particular, the parametric assumptions that we
make in Section 3.1.3 allow us to solve this optimal stopping problem explicitly. This forms a
substantial part of the proof and we present it in the extended appendix.
The cost of reversing the order of the minimization and the maximization, however, is that
it will only give us an upper bound to the value function. The rest of the proof here is therefore
devoted to showing that the inequality in (35) is actually an equality.
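
The direction of inequality (35) can be illustrated on a finite grid: for any payoff table, the maximum over rows of the row minimum never exceeds the minimum over columns of the column maximum. In the sketch below, rows stand in for stopping times and columns for multipliers; the numbers are arbitrary and purely illustrative.

```python
def max_min(payoff):
    """Sup over rows of the inf over columns."""
    return max(min(row) for row in payoff)


def min_max(payoff):
    """Inf over columns of the sup over rows."""
    return min(max(col) for col in zip(*payoff))


# Arbitrary payoff table: entry [i][j] is the payoff of row i against column j.
payoff = [
    [3.0, 1.0, 4.0],
    [2.0, 2.5, 0.5],
    [1.5, 3.5, 2.0],
]
gap = min_max(payoff) - max_min(payoff)

# A table with a saddle point, where the two orders of optimization agree.
saddle = [
    [1.0, 2.0],
    [3.0, 4.0],
]
saddle_gap = min_max(saddle) - max_min(saddle)
```

The first table has a strictly positive gap, so reversing the order of optimization only yields an upper bound there; the second has a zero gap, which is the situation the remainder of the proof establishes for the retirement problem.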
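Remark 33 The option pricing interpretation given in Section 3.2.1 is based on a slight rewriting of equation (36). To see this, note that
$$\begin{aligned}
\tilde{V}(\lambda) = \sup_{\tau}\tilde{J}(\lambda; \tau) &= \sup_{\tau} E\left[\int_0^{\tau}\left[e^{-\beta t}\tilde{U}_1(\lambda e^{\beta t}H(t)) + \lambda H(t) y_0\right]dt + e^{-\beta\tau}\tilde{U}_2(\lambda e^{\beta\tau}H(\tau))\right] \\
&= E\left[\int_0^{\infty}\left[e^{-\beta t}\tilde{U}_1(\lambda e^{\beta t}H(t)) + \lambda H(t) y_0\right]dt\right] \\
&\quad + \sup_{\tau} E\left[e^{-\beta\tau}\tilde{U}_2(\lambda e^{\beta\tau}H(\tau)) - \int_{\tau}^{\infty}\left[e^{-\beta t}\tilde{U}_1(\lambda e^{\beta t}H(t)) + \lambda H(t) y_0\right]dt\right].
\end{aligned}$$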
The extended appendix shows that the third line can be further rewritten as
sup E e-
1rZr K1hO - 1
which is precisely the option we analyze in Section 3.2.1.
148
Z7
-
,
dt
The following result is a straightforward extension of a result in Karatzas and Wang (2000)
and is given without proof.
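Lemma 34 Let $\hat{\tau}_\lambda$ be the optimal stopping rule associated with $\lambda$ in Eq. (36). Then
$$\tilde{V}'(\lambda) = -E\left[\int_0^{\hat{\tau}_\lambda} H(t)\left(I_1(\lambda e^{\beta t}H(t)) - y_0\right)dt + H(\hat{\tau}_\lambda)\,I_2(\lambda e^{\beta\hat{\tau}_\lambda}H(\hat{\tau}_\lambda))\right], \quad \lambda \in (0, \infty).$$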
The final steps towards proving Theorem 31 use this observation in order to replace the
inequality in (35) with an equality sign. In particular, the extended appendix shows how to
use Lemma 34 to demonstrate that the policies we propose in the statement of the theorem are
feasible and their associated payoff provides an upper bound to the value function. We then
conclude that they are the optimal policies.
.1.2 Proofs for Sections 3.2.2 and 3.2.3
See extended Appendix.
.1.3 Theorems and proofs for Section 3.3
The statement of Theorem 35 is almost identical to the statement of Theorem 31 with the main
exception that all the constants now depend on T - t. In the following statement of the theorem
we isolate the results related to the optimal portfolio and leave the precise statement of the
entire theorem along with its proof for the extended Appendix.
Theorem 35 The optimal portfolio is given as
rt-= ir(Wt) =r (
-- 2 •(T-t)
2 (T-t)(
1 1
)C
-Y O(T-t )
2 (T-t)A*(Wt)•(Tt)_+9
A*(Wt)
where A*(Wt)is the unique solution of
(2(T-t)(
and C2(T-t),
2(T-t),
)(2(T-t)
(*)_2( T-
t)
_
l
1
(A)- +o1-e-(-
O(T-t)
-r(T
r
)
+ Wt = 0
(37)
O(T-t) are constants that depend on T - t, which we give explicitly in the
extended appendix.
Since Theorem 35 involves a finite-horizon optimal stopping problem, we use an approximation (along the lines of Barone-Adesi and Whaley, 1987) in order to solve it. The extended
appendix discusses the quality of this approximate solution and finds that it is very accurate.
.1.4 Theorems and proofs for Section 3.5
Theorem 36 Under technical conditions given in the extended appendix, there exist appropriate constants $C_1$, $C_2$, $Z_L$, $Z_H$, $\gamma_1$, and $\gamma_2$ and a positive decreasing process $X^*_s$ with $X^*_t = 1$ so
that the optimal policy triple < •8 , W?,
--
> is
1
a) If Wt < W= K ZL •,
1
c8
=
(A*ep(st)X:
< ?}
)1{s
= inf{s :w 8 = W} =
Sinf{s
ZL}
A*e~(s-)x*H(s)
8 H(t)
and A* is given by
lCl (A~*)1 1 + ( 2C 2 (A*)
2- 1
-
0(*)-
+ -
r
+
Wt = 0.
(38)
Using the notation $\lambda^*(W_t)$ to make the dependence of $\lambda^*$ on $W_t$ explicit, the optimal consumption
and portfolio policy is given by
ct = c(We)= (A*(We))
7rt =
r(wt)=
Wt(
A
We)
aor* Aw,,(
where $\lambda^*_W(W_t)$ denotes the first derivative of $\lambda^*(W_t)$ with respect to $W_t$.
b) If We
W = Kf Z L -, the optimal solution is to enter retirement immediately (r - t)
and the optimal consumption policy is given as in the standard Merton (1971) infinite horizon
problem.
We include the proof and the precise assumptions behind this theorem in the extended
appendix to this paper.
Bibliography
[1] Aguiar, M., and E. Hurst (2004): "Consumption vs. Expenditure," NBER Working Paper,
w.10307.
[2] Ameriks, J., and S. P. Zeldes (2001): "How Do Household Portfolio Shares Vary With
Age?," Working Paper, Columbia GSB.
[3] Banks, J., R. Blundell, and S. Tanner (1998): "Is There a Retirement-Savings Puzzle?,"
American Economic Review, 88(4), 769-788.
[4] Barone-Adesi, G., and R. E. Whaley (1987): "Efficient Analytic Approximation of American
Option Values," Journal of Finance, 42(2), 301-320.
[5] Basak, S. (1999): "On the Fluctuations in Consumption and Market Returns in the Presence
of Labor and Human Capital: An Equilibrium Analysis," Journal of Economic Dynamics and Control,
23(7), 1029-64.
[7] Bodie, Z., J. B. Detemple, S. Otruba, and S. Walter (2004): "Optimal consumption-portfolio
choices and retirement planning," Journal of Economic Dynamics and Control, 28(3), 1115-1148.
[8] Bodie, Z., R. C. Merton, and W. F. Samuelson (1992): "Labor Supply Flexibility and
Portfolio Choice in a Life Cycle Model," Journal of Economic Dynamics and Control, 16(3-4), 427-49.
[9] Campbell, J. Y., J. F. Cocco, F. J. Gomes, and P. J. Maenhout (2004): "Investing Retirement Wealth: A Life-Cycle Model," Review of Financial Studies - forthcoming.
[10] Campbell, J. Y., and L. M. Viceira (2002): Strategic Asset Allocation, Clarendon Lectures
in Economics. Oxford University Press.
[11] Carroll, C. D., and M. S. Kimball (1996): "On the Concavity of the Consumption Function," Econometrica, 64(4), 981-92.
[12] Chan, Y. L., and L. Viceira (2000): "Asset Allocation with Endogenous Labor Income:
The Case of Incomplete Markets," Working Paper, Harvard University.
[13] Choi, K. J., and G. Shim (2004): "Disutility, Optimal Retirement, and Portfolio Selection,"
mimeo, Graduate School of Management, Korea Advanced Institute of Science and Technology.
[15] Cox, J. C., and C.-f. Huang (1989): "Optimal consumption and portfolio policies when
asset prices follow a diffusion process," Journal of Economic Theory, 49(1), 33-83.
[16] Cox, J. C., S. A. Ross, and M. Rubinstein (1979): "Option Pricing: A Simplified Approach," Journal of Financial Economics, 7(3), 229-263.
[17] Diamond, P. A., and J. A. Hausman (1984): "Individual Retirement and Savings Behavior," Journal of Public Economics, 23(1-2), 81-114.
[18] Duffie, D. (2001): Dynamic asset pricing theory. Princeton University Press, Princeton
and Oxford.
[19] Duffie, D., W. Fleming, M. Soner, and T. Zariphopoulou (1997): "Hedging in Incomplete
Markets with HARA Utility," Journal of Economic Dynamics and Control, 21, 753-782.
[20] Duffie, D., and T. Zariphopoulou (1993): "Optimal Investment with Undiversifiable Income
Risk," Mathematical Finance, 3, 135-148.
[21] Dybvig, P. H., and H. Liu (2005): "Lifetime Consumption and Investment: Retirement
and Constrained Borrowing," mimeo, Washington University in St. Louis.
[22] El Karoui, N., and M. Jeanblanc-Picque (1998): "Optimization of Consumption with Labor
Income," Finance and Stochastics, 2(4), 409-440.
[23] Gustman, A. L., and T. L. Steinmeier (1986): "A Structural Retirement Model," Econometrica, 54(3), 555-84.
[24] Gustman, A. L., and T. L. Steinmeier (2002): "Retirement and the Stock Market Bubble,"
NBER Working Paper, 9404.
[25] Hansen, G. (1985): "Indivisible labor and the business cycle," Journal of Monetary Economics, 16(3), 309-327.
[26] Harrison, J. M. (1985): Brownian motion and stochastic flow systems, Wiley Series in
Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York.
[27] He, H., and H. F. Pages (1993): "Labor Income, Borrowing Constraints, and Equilibrium
Asset Prices," Economic Theory, 3(4), 663-696.
[28] Jagannathan, R., and N. Kocherlakota (1996): "Why Should Older People Invest Less in
Stocks than Younger People?," Federal Reserve Bank of Minneapolis Quarterly Review, pp.
11-23.
[29] Karatzas, I., J. P. Lehoczky, and S. E. Shreve (1987): "Optimal portfolio and consumption decisions for a "small investor" on a finite horizon," SIAM Journal on Control and
Optimization, 25(6), 1557-1586.
[30] Karatzas, I., and S. E. Shreve (1991): Brownian motion and stochastic calculus, vol. 113
of Graduate Texts in Mathematics. Springer-Verlag, New York, second edn.
[31] Karatzas, I., and S. E. Shreve (1998): Methods of mathematical finance, Applications of
Mathematics. Springer-Verlag, New York.
[32] Karatzas, I., and H. Wang (2000): "Utility maximization with discretionary stopping,"
SIAM Journal on Control and Optimization, 39(1), 306-329.
[33] Kingston, G. (2000): "Efficient Timing of Retirement," Review of Economic Dynamics,
(3), 831-840.
[34] Kogan, L., and R. Uppal (2001): "Risk Aversion and Optimal Portfolio Policies in Partial
and General Equilibrium Economies," NBER Working Papers: 8609, 2001.
[35] Koo, H. K. (1998): "Consumption and Portfolio Selection with Labor Income: A Continuous Time Approach," Mathematical Finance, 8(1), 49-65.
[36] Lachance, M.-E. (2003): "Optimal Investment Behavior as Retirement Looms," Working
Paper, Wharton.
[37] Lazear, E. P. (1986): Retirement from the Labor Force, pp. 305-55, Handbook of Labor
Economics, vol. I, eds. O. Ashenfelter and R. Layard. Elsevier.
[38] Liu, J., and E. Neis (2002): "Endogenous Retirement and Portfolio Choice," Working
Paper, UCLA.
[39] Merton, R. C. (1971): "Optimum Consumption and Portfolio Rules in a Continuous-Time
Model," Journal of Economic Theory, 3(4), 373-413.
[40] Musiela, M., and M. Rutkowski (1998): Martingale Methods in Financial Modelling, vol.
36 of Applications of Mathematics. Springer-Verlag, New York, first edn.
[41] Oksendal, B. (1998): Stochastic differential equations: An Introduction with Applications
- Fifth Edition, Universitext. Springer-Verlag, Berlin.
[42] Poterba, J. M., S. F. Venti, and D. A. Wise (1998): "401(k) Plans and Future Patterns of
Retirement Saving," American Economic Review, 88(2), 179-184.
[43] Poterba, J. M., S. F. Venti, and D. A. Wise (2000): "Saver Behavior and 401(k) Retirement
Wealth," American Economic Review, 90(2), 297-302.
[44] Rockafellar, R. T. (1997): Convex analysis, Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ.
[45] Rogerson, R. (1988): "Indivisible labor, lotteries and equilibrium," Journal of Monetary
Economics, 21(1), 3-16.
[46] Rust, J. (1994): Structural Estimation of Markov Decision Processes, pp. 3081-3143, Handbook of Econometrics, vol. 4, eds. R. F. Engle and D. L. McFadden. North-Holland, Amsterdam.
[47] Rust, J., and C. Phelan (1997): "How Social Security and Medicare Affect Retirement
Behavior in a World of Incomplete Markets," Econometrica, 65(4), 781-831.
[48] Schroder, M., and C. Skiadas (1999): "Optimal Consumption and Portfolio Selection with
Stochastic Differential Utility," Journal of Economic Theory, 89, 68-126.
[49] Stock, J. H., and D. A. Wise (1990): "Pensions, the Option Value of Work, and Retirement," Econometrica, 58(5), 1151-80.
[50] Sundaresan, S., and F. Zapatero (1997): "Valuation, Asset Allocation and Incentive Retirements of Pension Plans," Review of Financial Studies, 10, 631-660.
[51] Viceira, L. (2001): "Optimal Portfolio Choice for Long-Horizon Investors with Nontradable
Labor Income," Journal of Finance, 56(2), 433-470.
[52] Wachter, J. (2002): "Portfolio and Consumption Decisions under Mean-Reverting Returns:
An Exact Solution for Complete Markets," Journal of Financial and Quantitative Analysis,
37, 63-91.