Moderate Voters, Polarized Parties Richard Van Weelden∗ January 12, 2011 Abstract I consider the welfare implications of divergent platforms in a model of repeated elections. There are two long-lived candidates who have preferences over policy and also an opportunity to engage in rent-seeking when in office. Candidates cannot make binding commitments before being elected, but voters condition their re-election decision on the behavior of the candidate in office. I consider symmetric, platform-stationary, subgame perfect equilibria with different levels of divergence between the two candidates. I show that divergence between the candidates can decrease the amount of rent-seeking by elected candidates, and, when utility functions are concave, increase the welfare of the voters. I then consider how optimal divergence depends on the distribution of voter preferences. When utility functions are are not too concave, increasing the polarization in voter preferences will increase the range for which divergent platforms are beneficial. However, when utility functions are sufficiently concave, increasing the polarization in voter preferences will decrease the pressures for divergence. Consequently, greater patience will be required to support divergence, and less divergence will be optimal in equilibrium. I then consider the case of multi-dimensional policy spaces, and show that, when utility functions are sufficiently concave, it will be optimal to have greater divergence in the dimension in which voters are less polarized. I discuss the implications of these results for electoral competition in the United States. ∗ Max Weber Fellow, European University Institute and Assistant Professor of Economics, University of Chicago. Phone: +39-346-8354-093 Email: Richard.VanWeelden@eui.eu. I would like to thank Dirk Bergemann, Justin Fox, Lucas Maestri, Massimo Morelli, Stephen Morris, Martin Osborne, Larry Samuelson, and Alyosius Siow for helpful discussions. 1 Introduction It has been well established that, at least in the United States, political parties are more polarized than the voters they seek to govern.1 While there has been much work done to document the significant, and increasing, divergence between the parties (see McCarty et al 2006 for an overview), the voters themselves appear to be much less polarized.2 That polarized elected officials represent a comparatively moderate populous is often seen as sign of democratic failure and evidence that political institutions have been hijacked by relatively extreme individuals and special interests (see Chapter 10 of Fiorina et al 2006). In this article, I argue that polarization in candidate preferences and platforms can be beneficial, and, in fact, may be most beneficial when voters themselves are more moderate. The argument builds on that introduced in Van Weelden (2010), that polarization of candidates’ preferences and platforms can create incentives for candidates to work harder to secure re-election, decreasing rentseeking. As the candidates care about the policy implemented, in addition to the rents from office, divergence in candidate preferences and platforms allows a shirking candidate to be punished with a policy further from their ideal point. When utility functions are concave, small changes in policy around the median voter’s ideal point will have a much greater impact on the welfare of non-median candidates than on the median voter. Hence, the decrease in rent-seeking behavior by the elected candidate will more than compensate the median voter for the policy being pulled away from the center, and divergence in candidate platforms will enhance the welfare of the median voter.3 In this paper I consider how the aggregate welfare of voters is affected by the polarization in the policy-preferences and implemented platforms of the candidates for office. For each level of polarization, I determine the best equilibrium which can be supported given candidates of that type, and then compare the voters’ welfare across the equilibria for different candidates. I show that, using a utilitarian social welfare function, divergence in candidate preferences and platforms is beneficial, for any symmetric distribution of voter ideal points, provided the players are sufficiently patient. This is in contrast to most of the results in the literature which find that the equilibrium with convergence is generally welfare maximizing.4 While it is always possible, with sufficient patience, 1 As Fiorina et al (2006) describes it, “for as long as we have had data political scientists have known than political elites are more polarized than the mass of ordinary Americans.” See also Table 2.1 in Fiorina et al (2006) for survey data showing greater polarization in the policy preferences of the delegates in the parties’ conventions than in the rank and file membership. 2 See Fiorina et al (2006): “[P]olarized political ideologues are a distinct minority of the American population. For the most part Americans continue to be ambivalent, moderate, and pragmatic, in contrast to the cocksure extremists and ideologues who dominate our political life.” (Preface to 2nd Edition) 3 Van Weelden (2010) considers a model with a single representative voter, and also shows this representative voter can be interpreted as the median voter in a setting with a large number of voters. 4 There are, however, some papers which find divergence to be beneficial: Bernhardt et al (2009), Van Weelden (2010). Additionally, Kedar (2005) argues that moderate voters may prefer candidates more extreme than themselves, 1 to support divergent platforms, how much patience is required will depend on the distribution of voter preferences. While it is generally assumed that increasing divergence in voters’ ideal points will be associated with greater divergence in candidate platforms, I show that the amount of divergence in the socially optimal equilibrium will be increasing in the polarization of the voters only if the players’ utility functions are not too concave. When utility functions are sufficiently concave, as non-median voters greatly dislike having the policy pulled even further from their ideal point, greater patience will be required to support divergent platforms when the voters are more polarized. In fact, when utility functions are sufficiently concave, divergent platforms will be beneficial unless the preferences of voters are too polarized. Further, when divergence is optimal, I show that, when utility functions are sufficiently concave, the optimal degree of polarization between the candidates will be decreasing in the polarization of the voters; that is, the more moderate the voters are, the greater the polarization between the candidates that is desired. Perhaps the most interesting implications of this model come from the case in which there are multiple dimensions of policy conflict in addition to the opportunity for rent-seeking. I show that polarization in candidates’ positions is more desirable along the dimension in which voters preferences are more polarized if, and only if, the utility functions are not too concave. Consequently, when utility functions are sufficiently concave, it will be socially optimal to have greater divergence in candidate platforms along the dimension in which voters are more moderate. In fact, for a range of parameters, it will be optimal to have convergence in the dimensions in which voters are more polarized, and divergence in the dimensions which voters are most centrist. These results provide a new framework for analyzing party competition, and a new interpretation for the observed divergence in party platforms. Recent work on parties and political attitudes indicates that, while voters are for the most part rather moderate, candidates and political activists are not (Fiorina et al 2006). In addition, while voters are predominantly concerned with economic affairs and policy (e.g. Fiorina et al 2006, McCarty et al 2006), in recent years candidates have diverged more over social policies while moving towards the center on economic policies.5 While this polarization between candidates and parties on issues on which most voters hold moderate positions is frequently seen as a sign of democratic failure, these results provide a more benign interpretation of candidates polarizing over issues in which voters are centrist, and raise the possibility that such polarization is not as harmful as is often claimed. When utility functions are concave, divergence in candidate platforms can benefit the voters by disciplining candidates, decreasing the amount of rent-seeking. Furthermore, when utility functions are sufficiently concave, since voters who have knowing that the candidate will only be able to achieve part of her legislative agenda. 5 Two other explanations for convergence in certain dimensions and divergence in others are differences in how easily messages can be targeted to certain groups (Glaeser et al 2005) and a tendency to converge with concave utility functions and diverge with convex utility functions in probabilistic voting models (Kamada and Kojima 2010). 2 an ideal point which is relatively extreme greatly dislike having the policy pulled even further from their ideal point, the utility loss from the policy being pulled from the center is less the fewer extreme voters there are. Hence, from a social welfare perspective, the best outcome is to have platforms diverge in dimensions in which voters are predominantly centrist and converge in the dimensions in which voters’ preferences are more polarized. Political parties who are polarized over issues on which most voters hold relatively moderate positions is precisely the outcome which has been observed, and frequently bemoaned, in the United States. Fiorina et al (2006) worries that “however misplaced from the standpoint of the welfare of the larger country, an emphasis on cultural and other conflicts not of particular interest to the the majority appears to be an integral part of contemporary electoral politics.” I compare voter welfare between equilibria in which the candidates are polarized over the same dimension the voters are to equilibria with polarization over the dimension in which voters are more moderate, and show that, for a range of utility functions, polarization over issues which voters are moderate is not “misplaced from the standpoint of ... welfare”, but rather the dimension of polarization which maximizes welfare. In this analysis I compare only the differences in welfare for different, exogenously specified, polarization in candidate preferences. I do not model how the candidates are determined. One implication of these results, then, is that factors outside the model– outside interest groups, political primaries – may actually increase aggregate welfare of the population if they produce candidates who have policy preferences which are less representative of the electorate. The paper is organized as follows. In section 2, I present the model and discuss the class of equilibria which the analysis focuses on. Section 3 then presents the results for the singledimensional case. In Section 4 I show how the results extend when there are multiple dimensions of policy conflict. In section 5 I discuss the role of the concavity assumptions and how such assumptions relate to the preferences of voters. Section 6 then concludes. The proofs of the results are left to the Appendix, with the Lemmas proven in the Supplemental Appendix. 2 Model I present a simplified version of the model introduced in Van Weelden (2010). Unlike Van Weelden (2010), which focuses on a setting with a single representative voter, I focus on how the welfare implications of divergent platforms relates to the distribution of voter preferences. I assume there are two infinitely-lived candidates and an infinite number of infinitely-lived voters.6 The elected candidate chooses the policy to implement, x ∈ [−1, 1], and the amount of rent-seeking to engage 6 Whether the voters are long or short lived will not affect the analysis – within the class of equilibria considered the voters’ best response will be to vote for the candidate they myopically prefer. For the candidates it is essential that they care about future outcomes. In a related setup, Van Weelden (2010) shows that infinitely lived candidates can be replaced by term-limited candidates, provided such candidates care about the outcome of future elections. 3 in, m ∈ [0, M ]. These rents could reflect outright corruption, such as taking bribes, or simply a lack of effort – either taking time off work or not addressing difficult but important issues – by the candidate in office. For example, the benefit to the candidate of holding office may be eroded if they work excessively long hours, deny favors to family and friends, or take actions which antagonize powerful lobby groups or vocal minorities. I refer to a policy-rents pair p = (x, m) as a platform. Voters can be one of three types, v ∈ {−1, 0, 1}.7 The stage game utility, for a voter of type v, in any period in which platform pt is implemented is uv (pt ) = −|xt − v|λ − mt . I will use V to refer to a specific voter, and v to refer to the voter’s type. I assume the distribution of voters’ preferences are symmetric so that the same fraction σ ∈ (0, 12 ) of the voters are each of the types v = 1 and v = −1. The fraction of voters who are type v = 0 is then 1 − 2σ. Note that σ is then a measure of the polarization of voter preferences: if σ is small (close to 0) then most of the voters are centrist; when σ is large (close to 1/2) most of the voters are at the extremes of the political spectrum. The candidates’ preferences are identical to the voters except that they benefit from receiving rents if they are in office. If the candidate is not elected then their payoff is identical to a voter with the same policy preferences. I assume that the candidates’ ideal points are symmetric about the median voter’s ideal point, so one candidate has ideal point κ ∈ [0, 1] and the other has ideal point −κ ∈ [−1, 0]. The utility to candidate k ∈ {−κ, κ} is then gk (pt ) = −|xt − k|λ − mt , if they are not in office in that period and platform pt is implemented, and ḡk (pt ) = −|xt − k|λ + γmt , if they are in office. The parameter γ ≥ 1 reflects how much the candidates in office care about rents relative to the voters and the candidate out of office. Note that κ then reflects the degree of polarization in candidate preferences. When κ = 0 both candidates share the policy preferences of the median voter and we have convergence in the candidates’ preferences. When κ = 1 the candidates share the policy preferences of the polarized voters – the bases of the parties if you will. When κ ∈ (0, 1), candidate preferences are polarized, but less polarized than the more extreme voters. The main purpose of this analysis is to determine the optimal degree of polarization in candidate preferences, how that affects the resulting policy outcomes, and how the optimal polarization is affected by the distribution of voter preferences. 7 It is not important that there be only three types. What matters is that the distribution of voters is symmetric, and that there are a positive mass of voters who hold the median position. A model with three types is used only for simplicity, and because it allows the degree of polarization to be characterized by a single parameter. 4 The timing of the game is as follows: 1. In each period each voter votes for a candidate, ktV ∈ {−κ, κ}. The candidate receiving more than half the votes is elected, and, in the event that both candidates receive half the votes, each candidate is elected with probability 1/2. Let kt denote the candidate elected in period t. 2. The elected candidate chooses platform pt = (xt , mt ) for that period. 3. The voters observe the platform pt implemented. 4. The game is repeated with the voters deciding whether to vote for the incumbent or the challenger. I assume that the game is infinitely repeated with discount factor δ < 1. Since each period in this model corresponds to an election cycle, we would expect δ to be significantly below 1. We will then be concerned with more than just the limiting properties as δ → 1. In each period the voters elect a candidate, kt , and the elected candidate chooses platform pt = (xt , mt ). A history then consists of the previously elected candidates, the platforms they implemented when in office, and the share of the vote the incumbent received, µt ∈ [1/2, 1]. That is, a history is ht = ((k0 , p0 , µ0 ), . . . , (kt−1 , pt−1 , µt−1 )). I then define ht(k,µ) = (ht , k, µ) to be the history at which candidate k is elected after history ht with t share µ of the vote. Let H t and H(k,µ) be the set of all t period histories ht and ht(k,µ) respectively, t and define H = ∪t≥0 H t and H(k,µ) = ∪t≥0 H(k,µ) . The strategy for each voter V is to vote for one of the candidates at every history, sV : H → {−κ, κ}, sV (ht ) = ktV . The strategy for each candidate consists of the platform they would implement, if elected by the voters, at every history, sk : H(k,µ) → [−1, 1] × [0, M ], sk (ht(k,µ) ) = pt . For any profile of strategies, s, we can then calculate the payoffs in the repeated game for each player. The payoff for a voter of type v ∈ {−1, 0, 1} is then Uv (s) = (1 − δ) ∞ X δ t uv (pt ), t=0 and for each candidate k ∈ {−κ, κ} the payoff is Uk (s) = (1 − δ) ∞ X t=0 5 δ t uk (kt , pt ), where uk (kt , pt ) = ḡk (pt ) if k = kt , gk (pt ) if k 6= kt . In what follows, I analyze the equilibrium, and the resulting voter welfare, taking the preferences of the candidates as given. I then compare the welfare for different candidate preferences to determine the degree of polarization that is socially optimal. Note that, as no voter can ever be pivotal, a wide range of behavior is consistent with equilibrium. I discuss the voters’ strategies and the class of equilibria more below, and will always assume that voters will vote for the candidate they prefer whenever they have a strict preference for one candidate over the other. For welfare considerations, I use a utilitarian social welfare function which weighs each type of voter by the fraction of voters who are of that type. That is, the aggregate welfare of the voters from a given strategy profile, s, is given by U ∗ (s) = σU−1 (s) + (1 − 2σ)U0 (s) + σU1 (s). Note that a utilitarian social welfare function is only one way to identify the most “efficient” or desirable electoral outcomes. An alternative way to compare two equilibrium outcomes is to look at whether a majority prefers one outcome to another. In Van Weelden (2010) it is shown that the Condorcet winners among the equilibrium outcomes correspond to those which maximize the welfare of the median voter. This will always involve non-median candidates provided the candidates are sufficiently patient.8 2.1 Two-Dimensional Policy Space The model can easily be extended to one with two or more dimensions of the policy space in addition to a distaste for rent-seeking. As two dimensions are all that is necessary to illustrate the results, I consider only the case of a two-dimensional policy space, x = (xA , xB ) ∈ [0, 1]2 . I then allow there to be heterogeneity in the ideal points of the voters in both dimensions A and B. To keep the model as simple as possible I assume that each voter only holds non-centrist policy positions in one dimension, though this assumption is not necessary for the main results. There are then five types of voters. I assume that fraction σA of the voters have ideal points x = (1, 0) and x = (−1, 0), that fraction σB of the voters hold ideal points x = (0, 1) and x = (0, −1), and that 1 − 2(σA + σB ) of the voters have ideal point x = (0, 0). Finally, I assume that 1 σA + σB ∈ (0, ), 2 8 In Van Weelden (2010), there a continuum of potential candidates and, rather than specify the strategy of each individual voter, it is assumed that voters only ever elect a candidate if there is no other candidate who would implement a platform which gives more than half the voters a higher utility. It is then shown that the resulting outcome is equivalent to the case with a single voter. See section 5.2 of Van Weelden (2010) for details. 6 so a non-trivial fraction of the voters hold a median position in both dimensions. Notice that, when σA < σB (σB < σA ), the voters are less (respectively more) polarized in dimension A relative to dimension B. Without loss of generality I assume that σA ≤ σB . As before, I assume there are two candidates, k = (kA , kB ) ∈ {−κ, κ}, where κ = (κA , κB ) ∈ [0, 1]2 . That is, the candidates have policy preferences, which may diverge, in each of the two dimensions. Note that, by symmetry, it is without loss of generality to assume κ ∈ [0, 1]2 . As in the one-dimensional case, candidates and voters have preferences over policy and rentseeking. That is, the stage game utilities are defined as uv (pt ) = −(xtA − vA )λ − (xtB − vB )λ − mt , for the voters, and uk (kt , pt ) = ḡk (pt ) = −(xtA − kA )λ − (xtB − kB )λ + γmt if k = kt , gk (pt ) = −(xtA − kA )λ − (xtB − kB )λ − mt if k 6= kt . for the candidates. Histories, strategies, and payoffs in the super-game can then be defined exactly as in the one-dimensional case, and the aggregate social welfare from a strategy profile can be defined as U ∗ (s) = σA U(−1,0) (s) + σB U(0,−1) (s) + (1 − 2(σA + σB ))U(0,0) (s) + σA U(1,0) (s) + σB U(0,1) (s). 2.2 Candidate Behavior For simplicity I restrict attention to subgame perfect equilibria in pure strategies. As this is a symmetric environment I restrict attention to symmetric equilibria in which the candidates’ strategies are stationary.9 Stationarity means that the platform the elected candidate will implement, if elected, will be independent of the history at which they are elected, or the share of votes they received. Definition 1 Symmetric and Platform Stationary The candidates’ strategies are symmetric and platform-stationary if there exists a platform p = (x, m) with x ≥ 0 and m ∈ [0, M ] such that, for all histories ht and vote shares µ, sκ (ht , κ, µ) = (x, m), and s−κ (ht , −κ, µ) = (−x, m). 9 See Van Weelden (2010) for a discussion of equilibrium selection in a similar setting. 7 Hence, if the candidates’ strategies are platform stationary, the voters will face the same choices in every period. The voters’ can then myopically optimize without concern for how their choices today will influence future political competition. As such, while I have assumed that all players have the same discount factor, only the patience of the candidates will influence the equilibrium outcomes. 2.3 Voter Behavior I now describe the voters’ behavior. As there are a continuum of voters, the probability of each voter being pivotal is 0. I assume that voters vote sincerely, in that if one of the two candidates would implement a platform which gives the voter a higher payoff than the other candidate then the voter would vote for that candidate. Definition 2 Sincere Voting Suppose the candidates are playing symmetric, platform stationary strategies. Then, if for voters of type v ∈ {−1, 0, 1}, uv (x, m) > uv (−x, m), then, for all ht and all voters of type v, sV (ht ) = κ. Similarly, if uv (x, m) < uv (−x, m), then, for all ht and voters of type v, sV (ht ) = −κ. When the candidates’ platforms are stationary, the voters will face the same choices in every period. That voters vote sincerely means simply that each individual will vote for the candidate who will implement the platform more to that voter’s liking when in office. Hence, when x 6= 0, all type 1 voters will vote for candidate κ and all type −1 voters will vote for candidate −κ. Similarly, when the policy space is multi-dimensional, all non-median voters will have a strict preference for one candidate (if platforms do not converge in that dimension) and so will vote for that candidate. By symmetry, the type 0 voters will be indifferent between the two candidates. I assume that the indifferent voters decide who to vote for based on the candidates’ performance in the last period: they vote to re-elect the incumbent if and only if the utility received in the last period is at least as high as the utility they would receive if the incumbent is replaced by the challenger. That is, I assume the indifferent voters decide on the basis of retrospective voting.10 10 Fiorina (1981) introduced the concept of retrospective voting in political science. See Barro (1973), Ferejohn (1986), Duggan (2000), and Van Weelden (2010) for formal models which use retrospective voting to provide incentives 8 Definition 3 Retrospective Voting Suppose the candidates are playing symmetric, platform stationary strategies. Then, if uv (x, m) = uv (−x, m), then, for all ht and all voters of type v, sV (ht ) = kt−1 , if and only if uv (pt−1 ) ≥ uv (x, m) = uv (−x, m). I consider only subgame perfect equilibria in which the voters vote sincerely and retrospectively and the candidates’ strategies are symmetric and platform stationary. Note that, in this complete information environment, it is the existence of voters who are willing to condition their vote on the incumbent’s past behavior that allows for equilibria in which candidates implement platforms other then their ideal policy with full rent seeking when in office. Notice also that, with retrospective voting, since the type 0 voters are pivotal, the incumbent will be re-elected forever on the equilibrium path. This is a consequence of full information and pure strategies. In Van Weelden (2010) it is shown that with imperfect monitoring the incumbent will be defeated with positive probability along the equilibrium path. I now turn to determining when divergence is beneficial. I consider first the case in which the policy is single-dimensional and then turn to the case in which policy is multi-dimensional. 3 3.1 Results for One Dimensional Policy Space Best equilibrium for each κ I first consider the case in which the policy space is single-dimensional. In order to determine the optimal degree of polarization between the candidates I first describe the best equilibrium when the candidates’ ideal policies are κ and −κ respectively. I begin by considering the case where κ = 0, so that the candidates and platforms converge. As the candidate cannot make binding commitments before taking office the candidate must have an incentive to implement the specified platform when in office in order to secure re-election. Further, as the platforms the candidates implement are independent of history, in order for the incumbent to lose office after deviating, the type-0 voters must be indifferent between re-electing the incumbent and electing the challenger. Incentive constraints for the incumbent politician then immediately imply that the minimum level to incumbent politicians. As has been noted in the previous literature (e.g. Fearon 1999) there can often be a friction between voting based on how the candidate performed in the past and how they are expected to perform in the future. I assume that if a voter strictly prefers the platform one candidate would implement they will always vote for that candidate (prospective voting), but vote retrospectively if they are indifferent. 9 of rent-seeking, m0 , that can be supported in a platform-stationary subgame perfect equilibrium with converging platforms can be determined from the equation, ḡ0 (0, m) = (1 − δ)γM + δg0 (0, m). (1) Note that, as ḡ0 (0, 0) = g0 (0, 0), the amount of rent-seeking in any equilibrium must be strictly positive.11 I now turn to describing the implemented policies and the amount of rent-seeking when candidates’ preferences do not converge, and then proceed to describe when the voters’ welfare is enhanced by this divergence. Consider a candidate of type κ > 0. First note that, as the gains from rent-seeking enter linearly, the ratio of marginal utilities between the candidate and the median voter with respect to rents is constant in the amount of rent-seeking. In equilibrium with retrospective voting – so that voters will vote based on the utility level – the ratio of marginal utilities over policies must be the same. So, in equilibrium, we must have ∂ḡκ (x,m) − ∂u ∂x 0 (x,m) ∂x = ∂ḡκ (x,m) − ∂u ∂m 0 (x,m) ∂m = γ, or that the level of rent-seeking is driven to 0. Simple algebraic manipulation then reveals that, in any symmetric and platform-stationary subgame perfect equilibrium with sincere and retrospective voting, if m ∈ (0, M ), we must have κ xκ = 1 . (2) 1 + γ λ−1 Now, as with convergence, it is straightforward to determine the minimum amount of rent-seeking, mκ , which can be supported in equilibrium from the incentive constraints of the elected candidate, ḡκ (xκ , m) = (1 − δ)γM + δgκ (−xκ , m). (3) The above equations then characterize the best equilibrium for each given level of divergence in candidate ideal points. Substituting into the above equations we can then solve for the level of rent-seeking in equilibrium for each κ: mκ = 1 (1 − δ)γM 1 κ λ + [1 − δ(1 + 2γ 1−λ )λ ]( 1 ) . γ+δ γ+δ 1−λ 1+γ (4) 1 Note that mκ is decreasing in κ if the candidates are sufficiently patient, δ > δ ≡ (1 + 2γ 1−λ )−λ , and increasing otherwise. Finally, when mκ is decreasing, define κ̄ to be the minimum κ for which mκ ≤ 0. If κ̄ < 1 then on (κ̄, 1] rent-seeking will be driven to 0, and we need only look for the minimum level of divergence which can be supported with candidates of that type. For each κ in 11 Note that we could allow candidates to receive an intrinsic benefit from holding office above and beyond the rents they secure for themselves – i.e. we could consider the case where ḡκ (0, 0) > gκ (0, 0). The analysis presented would go through provided ḡ0 (0, 0) < (1 − δ)ḡ0 (0, M ) + δg0 (0, 0), so that converging platforms and 0 rent-seeking could not be supported as an equilibrium. 10 [κ̄, 1] we can then define x(κ) to be the minimum level of policy divergence necessary to support an equilibrium with candidates of type κ and 0 rent-seeking. The first lemma shows that x(κ) has a unique minimizer on [κ̄, 1]. Lemma 1 There exists a unique minimizer, κ∗ , of x(κ) on [κ̄, 1]. 3.2 Welfare Implications of Divergence In this subsection, I consider when moving away from the center is beneficial to the voters from a utilitarian welfare perspective. As the electorate consists of both moderate and polarized voters, I consider the effect on each type of voter in turn. Consider first the moderate voters (v = 0). As shown in Van Weelden (2010), divergence will benefit such voters provided the players are reasonably patient.12 Recalling that u0 (x, m) = −xλ − m we have u0 (xκ , mκ ) = − 1 1 λ 1 1 (1 − δ)γM + (1 + γ 1−λ )−λ [δ((1 + 2γ 1−λ )λ − γ 1−λ ) − (1 + γ 1−λ )]κλ . γ+δ γ+δ So the welfare of the centrist voters is increasing in κ provided the players are sufficiently patient. How patient the players must be for divergence to benefit the centrist voters depends on how concave utility functions are. If utility functions are concave enough, it will be possible for divergence to benefit the centrist voters for any level of patience. Lemma 2 There exists δ̄0 (λ, γ) ∈ (0, 1) such that the welfare of the type 0 voters is maximized with candidates of type κ = 0 if and only if δ ≤ δ̄0 . When δ < δ̄0 , κ = 0 is the unique maximizer of the welfare of the type-0 voters. When δ > δ̄0 the welfare of type-0 is maximized with candidates of type κ= κ∗ if κ̄ < 1, 1 otherwise. Further, δ̄0 is decreasing in λ with limλ→∞ δ̄0 = 0. Now consider the effect of divergence on the welfare of the non-centrist voters. Obviously, if candidate κ > 0 is elected, the welfare of a voter of type 1 and of type −1 will be quite different. However, when using a utilitarian social welfare function, given that there are the same number of voters of each type, our concern is with the average of the two utilities. When utility functions are sufficiently concave the average welfare of the type −1 and type 1 voters’ utility will decrease by more than the type 0 voters’ utility as the implemented policy moves away from the center. First note that the utility of the two types of voters are u−1 (xκ , mκ ) = −mκ − (1 + xκ )λ , 12 See Van Weelden (2010) for a discussion and numerical examples which demonstrate that divergence is beneficial with patience well below 1 for reasonable parameters. 11 and u1 (xκ , mκ ) = −mκ − (1 − xκ )λ , respectively. Combining the two equations we can see that the average utility of the polarized voters is u−1 (xκ , mκ ) + u1 (xκ , mκ ) (1 − xκ )λ + (1 + xκ )λ = −mκ − . 2 2 Notice that, as voters’ utilities are concave, the aggregate effect of the policy being pulled from the center – holding the degree of rent-seeking constant – is negative, as pulling the policy away from the median (xκ > 0) decreases the welfare of the type −1 voter more than it increases the welfare of the type 1 voter. Holding the level of rent-seeking constant, the utility loss from policy xκ instead of 0 is given by (1−xκ )λ +(1+xκ )λ 2 −1. How the welfare of the polarized voters relates to the (1−xκ )λ +(1+xκ )λ − 1 and xλκ , which 2 < xλκ , and so the aggregate welfare centrist voters is then determined by the relationship between I discuss in the Appendix. When λ < 2, (1−xκ )λ +(1+xκ )λ 2 −1 of the polarized voters will decrease by less than the centrist voters when the policy is pulled from the center. Conversely, when λ > 2, we have (1−xκ )λ +(1+xκ )λ 2 − 1 > xλκ , and the polarized voters are harmed more by the policy being pulled away from the center than median voters are. Hence, it is possible that, while divergence is beneficial to moderate voters, the aggregate welfare of polarized voters is decreased. Due to the concavity of utility functions, polarized voters greatly dislike having the policy pulled further from their ideal point, and so the aggregate welfare of the polarized voters may decrease even though the welfare of the centrist voters is increased by divergence in candidate preferences. As such, when λ > 2, greater patience will be required for the welfare of the polarized voters to increase with divergent platforms than for the centrist voters. Nevertheless, I show that the polarized voters will, on average, benefit as well, provided the players are sufficiently patient. However, if utility functions are very concave, divergence will only increase the average welfare of the polarized voters if the players are extremely patient. Lemma 3 There exists δ̄1 (λ, γ) ∈ (0, 1) such that the welfare of the polarized candidates, u−1 (x, m)+ u1 (x, m), is maximized at κ = x = 0 if and only if δ ≤ δ̄1 . When δ < δ̄1 , κ = 0 is the unique maximizer of the welfare of the polarized voters. When δ > δ̄1 the welfare of polarized is maximized with candidates of a type κ > 0. When λ > 2 then the welfare is maximized with ∗ κ if κ̄ < 1, κ= 1 otherwise. Further, limλ→∞ δ̄1 = 1. So we have that the aggregate welfare of the polarized candidates is maximized with convergent candidates if and only if δ ≤ δ̄1 . Note that, when λ < 2 we will have δ̄1 < δ̄0 , and when λ > 2, 12 δ̄1 > δ̄0 . When λ = 2 the average welfare of the polarized voters will always equal the average welfare of the centrist voters, so δ̄1 = δ̄0 . As we are interested in maximizing the social welfare function u0 (x, m) + σ(u−1 (x, m) + u1 (x, m)), when δ̄1 6= δ̄0 , whether or not divergence is socially beneficial can depend on the distribution of voter preferences. The next proposition characterizes when divergence will be beneficial for each level of patience, and each distribution of voter ideal points, for a given degree of concavity, λ. Proposition 1 Concavity and Convergence 1. When δ ∈ (0, min{δ̄0 , δ̄1 }), for all σ ∈ (0, 21 ), the utilitarian social welfare function is maximized with κ = 0. 2. When δ ∈ (δ̄1 , δ̄0 ), there exists a σ̄(λ, δ) ∈ (0, 12 ) such that a utilitarian social welfare function is only maximized by setting κ = x = 0 if σ ≤ σ̄. When σ ∈ (σ̄, 12 ), welfare is maximized by a κ > 0. 3. When δ ∈ (δ̄0 , δ̄1 ), there exists a σ̄(λ, δ) ∈ (0, 21 ) such that a utilitarian social welfare function is only maximized by setting κ = x = 0 if σ ≥ σ̄. When σ ∈ (0, σ̄), welfare is maximized by a κ > 0. 4. When δ ∈ (max{δ̄0 , δ̄1 }, 1), for all σ ∈ (0, 21 ), the utilitarian social welfare function is maximized with κ > 0. The above proposition outlines the various levels of patience the players may have and the resulting welfare implications. As there are four components I address each part in turn. The first part of the above proposition says that, when the players are too impatient, it will always be optimal to choose converging platforms and candidates. The reason for this is simple. While x = 0 is the welfare maximizing policy, it can still be beneficial to elect candidates who will implement x 6= 0 as this makes it possible to discipline candidates more effectively. When δ is low, the threat of future punishments are less effective, so it is optimal to choose centrist candidates. The final part of the proposition says that, when the players are sufficiently patient, there exist equilibria with non-converging platforms which outperform those with median candidates and converging platforms regardless of the distribution of voters’ preferences. Parts 2 and 3 of Proposition 1 show that, for an intermediate degree of patience, whether or not divergence is beneficial will depend on the distribution of voters. When voter’s utility functions are not too concave – that is, when λ < 2 – the average welfare of polarized voters will decrease less than the welfare of the centrist voters when the policy is pulled away from the center, so less patience will be needed for divergence to benefit the polarized voters (δ̄1 < δ̄0 ). Hence, when 13 δ ∈ (δ̄1 , δ̄0 ), aggregate welfare will be enhanced with polarized parties, provided there is sufficient polarization in voters’ preferences. Conversely, when utility functions are more concave – λ > 2 – the average welfare of the polarized voters decreases more than for the centrist voters as the policy is pulled from the center. Hence, when δ ∈ (δ̄0 , δ̄1 ), so that there is sufficient patience for divergence to benefit the centrist voters but not to benefit the polarized voters, polarization in candidates and platforms will enhance aggregate welfare unless the preferences of voter are too polarized. An alternative way to view the above proposition is to consider how much patience is required to support divergence for each distribution of voter preferences. Holding λ fixed, for each σ, we can define δ̄(σ) to be the minimum level of patience such that, when δ > δ̄(σ), κ = x = 0 does not maximize a utilitarian social welfare function. Notice that δ̄(σ) ∈ [min{δ̄0 , δ̄1 }, max{δ̄0 , δ̄1 }] for all σ. Then we have the immediate corollary which says that, when λ > 2, the necessary patience is increasing in the degree of polarization, and when λ < 2 it is decreasing. Corollary 1 Patience and Polarization 1. When λ > 2 the degree of patience necessary for divergence to be beneficial is increasing in polarization: ∂ δ̄(σ) > 0. ∂σ 2. When λ < 2 the degree of patience necessary for divergence to be beneficial is decreasing in polarization. ∂ δ̄(σ) < 0. ∂σ So we see that, when λ > 2, as the voters become more polarized, greater patience is required to support divergence. In fact, when utility functions are very concave, we can say something stronger. Recall that by Lemma 2, no matter how impatient the players are, when utility functions are sufficiently concave, the welfare of the centrist voters will be enhanced by candidates moving away from the center. Similarly, by Lemma 3, no matter how patient the players are, when utility functions are sufficiently concave, the welfare of the polarized voters will be decreased by a lack of convergence in platforms. Hence, regardless of the level of patience, when utility functions are sufficiently concave, divergence in candidate platforms will increase aggregate welfare when enough voters are centrist, and decrease aggregate welfare when enough voters are polarized. Proposition 2 High Levels of Concavity For any δ ∈ (0, 1) there exists a λ̄(δ) ∈ (2, ∞) such that, for all λ > λ̄(δ), there exists a σ̄λ ∈ (0, 1/2) such that κ = 0 is socially optimal if and only if σ ≥ σ̄λ . So, we see that, for any level of patience, when utility functions are sufficiently concave, the welfare of centrist voters will be enhanced by divergent platforms, and the welfare of the polarized 14 voters will be decreased by divergence. Consequently, the aggregate welfare of the voters will be enhanced by polarized parties and candidates, if and only if the voters’ preferences are not too polarized. The results of this section, which has considered how the optimality of converging or nonconverging platforms is determined based on the level of concavity in voters utility functions (λ), the players’ patience (δ), and the distribution of voters preferences (σ), can be summarized as follows. Holding the degree of concavity in utility functions fixed, if the players are sufficiently patient, divergent platforms will always be beneficial for any symmetric distribution of voter preferences. Conversely, when players are impatient, convergence will always be optimal. For intermediate levels of patience, whether divergence or convergence is optimal will depend on the distribution of voter ideal points: the amount of patience necessary for divergence to be beneficial will be increasing in voter polarization when utility functions are sufficiently concave, and decreasing when utility functions are less concave. Alternatively, holding the amount of patience fixed, we have seen that, if the players’ utility functions are concave enough, divergence will benefit the centrist voters, but harm the polarized voters. Consequently, convergence will be optimal when voters are sufficiently polarized, and divergence will be optimal when voters are sufficiently centrist. Having now established when converging platforms will be optimal and when divergence will be beneficial, I now turn to analyzing the optimal divergence when divergence is beneficial. 3.3 Optimal Divergence Having now established that non-convergence in candidate preferences and platforms is welfare enhancing, with sufficient patience, I now turn to the question of how much divergence is optimal for different degrees of polarization in voter preferences. In this model, however, it will often be optimal to have no rent-seeking or full polarization in candidate preferences, making it difficult to take comparative statics. This is due to the functional form assumption that utility functions are linear in rents. With sufficient patience, it is optimal to increase divergence until the equilibrium rent-seeking is driven to 0, or there is maximum polarization in candidate preferences, κ = 1. Proposition 3 Optimal Divergence Suppose δ > max{δ̄0 , δ̄1 }. Then it is socially optimal to have κ = 1 if κ̄ ≥ 1 and κ = κ∗ ∈ [κ̄, 1] if κ̄ < 1. In this section, in order to consider comparative statics, I modify the model to ensure that the optimal polarization will not involve a “corner solution” with maximum polarization or 0 rentseeking. As noted in Van Weelden (2010), divergence can still be beneficial when utility functions are concave over rents, but the amount of rent-seeking will not be driven to 0. How much rentseeking will persist in the best equilibrium will then depend on the distribution of voter preferences. 15 In this subsection I consider a variation of the model in which voter’s utility functions are concave over rents. That is, the voters’ utility is uv (pt ) = −|xt − v|λ − m1+β t . 1+β When β = 0 the voters’ utility functions are exactly as in the previous subsection, but when β > 0 utility functions are concave. I then perturb the utility functions of the candidates k ∈ {−κ, κ} to be ḡk (pt ) = γ m1+β t − |xt − k|λ , 1+β if the candidate is elected, and gk (pt ) = −|xt − k|λ − mt + γ m1+β ( t − mt ), δ 1+β if they are not. While this then assumes that utility functions are convex over rents for the candidates, and introduces additional complexity in the candidates’ utility functions, allowing the stage game utility to depend on the level of patience for the candidate who is not elected, it guarantees that the platform each candidate implements in office will be the same as in the case with linear rents. This provides the simplest perturbation in which the rents will not be driven to 0 and so the degree of divergence will vary with the degree of polarization. In order to ensure that we are not at a corner solution I assume that M is neither too big or too small – since the disutility of rent-seeking depends on the level of rent-seeking, if M is too small it will not be beneficial to move away from the center, and if M is too large it will be optimal to diverge until reaching max polarization. The following proposition shows how polarization in candidates relates to polarization in voters. 16 Proposition 4 Optimal Divergence Suppose δ > min{δ̄0 , δ̄1 }. Then there exists β̄ > 0 such that for all β ∈ (0, β̄), there exists a non-degenerate interval (M , M̄ ), such that for all M ∈ (M , M̄ ), 1. when λ > 2 the optimal degree of polarization in candidates is strictly decreasing in the polarization of the voters. That is, there exists σ̄(λ, δ, M, β) ∈ (0, 12 ] such that, on (0, σ̄(λ, δ, M, β)), the best equilibrium involves less polarization, ∂κ∗ (σ) < 0; ∂σ and more rent-seeking, ∂x∗ (σ) < 0, ∂σ ∂m∗ (σ) > 0. ∂σ 2. when λ < 2 the optimal degree of polarization in candidates is strictly increasing in the polarization of the voters. That is, there exists σ̄(λ, δ, M, β) ∈ [0, 21 ) such that, on (σ̄(λ, δ, M, β), 12 ), the best equilibrium involves more polarization, ∂κ∗ (σ) > 0; ∂σ and less rent-seeking, ∂x∗ (σ) > 0, ∂σ ∂m∗ (σ) < 0. ∂σ So we see that, when not at a “corner solution” with full polarization or 0 rent-seeking, the optimal degree of polarization will depend on the distribution of voter preferences. When λ > 2, the benefits of divergence are greater for the centrist voters than for the polarized voters and so the degree of polarization is decreasing in the fraction of polarized voters. Conversely, when λ < 2, the benefits of polarization are greater for polarized voters, thus, the more polarized the voters the more polarization it will be optimal to have in the candidates. 4 Multiple Dimensional Policy Space In the previous section I considered the case in which the policy space is single dimensional. I now consider the model with multi-dimensional policy-space as defined in Section 2.1. In order to keep the analysis as simple as possible I assume that voters only hold non-median positions in one dimension. This also allows for a simple representation of which dimension is more polarized.13 13 In this model, regardless of the distribution of voter preferences, there will always exist pure strategy subgame perfect equilibria when there are only two candidates. In particular, there would always be an equilibrium in which both candidates implement their most preferred policy with full rent seeking at every history in which they are elected, and all voters simply vote for the candidate they prefer given that strategy. Such an equilibrium is consistent with retrospective and sincere voting. In addition, if the assumption that all voters hold centrist preferences in at least one dimension was replaced with the assumption than any voter who has a positive (negative) ideal point in 17 Recall that fraction σA of the voters are at either pole in one dimension and σB are at either pole in the other dimension. The remaining 1 − 2(σA + σB ) of the voters are located in the center at (0, 0). Note that, for each κ ∈ [0, 1]2 , we can determine the policy implemented in each dimension as before. As in the single dimensional case, recall that divergence is more beneficial to polarized voters than moderate voters when λ < 2 and more beneficial to moderate than polarized voters when λ > 2. Hence, when λ < 2, it is socially optimal to have greater divergence in the dimension in which voters are more polarized, and, when λ > 2, it is socially optimal to have divergence in the dimension in which voters are more centrist. Proposition 5 Dimension of Greater Polarization Suppose δ > min{δ̄(σA ), δ̄(σB )}, σA < σB and κ̄(M/2), κ∗ (M/2) < 1, 1. if λ > 2 then it will be optimal to have greater polarization in the dimension voters are less polarized in: xA > xB and κA > κB . 2. if λ < 2 then it will be optimal to have greater polarization in the dimension voters are more polarized in: xA < xB and κA < κB . So we have that, when λ > 2, it is socially optimal to have less polarization in the dimension in which voters are more polarized, and when λ < 2 it is socially optimal to have greater polarization in the dimension in which voters are more polarized. Note, finally, that we must have that M is not too large – κ∗ (M/2) < 1, so that if there was an equal level of polarization in each dimension, the most moderate policy with 0 rent-seeking would involve less than full polarization in candidate preferences – to ensure that polarization will differ in different dimensions. When M is very large, it may be optimal to simply have maximum polarization in both dimensions. Now I show that, for a range of parameters, not only is it socially optimal to have greater polarization in the dimension in which voters are more moderate, but it is socially optimal to have convergence in the dimension in which voters are more polarized. This occurs when the opportunities for rent-seeking are modest enough that only one dimension of polarization is necessary to dissuade elected candidates from securing rents for themselves. one dimension has a non-negative (non-positive) ideal point in the other dimension, the set of equilibria with sincere and retrospective voting would be completely unchanged. Banks and Duggan (2008) provide results on the existence and uniqueness of equilibria with retrospective voting and prospective voting in a setting with multiple dimensional policy spaces in which, unlike in this model, the candidates’ types are private information. 18 Proposition 6 Divergence and Convergence in Two-Dimensions Assume δ > max{δ̄0 , δ̄1 }, σA < σB , and κ̄(M ), κ∗ (M ) < 1. Then, 1. if λ > 2 the best equilibrium involves divergence in dimension A and convergence is dimension B: κA > 0 and xA > 0, and κB = xB = 0. 2. if λ < 2 the best equilibrium involves divergence in both dimensions, but greater divergence in dimension B: 0 < κA < κB and 0 < xA < xB . So we see that, from a social welfare perspective, when utility functions are sufficiently concave – λ > 2 – it is optimal to have divergence in the dimension in which voters are less polarized and convergence in the dimension in which voters are more polarized. While this may sound counterintuitive at first the reason for this result is very simple. The more polarized voters are, if utilities are sufficiently concave, the more costly having a non-median policy implemented is for the voters – as the voters on the extremes are most sensitive to changes in policies. Hence, by having divergence in only the dimension in which voters are more moderate, there are fewer non-centrist voters who are harmed by this divergence. When λ < 2, non-median policies are less harmful, on average, to polarized voters, so we get greater polarization in the dimension in which voters are more polarized. Unlike in the case where λ > 2, however, we get divergence in both dimensions. This is because, when λ < 2, the aggregate welfare in one policy dimension, −σ[(1 + x)λ + (1 − x)λ ] − (1 − 2σ)xλ is strictly concave in κλ , so the benefits from moving away from the center decrease the further away from the center the candidates move, and thus there is always an advantage to moving a positive distance away from the center in both dimensions, regardless of the distribution of preferences. When λ > 2, −σ[(1 + x)λ + (1 − x)λ ] − (1 − 2σ)xλ is convex in κλ , so it is optimal to choose the most advantageous dimension and polarize only in that dimension. 5 Discussion The focus of this paper has been on comparing the welfare across equilibria with different degrees of polarization in candidate preferences. While earlier work in Van Weelden (2010), has endogenized the candidates, my analysis here has focused on the normative comparison across different, exogenously given, candidates. The aim in this analysis is then less to explain why parties locate the way they do than to question the usual conclusions about the welfare implications of diverging platforms. I have shown that, from a social welfare perspective, polarization in the candidates can be welfare enhancing, whether the voters themselves are polarized or not. In fact, for a range of utility functions, polarization is more advantageous when the voters themselves are more centrist. In recent years political scientists (see Fiorina et al 2006, McCarty et al 2006) have observed that the political parties have become more polarized. Further, in recent elections, parties have 19 been less polarized over economic policy and more polarized with respect to social policy (see Figure 9.5 in Fiorina et al 2006).14 This is despite the fact that income remains the best predictor of an individual’s vote (McCarty et al 2006) and despite an apparent lack of polarized views among the American public on social issues (Fiorina et al 2006). In the model presented, far from being a sign of democratic failure, such an outcome – when utility functions are sufficiently concave (λ > 2) – is simply the most efficient way to motivate candidates not to shirk their responsibilities when in office. Divergence over the dimension in which only a small fraction of the voters hold extreme preferences (social policy) and convergence in the dimension in which voters are more polarized (taxation and economic policy), would, in fact, be the socially efficient way to discipline elected officials. It could even provide a possible explanation for changes in the dimension of polarization over time. If an increase in income inequality over time leads to increased polarization in voter preferences over economic policy, as argued in McCarty et al (2006), my results indicate that, if λ > 2, it would be optimal to move from parties polarized on economic issues to parties polarized on social issues – precisely because there are so few voters who hold extreme positions on these issues. The key assumption driving the divergent platforms result, in this paper and in Van Weelden (2010), is that utility functions are strictly concave (λ > 1). When utility functions are not concave, there is no benefit to divergence for any level of patience. While concavity is usually assumed about voter utility functions, it is not necessarily clear that welfare functions always have that shape.15 What does concavity of utility functions mean in a voting context? One interpretation of the concavity of voters utility functions is that, with each policy issue that goes against an individual’s liking, the more upset the voter is are about the next issue policy outcome which goes against their preferences. That is, an individual who supports free markets, would be more angry about a new tariff being introduced if it follows a tax increase rather than a tax cut. Such an assumption about voter’s preferences certainly seems plausible and may be possible to test in a laboratory In this case, the form of the optimal polarization will depend on whether or not λ > 2. Note that λ = 2 corresponds to a quadratic loss-function. Recall that, with a quadratic loss function, individuals care only about the expected distance from their ideal point, and the variance in policy. As x increases, the expected policy – if each candidate is elected with probability 1/2 in the first period – remains 0 and the variance is simply x. Hence, the average utility change for polarized and centrist voters is the same. λ > 2 then corresponds to the case in which utilities are more concave than the quadratic utility function – that is, increasing the variance in policy is more 14 See also Gerring (1998), who shows that the fraction of the Democratic party’s platform devoted to social and cultural issues has increased substantially since the 1970s. 15 See Osborne (1995) for a discussion of the assumption that utility functions are concave. He concludes that “in the absence of any convincing empirical evidence, it is not clear which of the assumptions [concave utility functions or not] is more appropriate”. 20 harmful to a voter the further the expected policy is from the voter’s bliss point. When increasing variance increases the disutility more the further the expected policy is from the voter’s ideal point, polarized voters will be harmed more, on average, by changes in policy around the median voter’s ideal point. In essence, λ > 2 means that utility functions get more concave the further the policy is from the voter’s ideal policy. Similarly, λ < 2 corresponds to the case in which utility functions are less concave than the quadratic. Another feature to notice about utility functions with λ > 2 is that, calculating the third derivative of the voter’s utility function as the policy moves towards the λ−3 > 0. Whether the third derivate of the utility voter’s ideal point, u000 v (x) = λ(λ − 1)(λ − 2)|x − v| function is positive or not has been extensively discussed in the macroeconomics literature – a positive third derivative corresponding to a precautionary savings motive in life cycle consumption (Leland 1968), which has been empirically supported (e.g. Gourinchas and Parker 2001). 6 Conclusions In this paper I consider how polarization in candidate platforms relates to the polarization of voter preferences. While it is often assumed that the positions of candidates closely mimic those of the voters, this does not have to be the case. While there are many parameters for which the optimal divergence will mirror the distribution of voter preferences, I have shown that, when utility functions are sufficiently concave, the welfare of the voters may be higher with candidates who have policy preferences much different than the majority of the voters: who are polarized on issues where the voters are moderate and centrist where the voters are polarized. This provides a new explanation for why parties are polarized when voters are not, and a possible explanation for why parties have diverged over social issues despite a lack of evidence that voter preferences are similarly polarized. Interestingly, for a range of preferences, divergence over such issues is, in fact, the most efficient way to create incentives for candidates not to abuse their position once elected: elected officials are disciplined by the threat of changes in policy which matter greatly to them, but which are relatively unimportant to the comparatively moderate electorate. 7 References Banks, Jeffrey and John Duggan (2008). “A Dynamic Model of Democratic Elections in Multidimensional Policy Spaces.” Quarterly Journal of Political Science, 3 (3): 269-299. Barro, Robert (1973). “The Control of Politicians: An Economic Model.” Public Choice, 14: 19-42. Bernhardt, Dan, John Duggan, and Francesco Squintani (2009). “Private Polling in Elections and Voters’ Welfare.” Journal of Economic Theory, 144 (5): 2021-2056. 21 Duggan, John (2000). “Repeated Elections with Asymmetric Information.” Economics and Politics, 12 (2): 109-135. Fearon, James (1999). “Electoral Accountability and the Control of Politicians: Selecting Good Candidates versus Sanctioning Poor Performance.” In Democracy, Accountability, and Representation, eds. A. Przeworski, S.C. Stokes and B. Manin. Cambridge University Press. Ferejohn, John (1986). “Incumbent Performance and Electoral Control.” Public Choice, 50: 5-25. Fiorina, Morris (1981). Retrospective Voting in American Elections. Yale University Press. Fiorina, Morris, with Saumel Adams and Jeremy Pope (2006). Culture War? The Myth of a Polarized America. Pearson Longman. Gerring, John (1998). Party Ideologies in America, 1828-1996. Cambridge University Press. Glaeser, Edward, Giacomo Ponzetto, and Jesse Shapiro (2005). “Strategic Extremism: Why Republicans and Democrats Divide on Religious Values.” Quarterly Journal of Economics, 120 (4): 1283-1330. Gourinchas, Pierre-Olivier and Jonathan Parker (2001). “The Empirical Importance of Precautionary Savings.” American Economic Review, 91 (2): 406-412. Kamada, Yuichiro and Fuhito Kojima (2010). “Voter Preferences, Polarization, and Electoral Policies.” Working Paper. Kedar, Orit (2005). “When Moderate Voters Prefer Extreme Parties: Policy Balancing in Parliamentary Elections.” American Political Science Review, 99 (2): 185-199. Leland, Hayne (1968). “Saving and Uncertainty: the Precautionary Demand for Saving.” Quarterly Journal of Economics, 82 (3): 465-473. McCarty, Nolan, Keith Poole, and Howard Rosenthal (2006). Polarized America: The Dance of Ideology and Unequal Riches. MIT Press, Cambridge, MA. Osborne, Martin (1995). “Spatial Models of Political Competition Under Plurality Rule: A Survey of Some Explanations of The Number of Candidates and The Positions They Take.” Canadian Journal of Economics, 27: 261-301. Van Weelden, Richard (2010). “Candidates, Credibility, and Re-election Incentives.” SSRN Working Paper No. 1465504. 22 8 8.1 Appendix Relationship between xλ and (1 + x)λ + (1 − x)λ In this subsection I present some properties of xλ and (1 + x)λ + (1 − x)λ which will be useful for establishing the results of this paper. Of particular relevance is the relationship between (1+x)λ +(1−x)λ −2 , 2 the average utility loss for voters when the policy is x rather than 0, and xλ , the utility loss for the type 0 voters. The proofs of these properties are in the Supplemental Appendix. Property 1 (1+x)λ +(1−x)λ −2 2 − xλ is negative and decreasing in x if λ < 2, constant in x and equal to 0 if λ = 2, and positive and increasing in x if λ > 2. Property 2 (1+x)λ +(1−x)λ −2 2xλ Property 3 (1+x)λ−1 −(1−x)λ−1 2xλ−1 is decreasing in x if λ > 2 and increasing in x if λ < 2. is greater than 1 and decreasing in x if λ > 2, and less than 1 and increasing in x if λ < 2. Property 4 limx→0 (1+x)λ−1 −(1−x)λ−1 2xλ−1 Property 5 limx→0 (1+x)λ +(1−x)λ −2 2xλ 8.2 = ∞ if λ > 2 and limx→0 = ∞ if λ > 2 and limx→0 (1+x)λ−1 −(1−x)λ−1 2xλ−1 (1+x)λ +(1−x)λ −2 2xλ = 0 if λ < 2. = 0 if λ < 2. Proofs Proof of Lemma 1 See Supplemental Appendix. Proof of Lemma 2 See Supplemental Appendix. Proof of Lemma 3 See Supplemental Appendix. Proof of Proposition 1 Immediate from Lemmas 2 and 3. Proof of Proposition 2 Immediate from Lemmas 2 and 3. Proof of Proposition 3 There are two cases to consider, λ ≤ 2 and λ > 2. First suppose λ ≤ 2. Then max{δ̄0 , δ̄1 } = δ̄0 . We have established in Lemma 2 that the welfare of the type 0 voters is maximized at (x(κ∗ ), 0) if κ̄ < 1, ∗ ∗ ∗ (x , m ) = x = (x1 , m1 ) otherwise. 23 Now, since λ ≤ 2, (1 + xκ )λ + (1 − xκ )λ − 2 − xλκ , 2 is (at least) weakly decreasing in xκ and hence also κ. Hence if u0 (xκ , mκ ) = −mκ − xλκ , is increasing for all κ ∈ [0, κ̄], then (1 + xκ )λ + (1 − xκ )λ − 2 u−1 (xκ , mκ ) + u−1 (xκ , mκ ) = −mκ − , 2 2 is also increasing in κ for κ ∈ [0, κ̄]. Hence the welfare of the polarized voters will be maximized at (x(κ∗ ), 0) if κ̄ < 1, ∗ ∗ ∗ (x , m ) = x = (x1 , m1 ) otherwise. This is the same point as that which maximized the welfare of the type-0. As the social welfare function is simply a weighted average of centrist and polarized voters’ utilities, we are done. Now consider the case in which λ > 2. Then max{δ̄0 , δ̄1 } = δ̄1 . First note that, as δ > δ̄0 the welfare of the type 0 voters is maximized by (x(κ∗ ), 0) if κ̄ < 1, ∗ ∗ ∗ (x , m ) = x = (x1 , m1 ) otherwise. Next note that, as we have established in Lemma 3, when λ > 2 and δ > δ̄1 , the aggregate welfare of the polarized voters is maximized at ∗ ∗ ∗ (x , m ) = x = (x(κ∗ ), 0) if κ̄ < 1, (x1 , m1 ) otherwise. This is the same point as that which maximized the welfare of the type-0 voters. As the social welfare function is simply a weighted average of centrist and polarized voters’ utilities, we are done. I now prove Proposition 4. To do so I first prove a couple of Lemmas. The first Lemma show that, for each κ, the amount of divergence in candidates is the same as in the case with linear rents. Lemma 4 Suppose δ > δ and κ ∈ (0, κ̄). Then the best equilibrium involving candidates of type κ and −κ is determined by κ xκ = 1 , 1 + γ λ−1 mκ = 1 (1 − δ)M 1+β 1 κ λ + [1 − δ(1 + 2γ 1−λ )λ ]( 1 ) . (γ + δ)(1 + β) γ + δ 1 + γ 1−λ 24 Proof. See Supplemental Appendix. The second Lemma shows that, for the range of parameters considered in Proposition 4, it is socially optimal to have divergence, but not to have so much divergence that rents are driven to 0 or the maximum polarization. Lemma 5 For all δ > min{δ̄0 , δ̄1 }, there exists a β̄ ∈ (0, 1] such that, for all β ∈ (0, β̄), there exists a non-degenerate interval (M , M̄ ), such that for all M ∈ (M , M̄ ), 1. if λ > 2 then there exists σ̄(λ, δ, M, β) ∈ (0, 21 ] such that the socially optimal level of divergence is κ∗ ∈ (0, min{κ̄, 1}) when σ ∈ (0, σ̄). 2. if λ < 2 then there exists σ̄(λ, δ, M, β) ∈ [0, 12 ) such that the socially optimal level of divergence is κ∗ ∈ (0, min{κ̄, 1}) when σ ∈ (σ̄, 21 ). Proof. See Supplemental Appendix. Proof of Proposition 4 By Lemmas 4 and 5 we can restrict attention to equilibria with κ ∈ (0, min{κ̄, 1}), and in which the platform implemented in every period in equilibrium is κ xκ = 1 , 1 + γ λ−1 mκ = 1 (1 − δ)M 1+β 1 κ λ + [1 − δ(1 + 2γ 1−λ )λ ]( 1 ) . (γ + δ)(1 + β) γ + δ 1 + γ 1−λ I now turn to analyzing the optimal level of divergence on the interval (0, min{κ̄, 1}). Taking derivative of w(κ) with respect to κ we get w0 (κ) = −mβκ ∂xκ ∂mκ +λ [σ(1 − xκ )λ−1 − σ(1 + xκ )λ−1 − (1 − 2σ)xλ−1 κ ], ∂κ ∂κ which is equal to 0 if and only if −mβκ ∂mκ ∂xκ =λ [σ(1 + xκ )λ−1 − σ(1 − xκ )λ−1 + (1 − 2σ)xλ−1 κ ]. ∂κ ∂κ 1 I first consider how mκ varies with κ. As δ > min{δ0 , δ1 } > (1 + 2γ 1−λ )−λ , 1 1 λ 1 ∂mκ λ 1 λ λ λ−1 = [1−δ(1+2γ 1−λ )λ ]( = [1−δ(1+2γ 1−λ )λ ]γ λ−1 (1+γ λ−1 )xλ−1 < 0, 1 ) κ κ ∂κ γ+δ γ+δ 1 + γ 1−λ and 1 ∂ 2 mκ λ(λ − 1) 1 λ λ−2 = [1 − δ(1 + 2γ 1−λ )λ ]( < 0, 1 ) κ 2 ∂κ γ+δ 1−λ 1+γ so mκ is decreasing and concave in κ. 25 I now show that, when λ > 2 the optimal amount of polarization is decreasing in σ, and when λ < 2 it is increasing in σ. First consider the case where λ ≤ 2. Recalling that 1 ∂xκ = 1 , ∂κ 1 + γ λ−1 the F.O.C. for welfare maximization reduces to Cmβκ = where C = 1 γ+δ [δ(1 σ(1 + xκ )λ−1 − σ(1 − xκ )λ−1 + (1 − 2σ)xλ−1 κ , xλ−1 κ 1 λ 1 + 2γ 1−λ )λ − 1]γ λ−1 (1 + γ λ−1 )2 is constant with respect to κ. Now note that mκ is decreasing in κ and, because λ < 2, (1 + xκ )λ−1 − (1 − xκ )λ−1 σ(1 + xκ )λ−1 − σ(1 − xκ )λ−1 + (1 − 2σ)xλ−1 κ = 1 − 2σ + 2σ , λ−1 2xλ−1 xκ is increasing in κ. Hence there exists a unique κ∗ which satisfies w0 (κ) = 0 and the welfare of the voters is increasing for κ < κ∗ and decreasing for κ > κ∗ . So we have that κ∗ ∈ (0, κ̄) maximizes aggregate welfare. Finally, since for λ < 2, (1+xκ )λ−1 −(1−xκ )λ−1 2xλ−1 κ is less than 1, we can conclude that ∂κ∗ > 0. ∂σ Now I turn to the case in which λ > 2. When λ > 2 there may be multiple solutions to the FOC condition. As in the case where λ < 2 the FOC is satisfied reduces to Cmβκ = 1 − 2σ + 2σ (1 + xκ )λ−1 − (1 − xκ )λ−1 . 2xλ−1 Note that now both sides of the equation are decreasing in κ. Further, (1 + xκ )λ−1 − (1 − xκ )λ−1 = ∞, κ→0 2xλ−1 κ lim so w0 (κ) < 0 for κ sufficiently small. Note, however, that we have established that κ = 0 is not the maximizer, so we must have w0 (κ) > 0 at some point. I now show that once w0 (κ) becomes positive it will remain positive until it becomes negative at κ∗ , the point which maximizes voters’ welfare. To see this, first note that 1 − 2σ + 2σ (1 + xκ )λ−1 − (1 − xκ )λ−1 2xλ−1 is decreasing and convex in κ. This follows from the fact that ∂ (1 + xκ )λ−1 − (1 − xκ )λ−1 λ−1 = [(1 − x)λ−2 − (1 + x)λ−2 ] λ−1 ∂x 2x 2xλ and so ∂ 2 (1 + xκ )λ−1 − (1 − xκ )λ−1 λ−1 = λ+1 (2[(1 + x)λ−3 + (1 − x)λ−3 ] + λ[(1 + x)λ−3 − (1 − x)λ−3 ]) > 0. 2 λ−1 ∂x 2x 2x 26 Next note that since mκ is decreasing and concave, Cmβκ is decreasing and concave for all β ∈ (0, β̄] λ−1 −(1−x since β̄ ≤ 1. Hence, Cmβκ and 1 − 2σ + 2σ (1+xκ ) 2xλ−1 λ−1 κ) cross at two points, the first of which is a minimum and the second a maximum. So we have established that there exists κ∗ to maximize the aggregate welfare of the voters, which is the greater of the two solutions to Cmβκ = 1 − 2σ + 2σ Now, because λ > 2 and so (1 + xκ )λ−1 − (1 − xκ )λ−1 . 2xλ−1 (1+xκ )λ−1 −(1−xκ )λ−1 2xλ−1 > 1 we can then conclude that ∂κ∗ < 0. ∂σ Finally note that as we have established that κ∗ is increasing in κ for λ < 2 and decreasing in κ for λ > 2, and that m is decreasing in κ and x is increasing in κ, we are done. I now turn to proving the results for a two dimensional policy space. I first prove the following simple lemma. Lemma 6 δ > min{δ̄(σA ), δ̄(σB )}. Then, if the optimal equilibrium involves κA > 0 and κB > 0, we must have xB xA = . κA κB Proof. See Supplemental Appendix. Proof of Proposition 5 First note that, since δ > min{δ̄(σA ), δ̄(σB )}, aggregate voter welfare can be enhanced by moving away from the center in at least one dimension. Also, since κ̄(M/2), κ∗ (M/2) < 1, it cannot be optimal to have κA = κB = 1: there exists κ ∈ [κ̄(M/2), 1) such that there is less polarization in policies with candidates of type (κ, κ) and (−κ, −κ) than with candidates of type (1, 1) and (−1, −1). Hence we can conclude that we must have (0, 0) < (κA , κB ) < (1, 1). I now show that it cannot be optimal to have κA > κB when λ < 2, or to have κA < κB when λ > 2. To see this, recall that by Lemma 6, if κA > κB then we must have xA > xB , and if κA < κB then we must have xA < xB . Note also that ḡ(κA ,κB ) (xA , xB , m) ≥ (1 − δ)γM + δg(κA ,κB ) (−xA , −xB , m), if and only if ḡ(κB ,κA ) (xB , xA , m) ≥ (1 − δ)γM + δg(κB ,κA ) (−xB , −xA , m). 27 Hence, in order for (κA , κB ) to be welfare maximizing, we must have (1 − 2σA − 2σB )u(0,0) (xA , xB , m) + σA [u(−1,0) (xA , xB , m) + u(1,0) (xA , xB , m)] + σB [u(0,−1) (xA , xB , m) + u(0,1) (xA , xB , m)] ≥ (1 − 2σA − 2σB )u(0,0) (xB , xA , m) + σA [u(−1,0) (xB , xA , m) + u(1,0) (xB , xA , m)] + σB [u(0,−1) (xB , xA , m) + u(0,1) (xB , xA , m)] Removing constant terms and rearranging we get that this holds if and only if (σB − σA )([(1 + xA )λ + (1 − xA )λ − 2 − 2xλA ] − [(1 + xB )λ + (1 − xB )λ − 2 − 2xλB ]) ≥ 0. As σA < σB this holds if and only if (1 + xA )λ + (1 − xA )λ − 2 − 2xλA ≥ (1 + xB )λ + (1 − xB )λ − 2 − 2xλB . However, recall that, when λ < 2, (1 + x)λ + (1 − x)λ − 2 − 2xλ is decreasing in x, so this cannot be satisfied with κA > κB . Similarly, when λ > 2, (1 + x)λ + (1 − x)λ − 2 − 2xλ is increasing in x, so this cannot be satisfied when κA < κB . Hence we can conclude that κA ≤ κB when λ < 2 and κA ≥ κB when λ > 2. Finally, I show that it cannot be optimal to have κA = κB ∈ (0, 1) when λ 6= 2. Now recall that, in the best equilibrium, the incentive constraint of the incumbent must hold with equality so we must have ḡ(κA ,κB ) (xA , xB , m) = (1 − δ)γM + δg(κA ,κB ) (xA , xB , m). Holding m fixed, for each κA we can then solve for the corresponding κB which would be necessary for the incumbent’s incentive constraint to hold with equality given that, by Lemma 5, xi = ρ−1 κi . Solving for κB in terms of κA is then equivalent to solving for xB in terms of xA , which we can do from the equation −(1 − ρ)λ [xλA + xλB ] + γm = (1 − δ)γM − δ(1 + ρ)λ [xλA + xλB ] − δm. Note that from this equation xλA + xλB = (1 − δ)γM − (γ + δ)m , δ(ρ + 1)λ − (ρ − 1)λ and so ∂xB xA = −( )λ−1 . ∂xA xB In particular, note that, when xA = xB , ∂xB /∂xA = −1. Now, if we can show that the derivative of the aggregate welfare is non-zero with respect to xA when λ 6= 2 and xA = xB , we could then 28 conclude that κA = κB is not a maximum. Now recall that the aggregate welfare is (1 − 2σA − 2σB )u(0,0) (xA , xB , m) + σA [u(−1,0) (xA , xB , m) + u(1,0) (xA , xB , m)] + σB [u(0,−1) (xA , xB , m) + u(0,1) (xA , xB , m)] = −(1 − 2σA )xλA − (1 − 2σB )xλB − σA [(1 + xA )λ + (1 − xA )λ ] − σB [(1 + xB )λ + (1 − xB )λ ] − m. Taking derivative with respect to xA and evaluating when xB = xA this becomes λ(σB − σA )[(1 + xA )λ−1 − (1 − xA )λ−1 − 2xλ−1 A ], which, as σA 6= σB and λ 6= 2, is never equal to 0. Hence we can conclude that the optimal degree of polarization does not involve κA = κB . We then conclude that, when λ < 2 we must have xA < xB and κA < κB , and that when λ > 2 we must have xA > xB and κA > κB . Proof of Proposition 6 I first consider whether or not the best equilibrium will involve divergence in both dimensions when λ 6= 2. As κ̄ < 1, and since we have established that if moving away from the center is beneficial in either dimension it will be beneficial until rents are driven to 0 when δ > max{δ̄0 , δ̄1 }, we need consider only candidates such that the level of rent-seeking is driven to 0. Note further that as κ∗ < 1, the constraint κA , κB ≤ 1 cannot bind. Now suppose κA > 0 and κB > 0. Recall that by Lemma 6 we then have xB xA = ≡ ρ−1 . κA κB Given this, for each κA and κB , we can then implicitly determine the level of polarization from the equation −(κA − xA )λ − (κB − xB )λ = (1 − δ)γM − δ[(κA + xA )λ + (κB + xB )λ ]. Note that we can decrease the number of variables by substituting in ρxA for κA and ρxB for κB , yielding −(ρ − 1)λ [xλA + xλB ] = (1 − δ)γM − δ(ρ + 1)λ [xλA + xλB ], which reduces to xλA + xλB = (1 − δ)γM . δ(ρ + 1)λ − (ρ − 1)λ Note from this equation that ∂xB xA = −( )λ−1 . ∂xA xB Now recall that the welfare of the voters is given by −(1 − 2σA )xλA − (1 − 2σB )xλB − σA [(1 + xA )λ + (1 − xA )λ ] − σB [(1 + xB )λ + (1 − xB )λ ]. 29 In addition, since we know that κA , κB < 1, if we are to have a solution with κA > 0 and κB > 0 then we must the derivative of −(1 − 2σA )xλA − (1 − 2σB )xλB − σA [(1 + xA )λ + (1 − xA )λ ] − σB [(1 + xB )λ + (1 − xB )λ ], with respect to xA , λ−1 −λ[(1−2σA )xλ−1 A +(1−2σB )xB is equal to 0. Substituting in for σA (1 − ∂xB ∂xB +σA [(1+xA )λ−1 −(1−xA )λ−1 ]−σB [(1+xB )λ−1 −(1−xB )λ−1 ] ] ∂xA ∂xA ∂xB ∂xA and dividing by xλ−1 this becomes A (1 + xB )λ−1 − (1 − xB )λ−1 (1 + xA )λ−1 − (1 − xA )λ−1 ) = σ (1 − ). B 2xλ−1 2xλ−1 A B Now, when λ = 2, (1 + xA )λ−1 − (1 − xA )λ−1 (1 + xB )λ−1 − (1 − xB )λ−1 = = 1, 2xλ−1 2xλ−1 B A and we are at a maximum regardless of the dimension of polarization. Now consider when λ < 2. Recall by Properties 1 and 2 in the previous subsection that (1 + x)λ−1 − (1 − x)λ−1 , 2xλ−1 is less than 1 and increasing in x. Hence as σA < σB this implies that we must have xB > xA . Finally, as (1 + xA )λ−1 − (1 − xA )λ−1 = ∞, xA →0 2xλ−1 A lim there exists a solution of this form with 0 < xA < xB . Finally, note that, as the objective function is strictly concave, the first order conditions are also sufficient, so this is the point which maximizes social welfare. Now consider the case where λ > 2. Recall by Properties 1 and 2 in the previous subsection that (1 + x)λ−1 − (1 − x)λ−1 , 2xλ−1 is greater than 1 and decreasing in x. Hence as σA < σB this implies that we must have xB > xA . However, by Proposition 5, the optimal divergence when λ > 2 must involve xA > xB , and so this point is not a local maximum. Hence, we cannot have a maximum with κA > 0 and κB > 0 and so, when λ > 2 there will be divergence in one, and only one, dimension. Finally as (1 + x)λ + (1 − x)λ > xλ , when λ > 2, since σA < σB , it is socially optimal to have divergence in dimension A when λ > 2 and convergence in dimension B. 30 9 Supplemental Appendix 9.1 Proofs of the relationships between xλ and (1 + x)λ + (1 − x)λ Property 1 (1+x)λ +(1−x)λ −2 2 − xλ is negative and decreasing in x if λ < 2, constant in x and equal to 0 if λ = 2, and positive and increasing in x if λ > 2. Proof. As (1+x)λ +(1−x)λ −2 2 − xλ is equal to 0 when x = 0 for all λ we need only consider whether it is increasing or decreasing. The derivative of λ[ (1+x)λ +(1−x)λ −2 2 − xλ with respect to x is (1 + x)λ−1 − (1 − x)λ−1 − xλ−1 ]. 2 Note that when λ = 2 the derivative is equal to 0. Further, as x ∈ (0, 1), (1+x)λ−1 −(1−x)λ−1 2 − xλ−1 is increasing in λ. Hence, the derivative is positive when λ > 2 and negative when λ < 2. Property 2 (1+x)λ +(1−x)λ −2 2xλ Proof. The derivative of is decreasing in x if λ > 2 and increasing in x if λ < 2. (1+x)λ +(1−x)λ −2 2xλ with respect to x is 2λxλ [(1 + x)λ−1 − (1 − x)λ−1 ] − 2λxλ−1 [(1 + x)λ + (1 − x)λ − 2] = 4x2λ λ [x(1 + x)λ−1 + x(1 − x)λ−1 − (1 + x)λ − (1 − x)λ + 2] = 2xλ+1 λ [2 − (1 + x)λ−1 − (1 − x)λ−1 ]. 2xλ+1 Notice that when x = 0, 2 − (1 + x)λ−1 − (1 − x)λ−1 = 0, and that ∂[2 − (1 + x)λ−1 − (1 − x)λ−1 ] = (λ − 1)[(1 − x)λ−2 − (1 + x)λ−2 ], ∂x which is positive if λ < 2, negative if λ > 2 and equal to 0 if λ = 2. As can then conclude that (1+x)λ +(1−x)λ −2 2xλ λ 2xλ+1 is always positive we is increasing if λ < 2, decreasing if λ > 2, and constant if λ = 2. Property 3 (1+x)λ−1 −(1−x)λ−1 2xλ−1 is greater than 1 and decreasing in x if λ > 2, and less than 1 and increasing in x if λ < 2. Proof. By Property 1, (1 + x)λ−1 − (1 − x)λ−1 > 2xλ−1 when λ > 2 and (1 + x)λ−1 − (1 − x)λ−1 < 2xλ−1 when λ < 2 so we must only consider the statics. The derivative of 2(λ − 1) (1+x)λ−1 −(1−x)λ−1 2xλ−1 xλ−1 [(1 + x)λ−2 + (1 − x)λ−2 ] − xλ−2 [(1 + x)λ−1 − (1 − x)λ−1 ] = 4x2(λ−1) λ−1 [x(1 + x)λ−2 + x(1 − x)λ−2 − (1 + x)λ−1 + (1 − x)λ−1 ] = 2xλ λ−1 [(1 − x)λ−2 − (1 + x)λ−2 ], 2xλ 31 is which is negative if λ > 2 and positive if λ < 2. Property 4 limx→0 (1+x)λ−1 −(1−x)λ−1 2xλ−1 = ∞ if λ > 2 and limx→0 (1+x)λ−1 −(1−x)λ−1 2xλ−1 = 0 if λ < 2. Proof. Apply L’Hopital’s rule. Then we get (1 + x)λ−2 + (1 − x)λ−2 (1 + x)λ−1 − (1 − x)λ−1 = lim . x→0 x→0 2xλ−1 2xλ−2 lim Now if λ > 2 then limx→0 2xλ−2 = 0 and limx→0 (1 + x)λ−2 + (1 − x)λ−2 = 2 so (1 + x)λ−1 − (1 − x)λ−1 = ∞. x→0 2xλ−1 lim Now if λ < 2 then limx→0 2xλ−2 = ∞ and limx→0 (1 + x)λ−2 + (1 − x)λ−2 = 2 so (1 + x)λ−1 − (1 − x)λ−1 = 0. x→0 2xλ−1 lim Property 5 limx→0 (1+x)λ +(1−x)λ −2 2xλ = ∞ if λ > 2 and limx→0 (1+x)λ +(1−x)λ −2 2xλ = 0 if λ < 2. Proof. As both the numerator and denominator go to 0 as x → 0, this is immediate by Property 4 and L’Hopital’s rule. 9.2 Proofs of Lemmas Proof of Lemma 1 For any κ ≥ κ̄ we can define x(κ) to be the unique solution to (1 − δ)γM − δ(κ + x(κ))λ = −(κ − x(κ))λ . Differentiating with respect to κ and re-arranging we have [δ(κ + x(κ))λ−1 + (κ − x(κ))λ−1 ]x0 (κ) = (κ − x(κ))λ−1 − δ(κ + x(κ))λ−1 . So we have that x0 (κ) = 0 if and only if δ(κ + x(κ))λ−1 = (κ − x(κ))λ−1 , which implies that 1 1 − δ λ−1 1 κ. 1 + δ λ−1 Plugging this into the original expression for x(κ) we get that x(κ) = λ (1 − δ)γM = 2λ (δ − δ λ−1 ) (1 + δ 32 1 λ−1 )λ κλ . As this equation has a unique solution, 1 ∗∗ κ 1 + δ λ−1 (1 − δ)γM 1 )λ , = ( 1 2 δ(1 − δ λ−1 ) we see that x0 (κ) = 0 at no more than one point on [κ̄, ∞). Next, note that taking derivative with respect to κ again we get 00 [δ(κ + x(κ))λ−1 + (κ − x(κ))λ−1 ]x (κ) = (λ − 1)[(κ − x(κ))λ−2 − δ(κ + x(κ))λ−2 ] > 0, 00 when κ = κ∗∗ . So x (κ∗∗ ) > 0 and we have a local min. Combining these two observations, we can conclude that x(κ) is decreasing for κ < κ∗∗ and increasing for κ > κ∗∗ . Hence we know that there exist a unique minimizer, κ∗ = arg min x(κ). κ∈[κ̄,1] Notice that, if κ∗∗ ≤ κ̄ then κ∗ = κ̄. Proof of Lemma 2 We have established that the welfare of the centrist voters in the best equilibrium with candidates of type κ ∈ [0, κ̄) and −κ is given by u0 (xκ , mκ ) = − 1 1 λ 1 (1 − δ)γM 1 + (1 + γ 1−λ )−λ [δ((1 + 2γ 1−λ )λ − γ 1−λ ) − (1 + γ 1−λ )]κλ . γ+δ γ+δ Hence the welfare of the centrist voters is increasing provided 1 δ > δ̄0 (λ, γ) ≡ 1 + γ 1−λ 1 λ . (1 + 2γ 1−λ )λ − γ 1−λ Note that, when δ > δ̄0 , welfare is increasing until we either reach maximum polarization – κ = 1 – or the rent is driven to 0. If the rent is driven to 0 (κ̄ < 1) then we find the minimum level of polarization which can support 0 rent-seeking, κ∗ . Notice that as λ > 1 and γ ≥ 1, 1 λ 1 λ 1 (1 + 2γ 1−λ )λ − γ 1−λ > 1 + 2γ 1−λ − γ 1−λ ≥ 1 + γ 1−λ > 0, so δ̄0 ∈ (0, 1). I now show that when δ ≤ δ̄0 choosing κ = 0 maximizes the welfare of the type-0 voters, and is the unique κ to maximize the welfare of the type-0 voters when δ < δ̄0 . Note that on the interval [0, κ̄] we have established that welfare is weakly decreasing in κ for δ ≤ δ̄0 and strictly decreasing when δ < δ̄0 , so we need only check κ ∈ (κ̄, 1], for the case in which κ̄ < 1. Recall, from the proof of Lemma 1, that there exists κ∗∗ such that the policy is decreasing in κ for κ < κ∗∗ and increasing for κ > κ∗∗ . Hence if we can show that κ∗∗ ≤ κ̄, then the welfare of the 33 1 type-0 voters is decreasing on [κ̄, 1]. Now note that, as x∗∗ = we must have 1 1 − δ λ−1 1+δ 1 λ−1 < 1 1−δ λ−1 1 1+δ λ−1 κ∗∗ , in order to have κ∗∗ > κ̄ . 1 1 + γ λ−1 Re-arranging, the patience must be at least 1 δ > (1 + 2γ 1−λ )1−λ ≥ δ̄0 . 1 So, when δ ≤ δ̄0 < (1 + 2γ 1−λ )1−λ , the welfare of the type 0 voters is decreasing in κ on the interval [κ̄, 1], and so the welfare of the type 0 voters must be maximized with candidates of type κ = 0 and converging platforms. Further, when δ < δ̄0 the welfare of the type-0 voters is strictly decreasing on [0, κ̄], and so κ = 0 is the unique maximizer of the type-0 voters’ welfare. 1 I now show that δ̄0 decreases in λ. First, define y = γ 1−λ . Then we have δ̄0 = 1+y . (1 + 2y)λ − y λ Note that, as y is (weakly) increasing in λ, it is sufficient to show that δ̄0 is decreasing in y and is decreasing in λ holding y fixed. I first show that δ̄0 is decreasing in y. Taking logs and differentiating we get ∂log δ̄0 ∂y = 1 λ − [2(1 + 2y)λ−1 − y λ−1 ] = 1 + y (1 + 2y)λ − y λ 1 [((1 + 2y)λ − y λ )(1 − λ) − λ((1 + 2y)λ−1 − y λ−1 )] < 0, (1 + y)[(1 + 2y)λ − y λ ] so δ̄0 is decreasing in y. Finally note that, holding y fixed, δ̄0 is decreasing in λ. To see this, note that the denominator is increasing in λ, ∂[(1 + 2y)λ − y λ ] = λ[2(1 + 2y)λ−1 − y λ−1 ] > 0, ∂λ and the numerator is constant in λ. We can now conclude that δ̄0 is decreasing in λ as claimed. Finally, recall that 1 0 < δ̄0 < (1 + 2γ 1−λ )1−λ . 1 1 Now, since when λ > 2, γ 1−λ ≥ γ −1 we have (1 + 2γ 1−λ )1−λ ≤ (1 + 2γ −1 )1−λ ,when λ > 2. As (1 + 2γ −1 )1−λ clearly goes to 0 as λ → ∞, we can conclude that lim δ̄0 = 0. λ→∞ 34 Proof of Lemma 3 I prove this result by first showing that, for all λ there exists a δ̄1 ∈ [0, 1] such that the welfare of the polarized voters is maximized with κ = 0 if, and only if, δ ≤ δ̄. I then show that δ̄1 ∈ (0, 1), and finally, that limλ→∞ δ̄1 = 1. We have established that the average welfare of the polarized voters in the best equilibrium with candidates of type κ ∈ [0, κ̄) and −κ is (1 − xκ )λ + (1 + xκ )λ u−1 (xκ , mκ ) + u1 (xκ , mκ ) = −mκ − . 2 2 Substituting in for mκ and xκ gives − 1 (1 − δ)γM 1 + [δ(1 + 2γ 1−λ γ+δ γ+δ u−1 (xκ , mκ ) + u1 (xκ , mκ ) = 2 κ κ 1 κ λ λ λ − 1)λ ]( [(1 − 1 ) − 1 ) + (1 + 1 ) ]. 2 1 + γ 1−λ 1 + γ 1−λ 1 + γ 1−λ Now, there are two cases to consider: when λ ≤ 2 and when λ > 2. I consider the case in which λ ≤ 2. Note first that (1 + xκ )λ + (1 − xκ )λ − 2 − xλκ , 2 is (at least) weakly decreasing in xκ and hence also κ. Hence since u0 (xκ , mκ ) = −mκ − xλκ , is increasing for all κ ∈ [0, κ̄] when δ > δ̄0 , so u−1 (xκ , mκ ) + u−1 (xκ , mκ ) 2 (1 + xκ )λ + (1 − xκ )λ − 2 2 (1 + xκ )λ + (1 − xκ )λ − 2 = u0 (xκ , mκ ) − [ − xλκ ], 2 = −mκ − is also increasing in κ for κ ∈ [0, κ̄]. Now define δ̄1 = max{δ : u−1 (x0 , m0 ) + u−1 (x0 , m0 ) u−1 (xκ , mκ ) + u−1 (xκ , mκ ) ≥ for all κ ∈ [0, κ̄]} 2 2 and note that 0 ≤ δ̄1 ≤ δ̄0 < 1. In addition, by continuity, when δ = δ̄1 , for all κ ∈ [0, κ̄], u−1 (x0 , m0 ) + u−1 (x0 , m0 ) (1 + xκ )λ + (1 − xκ )λ ) u−1 (xκ , mκ ) + u−1 (xκ , mκ ) −m0 −1 ≥ −mκ − = . 2 2 2 which implies that m0 − mκ ≤ 1 1 κ (1 + xκ )λ + (1 − xκ − 2)λ ) λ [δ(1 + 2γ 1−λ )λ − 1]( ) = . 1 γ+δ 2 1 + γ 1−λ As m0 − mκ is increasing in δ, when δ < δ̄1 we must have u−1 (x0 , m0 ) + u−1 (x0 , m0 ) u−1 (xκ , mκ ) + u−1 (xκ , mκ ) < . 2 2 35 So, when δ < δ̄1 , κ = 0 is the unique maximizer on [0, κ̄]. Finally, note that, as δ̄1 ≤ δ̄0 , if κ̄ < 1 then κ∗ = κ̄ so, when δ ≤ δ̄1 the welfare of the polarized voters is maximized with κ = 0. We can then conclude that the welfare of the polarized voters is maximized with κ = 0 if and only if δ ≤ δ̄1 , and that κ = 0 is the unique maximizer when δ < δ̄1 . I now turn to the case in which λ > 2. I first show that in order to determine whether u−1 (xκ ,mκ )+u1 (xκ ,mκ ) 2 is maximized at κ = 0 or not is sufficient to evaluate the function at two points: (x0 , m0 ) and ∗ ∗ (x , m ) = (x(κ∗ ), 0) if κ̄ < 1, (x1 , m1 ) otherwise. This is because, when the welfare of the polarized voters is not maximized with κ = 0, it will be maximized with platform (x∗ , m∗ ). To see this, note that u−1 (xκ ,mκ )+u1 (xκ ,mκ ) 2 can be written in the form h(κ) = C + Aκλ − B[(D − κ)λ + (D + κ)λ )], where B, D > 0, on the interval κ ∈ [0, κ̄]. Now, suppose we were to have a local extremum of h with κ ∈ (0, κ̄). Then we have that h0 (κ) = λ(Aκλ−1 − B[(D + κ)λ−1 − (D − κ)λ−1 ]) = 0. Notice then that at this point, since λ > 2, h00 (κ) = λ(λ − 1)[Aκλ−2 − B[(D + κ)λ−2 + (D + κ)λ−2 ]] λ(λ − 1)B [(D + κ)λ−1 − (D − κ)λ−1 − κ(D + κ)λ−2 − κ(D + κ)λ−2 ] = κ λ(λ − 1)BD [(D + κ)λ−2 − (D − κ)λ−2 ] > 0, = κ making this a local minimum. As we cannot then have a local maximum with positive rent-seeking, the maximum must either be when κ = 0, after the rents are driven to 0, or after reaching maximum divergence. Hence we need only compare (x0 , m0 ) with (x∗ , m∗ ) where (x(κ∗ ), 0) if κ̄ < 1, ∗ ∗ (x , m ) = (x1 , m1 ) otherwise, to determine whether the welfare of polarized voters is maximized at κ = 0. I now turn to showing that, if the aggregate welfare of the polarized voters is no higher with platform (x0 , m0 ) than with (x∗ , m∗ ) for some level of patience δ 0 then it is lower for all δ > δ 0 . To show this, it is sufficient to show that and u−1 (x1 , m1 ) + u1 (x1 , m1 ) − u−1 (x0 , m0 ) + u1 (x0 , m0 ) > 0 and u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) <1 u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) 36 for all δ > δ 0 , when these equations hold weakly when δ = δ 0 . Consider first u−1 (x1 , m1 ) + u1 (x1 , m1 ) − u−1 (x0 , m0 ) + u1 (x0 , m0 ) = 1 1 1 λ 2 − (1 − x1 )λ + (1 + x1 )λ + [1 − δ(1 + 2γ 1−λ )λ ]( 1 ) . γ+δ 1−λ 1+γ Clearly, if this is non-negative for δ 0 then, for all δ > δ 0 it is increasing, and so is positive for all δ > δ0. Now I turn to considering (1 − x(κ∗ ))λ + (1 − x(κ∗ ))λ − 2 u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) . = u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) m0 It is sufficient to show that u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) , u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) is always decreasing whenever it is less than or equal to 1. Note that we may either have κ∗ = 1 or κ∗ < 1. Note that for all κ ∈ [κ̄, 1] we have ∂x(κ) −γM − (κ + x(κ))λ = , ∂δ λ[(κ − x(κ))λ−1 + δ(1 + x(κ))λ−1 ] and so −γM − (κ∗ + x(κ∗ ))λ ∂x(κ∗ ) ≤ , ∂δ λ[(κ∗ − x(κ∗ ))λ−1 + δ(1 + x(κ∗ ))λ−1 ] as variation in κ∗ can only decrease x(κ∗ ). Now we can calculate, ∂[(1 + x(κ∗ ))λ + (1 − x(κ∗ ))λ − 2] ∂δ ∗ + x(κ∗ ))λ −γM − (κ λ[(1 − x(κ∗ ))λ−1 + (1 + x(κ∗ ))λ−1 ] λ[(κ∗ − x(κ∗ ))λ−1 + δ(κ∗ + x(κ∗ ))λ−1 ] ≤ < −γM − (κ∗ + x(κ∗ ))λ , because (1−x(κ∗ ))λ−1 +(1+x(κ∗ ))λ−1 > (1−x(κ∗ ))λ−1 +δ(1+x(κ∗ ))λ−1 ≥ (κ∗ −x(κ∗ ))λ−1 +δ(κ∗ +x(κ∗ ))λ−1 . Now note that as m0 = (1 − δ)γM/(γ + δ), ∂m0 (1 − δ)γM γM γ+1 =− − =− m0 . 2 ∂δ (γ + δ) γ+δ (γ + δ)(1 − δ) So we have that the derivative of u−1 (0,0)+u1 (0,0)−u−1 (x(κ∗ ),0)−u1 (x(κ∗ ),0) u−1 (0,0)+u1 (0,0)−u−1 (x0 ,m0 )−u1 (x0 ,m0 ) −γM − (κ∗ + x(κ∗ ))λ + γ+1 (γ+δ)(1−δ) [(1 m0 37 is less than + x(κ∗ ))λ + (1 − x(κ∗ ))λ − 2] . Note that this is negative whenever (1 − δ)γM + (1 − δ)(κ∗ + x(κ∗ ))λ > γ+1 [(1 + x(κ∗ ))λ + (1 − x(κ∗ ))λ − 2]. γ+δ Now, recalling that by the definition of x(κ) we have (1 − δ)γM = δ(κ∗ + x(κ∗ ))λ − (κ∗ − x(κ∗ ))λ , and so (κ∗ + x(κ∗ ))λ > Further, when (1 − δ)γM . δ u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) ≤ 1, u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) we have [(1 + x(κ∗ ))λ + (1 − x(κ∗ ))λ − 2] ≤ (1 − δ)γM . γ+δ So it is sufficient to show that γM + (1 − δ) γ+1 γM > γM. δ (γ + δ)2 Note that this is equivalent to (γ + δ)2 > γ + 1, δ which always holds when γ ≥ 1. Hence, whenever u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) ≤ 1, u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) it is always decreasing in δ. Hence, if it becomes less than 1, it can never become greater than 1 again. We can then conclude that if for some δ 0 , u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) ≤ 1, u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) then u−1 (0, 0) + u1 (0, 0) − u−1 (x(κ∗ ), 0) − u1 (x(κ∗ ), 0) < 1, u−1 (0, 0) + u1 (0, 0) − u−1 (x0 , m0 ) − u1 (x0 , m0 ) for all δ > δ 0 . So we conclude that if κ = 0 is not a maximum for δ 0 then it is not for any δ > δ 0 . We can then conclude that there exists δ̄1 ∈ [0, 1] such that the welfare of the voter is maximized with κ = 0 if and only if δ ≤ δ̄1 , and that κ = 0 is the unique maximizer when δ < δ̄1 . I now show that we must have δ̄1 < 1. To do so I must show that, when δ is sufficiently close to 1, the aggregate welfare of the voters is higher with platform (x∗ , m∗ ) than (x0 , m0 ). First note 38 that, as limδ→1 κ̄ = 0, for δ close to 1, (x∗ , m∗ ) = (x(κ∗ ), 0). Note that, as x(κ∗ ) ≤ x(1) it is sufficient to show that, for δ sufficiently close to 1, u−1 (x(1), 0) + u1 (x(1), 0) = −(1 + x(1))λ − (1 − x(1))λ > −m0 − 2 = u−1 (x0 , m0 ) + u−1 (x0 , m0 ). Next note that as −(1 − x(1))λ = (1 − δ)γM − δ(1 + x(1))λ , it must be that lim x(1) = 0, δ→1 and, differentiating with respect to δ, λ[(1 − x(1))λ−1 + δ(1 + x(1))λ−1 ] ∂x(1) = −γM − (1 + x(1))λ . ∂δ Notice that this implies that −γM − 1 ∂x(1) = , δ→1 ∂δ 2λ lim which is finite. Now consider (1 + x(1))λ + (1 − x(1))λ − 2 (γ + δ)[(1 + x(1))λ + (1 − x(1))λ − 2] . = m0 (1 − δ)γM As both the top and bottom go to 0 as δ → 1, using L’Hopital’s rule, and the fact that limδ→1 ∂x(1) ∂δ is finite, we can conclude that, (1 + x(1))λ + (1 − x(1))λ − 2 = δ→1 m0 lim [(1 + x(1))λ + (1 − x(1))λ − 2] + (γ + δ)λ[(1 + x(1))λ−1 − (1 − x(1))λ−1 ] ∂x(1) ∂δ = 0. δ→1 −γM lim Hence, we must have δ̄1 < 1. It has now been established that, for both the cases in which λ ≤ 2 and λ > 2, that there exists δ̄1 ∈ [0, 1) such that the aggregate welfare of the polarized voters is maximized with κ = 0 if and only if δ ≤ δ̄1 . I now show that δ̄1 > 0. To see this, recall that the level of rent-seeking is decreasing 1 in κ when δ < δ = (1 + 2γ 1−λ )−λ . As the aggregate welfare of the polarized voters is decreasing in both m and x, we can conclude that we must have δ̄1 > δ > 0. Hence we can conclude that there exists δ̄1 ∈ (0, 1) such that the aggregate welfare of the polarized voters is maximized with κ = 0 if and only if δ ≤ δ̄1 . I conclude by showing that lim δ̄1 = 1. λ→∞ Recall that, when λ > 2, to determine whether or not κ = 0 maximizes the welfare of the polarized voters it is sufficient to compare the utility from (x0 , m0 ) with (x(κ∗ ), 0) if κ̄ < 1, ∗ ∗ (x , m ) = (x1 , m1 ) otherwise. 39 Hence it is sufficient to show that u−1 (x∗ , m∗ ) + u1 (x∗ , m∗ ) u−1 (x0 , m0 ) + u1 (x0 , m0 ) < lim = −m0 − 1. λ→∞ λ→∞ 2 2 lim To show this, I start by showing that lim (x∗ , m∗ ) = lim (x(1), 0). λ→∞ λ→∞ First note that as κ lim xκ = lim λ→∞ λ→∞ 1+γ 1 λ−1 = κ , 2 lim (1 − x1 )λ = 0, λ→∞ and lim (1 + x1 )λ = ∞. λ→∞ Recalling mκ solves −(κ − xκ )λ + γmκ = (1 − δ)γM − δ(κ + xκ )λ − δmκ , we can conclude that lim→∞ κ̄ < 1. Hence, we have limλ→∞ (x∗ , m∗ ) = limλ→∞ (x(κ∗ ), 0). Now recall that, from the proof of Lemma 1, when κ∗ > κ̄, then κ∗ = min{κ∗∗ , 1}, where 1 κ∗∗ = 1 As 1+δ λ−1 2 goes to 1 as λ → ∞ and 1 + δ λ−1 (1 − δ)γM 1 ( )λ . 1 2 δ(1 − δ λ−1 ) (1−δ)γM 1 is greater than 1 for λ large, this implies that δ(1−δ λ−1 ) lim κ∗ = 1, λ→∞ and so we can conclude that lim (x∗ , m∗ ) = lim (x(1), 0). λ→∞ λ→∞ I have then established that, in order to show that limλ→∞ δ̄1 = 1 we need only show that, for all δ ∈ (0, 1), lim λ→∞ u−1 (x(1), 0) + u1 (x(1), 0) < −m0 − 1. 2 I now turn to showing that this equation holds. Recall that x(1) is the solution to −(1 − x(1))λ = (1 − δ)γM − δ(1 + x(1))λ . Notice that (1 − δ)γM is constant with respect to λ, so, since 1 − x(1) ∈ (0, 1), lim δ(1 + x(1))λ = (1 − δ)γM + lim (1 − x(1))λ λ→∞ λ→∞ 40 is finite. Therefore we must have that lim x(1) = 0. λ→∞ Now define L∗ ≡ lim (1 + x(1))λ . λ→∞ I now show that lim (1 − x(1))λ = λ→∞ 1 . L∗ To see this, note that, taking logs and applying L’Hopital’s rule, λ log(1 + x(1)) −(1 − x(1)) = lim = −1. λ→∞ λ log(1 − x(1)) λ→∞ 1 + x(1) lim Hence, lim λ log(1 − x(1)) = − lim λ log(1 + x(1)) = − log L∗ λ→∞ λ→∞ So we can conclude that lim (1 − x(1))λ = lim (1 + x(1))−λ = λ→∞ λ→∞ 1 . L∗ Now, substituting in for the limits, we have δL∗ − 1 = (1 − δ)γM = m0 (γ + δ). L∗ Next, recall that u−1 (x(1), 0) + u1 (x(1), 0) < −m0 − 1, 2 if and only if 2m0 + 2 − u−1 (x(1), 0) + u1 (x(1), 0) = 2m0 + 2 − (1 + x(1))λ − (1 − x(1))λ < 0. Taking limit and substituting in for m0 we get that limλ→∞ u−1 (x(1),0)+u1 (x(1),0) 2 < −m0 − 1 if and only if 2+ 2δ ∗ 2 1 1 L − − L∗ − ∗ < 0. ∗ γ+δ γ+δL L This is equivalent to (γ − δ)L∗ + (γ + δ + 2) 1 > 2(γ + δ). L∗ Removing 1/L∗ from this equation then gives (γ − δ)L∗ + (γ + δ + 2)δL∗ − (γ + δ + 2)(1 − δ)γM > 2(γ + δ), and dividing by γ + δ gives (1 + δ)L∗ − (γ + δ + 2)m0 > 2. 41 In order to show that this holds, I need verify only that it holds for M = 0, and that (1 + δ)L∗ − (γ + δ + 2)m0 is increasing in M. Recall that L∗ is the solution to δL∗ − 1 = (1 − δ)γM. L∗ 1 Hence, when M = 0 we have L∗ = δ − 2 . Further we have that ∂L∗ (1 − δ)γ > . ∂M δ Hence we can conclude that ∂(1 + δ)L∗ − (γ + δ + 2)m0 1+δ γ+δ+2 > (1 − δ)γ( − ) > 0, ∂M δ γ+δ as γ > δ, and so it is sufficient to show that (1 + δ)L∗ − (γ + δ + 2)m0 > 2 when M = 0. As 1 L∗ = δ − 2 and m0 = 0 when M = 0, and, since for all δ ∈ (0, 1), 1+δ 1 > 2, we can then conclude δ2 that (1 + δ)L∗ − (γ + δ + 2)m0 > 2, in fact holds. Hence, we can conclude that lim λ→∞ u−1 (x(1), 0) + u1 (x(1), 0) u−1 (x0 , m0 ) + u1 (x0 , m0 ) < −m0 − 1 = lim , λ→∞ 2 2 for all δ. Therefore we can conclude that lim δ̄1 = 1. λ→∞ I have then shown that, for all λ > 1, there exists a δ̄1 ∈ (0, 1) such that the welfare of the polarized voters is maximized with κ = 0 if and only if δ ≤ δ̄1 , and that limλ→∞ δ̄1 = 1. Further we have that κ = 0 is the unique maximizer when δ < δ̄1 , and when λ > 2, the unique maximizer is ∗ κ if κ̄ < 1, κ= 1 otherwise. Proof of Lemma 4 Recalling that, as the centrist voters vote based on the utility they receive the elected candidate must implement a platform which satisfies ∂ḡκ (x,m) − ∂u ∂x 0 (x,m) ∂x ∂ḡ (x,m) κ (κ − x)λ−1 γmβ ∂m = = − = , ∂u0 (x,m) xλ−1 mβ ∂m in the best equilibrium with type κ and −κ, provided the level of rent-seeking is non-0. Therefore we have κ x= 1 1 + γ λ−1 42 , as in the case with linear rents. Hence the level of rent-seeking in the best equilibrium can be determined from the equation ḡκ (xκ , m) = (1 − δ)γ M 1+β + gκ (xκ , m), 1+β which reveals that the level of rent seeking will be mκ = 1 1 κ (1 − δ)M 1+β λ + [1 − δ(1 + 2γ 1−λ )λ ]( 1 ) . (γ + δ)(1 + β) γ + δ 1 + γ 1−λ Proof of Lemma 5 As δ > min{δ̄0 , δ̄1 } > δ, for each M there exits a κ̄(M ) such that the amount of rent-seeking is driven to 0 if the degree of polarization is at least κ̄(M ). I now show that there exists a β1 > 0 such that, for all β ∈ (0, β1 ), there exists M 0 < κ̄−1 (1) such that, for M > M 0 it will be optimal to move away from the center, for an appropriate distribution of voter preferences. Recall that the aggregate welfare of the population when the candidates are type κ ∈ [0, κ̄] and −κ is w(κ) = σu−1 (xκ , mκ ) + (1 − 2σ)u0 (xκ , mκ ) + σu1 (xκ , mκ ) = − m1+β κ − σ[(1 + xκ )λ + (1 − xκ )λ ] − (1 − 2σ)xλκ . 1+β There are two cases to consider: λ ≤ 2 and λ > 2. Consider first the case in which λ ≤ 2. Note that average welfare of the type −1 and 1 voters is higher with candidate of type κ than type 0 if and only if m1+β − m1+β (1 + xκ )λ + (1 − xκ )λ − 2 κ 0 > . 1+β 2 Now, when M = κ̄−1 (1), we can compare κ = 0 with κ = κ̄ = 1 which becomes the equation (1 + x1 )λ + (1 − x1 )λ − 2 m1+β 0 > . 1+β 2 Now as δ > δ̄1 , m1+β (1 − δ)γM (1 + x1 )λ + (1 − x1 )λ − 2 0 = > , β→0 1 + β γ+δ 2 lim so there exists a β1 > 0 such that, for all β ∈ (0, β1 ), m1+β (1 + x1 )λ + (1 − x1 )λ − 2 0 > , 1+β 2 holds, and so κ = 1 provides higher utility to the polarized voters than than κ = 0 when M = m1+β −m1+β κ 0 is increasing in M we can conclude that, 1+β −1 κ̄ (1) such that the average welfare of the of the type κ̄−1 (1). Finally, as 0 exists an M < 0 maximized by setting κ = 0 if and only if M > M . 43 for all β ∈ (0, β1 ) there −1 and 1 voters is not Therefore, for all β ∈ (0, β1 ), there exists M 0 < κ̄−1 (1) such that for all M > M 0 there exists σ̄(λ, δ, M, β) ∈ [0, 12 ) such that, for all σ ∈ (σ̄(λ, δ, M, β), 12 ), w(κ) is not maximized with κ = 0. Now consider the case in which λ > 2. Note that welfare of the type 0 voters is higher with candidate of type κ if and only if m1+β − m1+β κ 0 > xλκ . 1+β Now, when M = κ̄−1 (1), we can compare κ = 0 with κ = κ̄ = 1 which becomes the equation m1+β 0 > xλ1 . 1+β Now as δ > δ̄0 , m1+β (1 − δ)γM 0 = > xλ1 , β→0 1 + β γ+δ lim so there exists a β1 > 0 such that, for all β ∈ (0, β1 ), m1+β 0 > xλ1 , 1+β holds, and so κ = 1 provides higher utility to the centrist voters than than κ = 0 when M = κ̄−1 (1). Finally, as 0 M < m1+β −m1+β κ 0 1+β κ̄−1 (1) is increasing in M we can conclude that, for all β ∈ (0, β1 ) there exists an such that the welfare of the of the type 0 voters is not maximized by setting κ = 0 if and only if M > M 0 . Therefore, for all β ∈ (0, β1 ), there exists M 0 such that for all M > M 0 there exists σ̄(λ, δ, M, β) ∈ (0, 21 ] such that, for all σ ∈ (0, σ̄(λ, δ, M, β)), w(κ) is not maximized with κ = 0. We have now established that there exists M 0 < κ̄−1 (1) such that, for any M > M 0 , such that κ = 0 does not maximize w(κ), for an appropriate distribution of voter preferences. Now I show that, there exists a M̄ > κ̄−1 (1) > M 0 such that, for all M ∈ (M 0 , M̄ ), w(κ) is maximized on (0, κ̄] by κ∗ < min{κ̄, 1}. Recall that, as δ > δ, mκ is decreasing in δ and mκ̄ = 0. Consequently, since mβκ → 0 as mκ → 0, there exist M̄ > κ̄−1 (1) such that, there exists κ0 < min{1, κ̄} such that, for all κ ∈ (κ0 , κ̄], w0 (κ) < 0. Hence the optimal level of divergence on the interval [0, κ̄] must be in [0, κ̄) for all M < M̄ . We can then conclude that, for all M ∈ (M 0 , M̄ ),w(κ) 0 is maximized on (0, κ̄] at some point κ∗ ≤ κ < min{κ̄, 1}. Now we have established the maximum on [0, κ̄] is κ∗ ∈ (0, κ∗ ). We need to now establish that the welfare is never higher with κ ∈ [κ̄, 1]. Note that, since w(κ∗ ) > w(κ̄), if κ̄ is sufficiently close 00 to 1, then w(κ∗ ) > w(κ) for all κ ∈ [κ̄, 1]. As such, there exists M < κ̄−1 (1) < M̄ , such that no 00 κ ∈ (κ̄, 1] can give higher utility than κ∗ when M > M . Hence, defining M = max{M 0 , M 00 } < κ̄−1 (1) < M̄ we have that, for all M ∈ (M , M̄ ), the optimal level of divergence, κ∗ ∈ (0, min{κ̄, 1}). Finally, defining β̄ = min{β1 , 1} we are done. 44 Proof of Lemma 6 Suppose κA , κB > 0. From the assumption of retrospective voting we must have that the candidate chooses platform ((xA , xB ), m) to maximize gκ (xA , xB , m)) = −(κA − xA )λ − (κB − xB )λ + γm, subject to the constraints m ≥ 0 and u0 (xA , xB , m) = −xλA − xλB − m ≥ ū. Note that this implies (κB − xB )λ−1 (κA − xA )λ−1 = ≥ γ, xλ−1 xλ−1 A B and so xA xB = ≡ ρ−1 . κA κB 45