Correlation of preferences and Honesty in matching market (*)

Sangram Vilasrao Kadam (†)

March 2, 2011

(*) I would like to thank Peter Coles and Al Roth for providing an excellent course which motivated me to investigate this issue and for providing me inputs through the progress of this paper. I would also like to thank Assaf Romm for helpful discussions.
(†) Department of Economics, Harvard University, Cambridge, MA 02138. G1; ID 30802853; email: svkadam@fas.harvard.edu.

Abstract

The limit convergence of the core in a large matching market is a known result. If the maximum size of the ranked list submitted by men is fixed and if their preferences are randomly generated from any distribution, the proportion of women who have incentives to manipulate their preferences goes to zero as the number of participants in the market grows to infinity. I extend this result by allowing correlation in the preferences of men while the women's preferences are allowed to be complete and arbitrary. Furthermore, I find that for a given level of the proportion of women who can manipulate their preferences, we can allow an increase in the size of the rank order list as the correlation in preferences increases. However, I find that the bounds arrived at are not at all informative for real world settings. This leaves the question about core convergence in large matching markets still open.

Introduction

Consider a one-to-one matching market where there are $n$ men and $n$ women. Suppose they play the matching game to produce a 'stable matching,' where no pair of individuals would prefer each other to their existing matches and no individual would prefer to remain single rather than being matched to their current match. The seminal result about the existence of at least one such match where the preferences are strict on both sides of the market was given by Gale-Shapley (1962). It can be produced by using the Deferred Acceptance Algorithm (DAA) where one side of the market, say men, proposes. A man-proposing DAA produces a man-optimal stable matching and a woman-pessimal stable matching. Roth-Sotomayor (1990) prove that a woman always has an incentive to misreport her preferences if there is more than one stable match, i.e. if the match generated for a given woman in the man-optimal stable match is different from that in the woman-optimal stable match. Thus, she does not have incentives to truthfully reveal her preferences.

The extension of the stable one-to-one matching market to the real world problem of medical residency matches has been studied quite extensively. Roth-Peranson (1999) ran some simulations on the size of the markets and found that as the number of doctors increases the proportion of hospitals who have more than one set of stable matches increases to about 90% for $n = 1000$, if the preferences on both sides of the market are complete and randomly generated. It is important to note that completeness guarantees that every hospital is acceptable to every doctor and vice versa. In practice, however, only a small number of hospitals get listed by doctors on their preference list. Further simulations, accounting for this observed fact, lead to the Roth-Peranson (RP) conjecture that the proportion of hospitals who have incentives to manipulate their preferences goes to 0 as the size of the market increases and the preferences for both sides of the market are generated at random from a uniform distribution.

This conjecture initiated a set of theoretical investigations to understand the underlying mechanisms. Immorlica-Mahdian (2005) (henceforth IM) presented the first explanation for the setting of one-to-one matching markets. Kojima-Pathak (2009) (henceforth KP) extended the above to many-to-one matching markets and get a similar result that the proportion of hospitals who have incentives to misrepresent their preferences goes to zero as $n$ grows large. Kojima, Pathak and Roth (2010) (henceforth KPR) extend this even further to situations where there are couples in the market. Couples incorporate correlation among individuals over preferences. However, all these results assumed in some form that the individuals have no correlation in preferences or, as in the case of KPR, that the proportion of individuals who have correlation goes to zero as $n$ grows large.

I have been able to allow for correlation in the preferences of all the market participants on one side, and I am able to extend the above results for a particular kind of correlation. The structure of this paper is as follows. In the next section, I discuss the existing set of results from IM and KP in more detail. In section 2, I provide an overview of the literature on correlation in matching markets and then focus on the specific type of correlation that I am going to use for the rest of the discussion. Section 3 contains the main theorem and its proof is presented in section 4. The most important discussion takes place in section 5. Firstly, I provide details about allowing a greater size of the rank ordered lists for market participants as correlation increases. Secondly, the lack of informativeness of the results presented, here and earlier, is shown through numerical simulations of the bound. Section 6 concludes, highlighting the shortcomings of the existing results and motivating new directions on this problem.

1 Existing results

IM generalize the RP conjecture in two directions. Firstly, they allow the men's preferences to be drawn from an arbitrary distribution and not necessarily a uniform distribution. They assume that the preference lists of men are restricted to size $k$ and that at each step a random woman is drawn from a distribution $D$ until someone who does not already exist on his preference list is found. Thus, the preference list for all $n$ men is generated by drawing from $D$. Secondly, they do not impose any restriction on the preferences of women and assume them to be completely arbitrary. However, they do maintain the assumption that any man is acceptable to any woman, i.e. the women have complete preferences over the set of men. They use $c_k(n)$ to denote the expected number of women who have more than one stable husband. Their main result, stated in their theorem 3.1, claims the following.

Theorem. Consider a situation where each woman has an arbitrary and complete preference list, and each man has a preference list chosen independently at random according to $D$. Then, for all fixed $k$,

\[ \lim_{n \to \infty} \frac{c_k(n)}{n} = 0. \]

KP generalize the IM result for many-to-one matching and find a similar result for the expected proportion of colleges that can manipulate the student-optimal stable match when others are truthful. KPR extend the result to markets where there are couples. Couples indicate real scenarios where there is correlation in preferences among doctors. However, their results hold only as long as the proportion of couples vanishes as $n$ grows large.

2 Correlation in preferences

In the matching literature correlation in preferences among candidates has been included in a few ways. To list a few, KP have a finite number of regions of preferences and each doctor belongs to one and only one region. The preference list for doctors is generated by randomly generating a list of hospitals from the distribution of their region. Thus, they are able to replicate region specific preferences as one might expect in the real world. However, they do not prove or disprove the existence of a core convergence result for this type of preferences. A related work on preference correlation is Coles, Kushnir and Niederle (2010), where they have what they call block preferences. Doctors agree about the relative ranking of two hospitals from different blocks but they might have individual tastes about their ranking of hospitals within a block. Another form of correlation appears in Coles (2009), where there is a finite proportion of women who have exactly identical preferences as a given woman.

I have included correlation in preferences in a different way than any of the existing mechanisms. The motivation for this form can best be described through a picture.

[Figure: spectrum of preference structures, with IM preferences at the left extreme and perfectly correlated preferences at the right extreme; the rank order list size is labelled |ROL| (note 1).]

Note 1: I am denoting |ROL| as the maximum permissible size of the rank order list.

On the left extreme we have the IM preferences for men and the restriction of the rank order list (ROL) being of size $k$. On the right extreme we have the situation of perfect correlation in preferences across all men. In fact, it just means that all the men simply agree on the most eligible bachelorette, the second most eligible, and so on. To have the maximum number of matches and truth telling, the size of the rank order list should be equal to the number of women $n$. This will ensure that everybody just lists the ranking and then the women choose, in the order of their ranking, their favorite bachelor. Due to IM, we know that the core convergence result holds in the limit of $n \to \infty$ for the left extreme situation. We also know that the result holds with certainty for the situation on the right, where there is a unique stable match.

Let us consider preferences where we have some degree of agreement and some degree of randomness in men's preferences. This can be modeled as $(1-\lambda)D + \lambda R$ preferences, defined as follows.

Definition. Each position on a man's preference list is filled with probability $1-\lambda$ by drawing a woman from the distribution $D$ until a woman who does not already exist on the preference list is drawn, and with probability $\lambda$ by picking the highest ranked woman on the ranked list $R$ who does not already exist on the list.

The parameter $\lambda$ can range from 0, where it reflects IM preferences, to 1, where it is the case of perfect agreement in preferences. We are interested in $\lambda \in (0,1)$. An example will clarify this further. Let us say that we are simulating the preference list for Xavier and he is looking at Amy, Beatrice, Cindy, and Dorothy. Let us further assume that in this case $k = 3$ and $\lambda = 1/2$. Say we have $R$ as A, B, C, D, ranked 1st, 2nd, 3rd, and 4th respectively. Let us assume that $D$ is as shown in the table below.

            A      B      C      D
    Prob    40%    30%    20%    10%

As $\lambda = 1/2$ we can decide whether we go to the ranked list or the distribution based on a coin toss. Say with H we generate from $R$ and with T we go to $D$. If we get a T then a random number is generated to enable us to draw someone from the distribution $D$. Suppose the outcomes of the coin tosses and the subsequent random number generated, if T, are as shown in the table below.

    Coin flip    H      T        H
    Prob                0.811

The first H picks A, the top of $R$. The T with draw 0.811 falls in C's interval of the cumulative distribution (0.7 to 0.9) and picks C. The last H picks B, the highest ranked woman on $R$ not yet listed. This creates the preference list of size 3 for X as A, C, B.

This preference generation process has one obvious shortcoming. The correlation in preferences never allows a woman ranked lower than $k$ in $R$ to be generated at a given spot for any man when the spot is going to be filled from $R$. Thus, this correlation in preferences has the unique characteristic that only the top $k$ women can make it to the preference list through the channel of correlation, i.e. the ranked list $R$.

3 Main Result

Let us suppose there are $n$ men and $n$ women in the matching market we are considering. Then my main result is the following theorem.

Theorem. Suppose each woman has an arbitrary and complete preference list, and each man has a preference list chosen independently at random according to $(1-\lambda)D + \lambda R$, with the restriction that the preference list is of maximum size $k$. Then, for all $\lambda$ and $k$, we have

\[ \lim_{n \to \infty} \frac{c_k(n)}{n} = 0. \]

Thus, the expected proportion of women who have more than one stable husband, $c_k(n)/n$, is bounded by a quantity that approaches 0 as $n$ grows large.
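The $(1-\lambda)D + \lambda R$ generation process of Section 2 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`draw_from_distribution`, `generate_preference_list`), the parameter name `lam` for the correlation parameter, and the injectable `uniform` source of random numbers are all hypothetical and not from the paper. Heads is modeled as a uniform draw below `lam`, and a draw from $D$ uses the cumulative distribution in the order A, B, C, D.

```python
import random
from typing import Callable, Dict, List, Optional


def draw_from_distribution(dist: Dict[str, float], u: float) -> str:
    """Inverse-CDF draw: first woman whose cumulative probability exceeds u."""
    cumulative = 0.0
    woman = ""
    for woman, prob in dist.items():
        cumulative += prob
        if cumulative > u:
            return woman
    return woman  # floating-point guard: fall back to the last woman


def generate_preference_list(
    ranked: List[str],
    dist: Dict[str, float],
    k: int,
    lam: float,
    uniform: Optional[Callable[[], float]] = None,
) -> List[str]:
    """Fill k positions: with probability lam from the ranked list R,
    otherwise by redrawing from D until an unlisted woman appears.
    Assumes D gives positive probability to at least k women."""
    if k > len(ranked):
        raise ValueError("k cannot exceed the number of women")
    if uniform is None:
        uniform = random.random
    prefs: List[str] = []
    while k > len(prefs):
        if lam > uniform():
            # "Heads": highest-ranked woman on R who is not yet listed.
            choice = next(w for w in ranked if w not in prefs)
        else:
            # "Tails": draw from D, rejecting women already on the list.
            choice = draw_from_distribution(dist, uniform())
            while choice in prefs:
                choice = draw_from_distribution(dist, uniform())
        prefs.append(choice)
    return prefs


# Xavier's example: coin flips H, T, H and the draw 0.811 after the T.
R = ["A", "B", "C", "D"]
D = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
scripted = iter([0.2, 0.7, 0.811, 0.1])  # 0.2 -> H, 0.7 -> T, 0.811 -> draw, 0.1 -> H
print(generate_preference_list(R, D, k=3, lam=0.5, uniform=lambda: next(scripted)))
# ['A', 'C', 'B']
```

Setting `lam=1.0` always returns the top `k` entries of `R`, which illustrates the shortcoming noted in Section 2: the ranked-list channel can never place a woman ranked below `k` on a list.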
Furthermore, for a fixed $n$ and a given bound on the proportion of women who could manipulate their preferences, $k$ can increase with $\lambda$ such that $k(1-\lambda)$ stays constant.

The main contribution of this theorem is twofold. First, I extend the core-convergence result in the limit of $n \to \infty$ to the case where there is correlation in preferences for all the participants on one side of the matching market. The result holds for any level of correlation characterized by the value of $\lambda$. I maintain the assumption of arbitrary preferences on the other side of the market (note 2). Second, this result generalizes and provides some theoretical motivation as to why we could and should allow for larger rank ordered list sizes, i.e. $k$, when the correlation in preferences is expected to be higher.

Note 2: This assumption has the benefit of being extremely general in its application, but at the same time we cannot get a really tight upper bound for the proportion of women who could manipulate, given that we are not making any assumptions about their preferences. I come back to this in the section on simulations.

4 Proof

I provide the proof here and the details of the proof of a Lemma used in the Appendix. The proof closely follows the chassis laid in IM. Let us assume that the women are numbered in decreasing order of popularity, i.e. probability of being drawn from $D$. The expected number of women who could possibly manipulate their preferences can be found by taking expectations, over all possible preference lists for all the men, of the number of women having incentives to manipulate for a given preference list. By linearity of expectation, this quantity is exactly equal to the sum over women of the probability that a given woman has a profitable deviation by misrepresenting her preferences. We know that a woman has incentives to misrepresent her preferences if and only if she has more than one stable husband across all possible stable matches. Say $\mathrm{Mult}_g$ is the event that a given woman $g$ has more than one stable husband. Then

\[ c_k(n) = \sum_{g=1}^{n} \mathrm{Prob}(\mathrm{Mult}_g). \]

To investigate these probabilities, IM use the stochastic algorithm motivated by Knuth, Motwani and Pittel (1990). I have included some details about it in the appendix. I follow the exact same approach and make the same argument that the probability of $\mathrm{Mult}_g$ is bounded above by the probability that the woman $g$ under consideration gets a proposal in the rejection chain she initiates. Let us call the latter event $\mathrm{Prop}_g$. It is important to note that even if woman $g$ receives a proposal, it is not necessarily from one of her stable matches, in other words from a man she prefers to her earlier match whom she divorced when she initiated the rejection chain.

Next, we look at the number of single women more popular than $g$ and let there be $X(g)$ such women. At this step, we make an assumption and focus only on women who are not in the top $k$ of the ranked list $R$. As we noted earlier, such a woman can never get on the preference list of a given man through the channel of correlation. Thus, the probability that she can get a spot on the preference list is limited to her probability of being drawn from the distribution, say $p_g$, given that the spot is going to be filled from the distribution $D$. Thus the unconditional probability of $\mathrm{Prop}_g$ is $(1-\lambda)p_g$. Clearly, this is bounded by $\frac{1}{X(g)+1}$.

If $Y(g)$ denotes the number of women more popular than $g$ but not listed on any of the $n$ men's preference lists, then essentially they would be single. Clearly we have $X(g) \ge Y(g)$. Using the next Lemma we can arrive at probability bounds for all $g \ge 4k$.

Lemma 1. For all $g \ge 4k$, we have

\[ E\left[\frac{1}{Y(g)+1}\right] \le \frac{12\, e^{8nk(1-\lambda)/g}}{g}. \]

Although the above lemma is true for $g \ge 4k$, we will use it only for $g \ge g^* = \frac{16nk(1-\lambda)}{\ln(n)}$. For a given value of $k$ and $\lambda$, we can always find an $n$ large enough that the former is implied by the latter (note 3). Even if we completely ignore the incentives of the top $g^*$ women by saying that their probability of manipulating their preferences is at most 1, we get our final result. For $g \ge g^*$ we have $e^{8nk(1-\lambda)/g} \le e^{\ln(n)/2}$ and $12/g \le \frac{3\ln(n)}{4nk(1-\lambda)}$, so

\[ c_k(n) \le \frac{16nk(1-\lambda)}{\ln(n)} + \sum_{g=g^*}^{n} \frac{12\, e^{8nk(1-\lambda)/g}}{g} \le \frac{16nk(1-\lambda)}{\ln(n)} + n \cdot \frac{3\ln(n)}{4nk(1-\lambda)}\, e^{\ln(n)/2}, \]

and therefore

\[ \frac{c_k(n)}{n} \le \frac{16k(1-\lambda)}{\ln(n)} + \frac{3\ln(n)}{4k(1-\lambda)\sqrt{n}} = o(1). \]

Both the terms on the right hand side of the last inequality approach 0 as $n \to \infty$. QED.

Note 3: As $\lambda$ increases and gets closer to 1, we would need a higher value of $n$ to ensure this holds. However, this is really not a concern for realistic values of $n$. For $n = 100$ the above holds for all $\lambda$ below 0.9884.

5 Economic implications of the results

The old role of the fixed $k$ in the IM result is taken by $k(1-\lambda)$ and, not surprisingly, for $\lambda = 0$ it boils down to the IM result. An important implication is that for a given bound on the proportion of women who we can allow to have more than one stable husband, the maximum permissible size of the rank ordered list $k$ can increase if $\lambda$ increases such that $k(1-\lambda)$ is fixed. Hence, we can and should allow for greater sizes of rank ordered lists, from the perspective of the highest number of matches and truthful revelation by men, if the correlation is expected to be higher. The following table illustrates the growth in the permissible value of $k$ as $\lambda$ increases.

    λ    0     0.25    0.5    0.75    0.9
    k    10    13      20     40      100

The bound provided at the end of the last section consists of two parts. The first part covers the top $g^*$ women, for whom we did not provide any reasonable bound on the probability. The second part covers all the other women ranked below $g^*$, and most of the results that were proved have been used only for assessing the incentives of these women. The second part of the bound is encouraging: it approaches zero at the rate of $\ln(n)/\sqrt{n}$, which provides reasonable values for the values of $n$ we see in real life. This can be seen from the next figure.

[Figure: second part of the bound as a function of n.]

The second part dies to 0 quickly as we increase $n$ from 100, where it takes the value 3.5%, to 10,000, where it takes the value 0.7%. However, the first part of the bound is very discouraging as it approaches zero at the rate of $1/\ln(n)$. The plot of the numerical value of the first part for some really large values of $n$ shows that it only tells us that at most 100%, i.e. all, of the women have incentives to manipulate their preferences in all real world scenarios.

[Figure: first part of the bound for very large values of n.]

On the extreme left we have n = 10688 and the value of the first component of the bound is about 800%. The bound falls below 100% only for extremely large values of $n$. This reveals the overly enthusiastic nature of our results so far.

Although the result proved holds in the limit of $n \to \infty$ and this could be done without any assumptions on the women's preferences, we clearly see that this has compromised the tightness and informativeness of the bound. My results are a generalization of the IM results and follow a similar structure of proof. Furthermore, the KP result uses similar arguments and gets a bound which looks similar to this. Their first term is the proportion of popular colleges and it has the exact same first component. Thus, the criticism about lack of tightness and informativeness carries forward to the existing results of IM and KP. The core convergence results proved so far in the limit, without correlation in IM or KP, or with correlation in KPR or in this paper, just give us a directional sense of the reasons we might see an extremely small proportion of colleges or women who can misrepresent their preferences, which was the original RP conjecture. Nonetheless, all these results do not tell us what the mechanism is in the real life scenarios we stumble upon (with $n$ being in the order of 1000 or at best 100,000). Thus, we are still far from resolving the puzzle of the RP conjecture and why truthful revelation of preferences is (at least an approximate) equilibrium, if it is, when others are truthful for the matching markets we see in real life.

It is important to realize that Lemma 1 holds for all $g \ge 4k$ but we have leveraged it only for $g \ge g^* = \frac{16nk(1-\lambda)}{\ln(n)}$. The mathematical craftsmanship, initially discovered in IM, used to arrive at the results at the given level of generality forces us to take this leap in the number of top women/colleges. This is one area where the bound can be improved upon.

6 Conclusion

In this paper, I studied correlation in preferences in a one-to-one matching market and showed that for the particular type of correlation, i.e. $(1-\lambda)D + \lambda R$ preferences, the same core convergence result holds as the size of the matching market increases. An important outcome of this result is that the size of the rank ordered lists being submitted can (and should) be allowed to increase if we have more correlation in preferences. Although I have proved the result only for one-to-one matching markets, the results could be extended to many-to-one matching markets.

Lastly, the limit results that have been found here and in earlier works reveal our gap in understanding real world markets of the sizes we actually see. The results give meaningful bounds only for extremely large values of $n$, which we never see. There is a clear direction to extend our understanding of this extremely important problem, which has puzzled us for over a decade. If we can provide a better bound for the top few colleges or women, we would get to results that could be informative. Furthermore, I have only included the kind of correlation in preferences which ensures that only the top $k$ women stand a chance to benefit from the introduction of correlation. It would be an interesting extension to allow correlation to affect all or at least a nontrivial fraction of the women.

Appendix

Stochastic DAA

The essence of the stochastic DA algorithm is that a given woman divorces her current mate in the Man Optimal Stable Match and initiates a rejection chain by forcing her current mate to propose to his next best alternative. He continues proposing until his proposal is accepted. The current mate of the woman who just accepted a better proposal now starts his proposals, and this continues. We know from the lattice theorems given in Roth-Sotomayor (1990) that if there is another stable match then this rejection chain necessarily ends on the woman who initiated it.

Proof of the Lemma

Let $Q = \sum_{j=1}^{k} p_j$. Let $w$ be a woman ranked better than $g$. Let $l_w(m)$ be the event that a man $m$ does not list $w$ at a given step, given that he has listed $w_1, w_2, \ldots, w_{i-1}$ as his first $i-1$ women. Its probability is given by

\[ \mathrm{Prob}(l_w(m)) = 1 - \frac{(1-\lambda)\,p_w}{1 - \sum_{j=1}^{i-1} p_{w_j}} \ge 1 - \frac{(1-\lambda)\,p_w}{1 - Q}. \]

There are at least $w-k$ women outside the top $k$ who are at least as popular as $w$, and their probabilities sum to at most $1-Q$, so clearly $p_w \le \frac{1-Q}{w-k}$. For $E_w \left(= \bigcap_{m=1}^{n} l_w(m)\right)$ being the event that no man has the given woman $w$ listed on his preference list of size $k$, we have

\[ \mathrm{Prob}(E_w) \ge \left(1 - \frac{(1-\lambda)\,p_w}{1-Q}\right)^{nk} \ge \left(1 - \frac{1-\lambda}{w-k}\right)^{nk}. \]

We know that $1 - x \ge e^{-2x}$ for $x \in [0, 1/2]$ and $w - k \ge \frac{w}{2}$ if $w \ge 2k$. Hence we have

\[ \mathrm{Prob}(E_w) \ge e^{-2nk(1-\lambda)/(w-k)} \ge e^{-4nk(1-\lambda)/w}. \]

Furthermore, the expectation of $Y_g$ is the sum of the probabilities, over all women $w$ ranked higher than $g$, of not being listed. Hence for $g \ge 4k$ we have

\[ E[Y_g] = \sum_{w=1}^{g} \mathrm{Prob}(E_w) \ge \sum_{w=2k}^{g} e^{-4nk(1-\lambda)/w} \ge \sum_{w=g/2}^{g} e^{-4nk(1-\lambda)/w} \ge \frac{g}{2}\, e^{-8nk(1-\lambda)/g}. \]

Thus, we have

\[ E[Y_g] \ge \frac{g}{2}\, e^{-8nk(1-\lambda)/g}. \qquad (1) \]

For the variance of $Y_g$, we can use IM Lemma 4.4 directly without any modification, as this depends only on $Y_g$ being the count of negatively correlated events (note 4):

\[ \sigma^2(Y_g) \le E[Y_g]. \]

Using the Chebyshev inequality and the same arguments as in the proof of Lemma 4.1 in IM, we have

\[ E\left[\frac{1}{Y_g+1}\right] \le \frac{6}{E[Y_g]}. \qquad (2) \]

Combining inequalities (1) and (2) we have the result in Lemma 1. QED.

Note 4: $\mathrm{Prob}(E_i \wedge E_j) \le \mathrm{Prob}(E_i)\,\mathrm{Prob}(E_j)$ and $\binom{n}{1} + 2\binom{n}{2} = n^2$.

References

[1] Coles, Peter, "Optimal Truncation in Matching Markets," Mimeo, 2009.
[2] Coles, Peter, Alexey Kushnir and Muriel Niederle, "Preference Signaling in Matching Markets," NBER Working Paper, 2010.
[3] Gale, David and Lloyd S. Shapley, "College Admissions and the Stability of Marriage," American Mathematical Monthly, 1962, 69, 9-15.
[4] Immorlica, Nicole and Mohammad Mahdian, "Marriage, Honesty and Stability," SODA, 2005, 53-62.
[5] Knuth, Donald E., Rajeev Motwani and Boris Pittel, "Stable Husbands," Random Structures and Algorithms, 1990, 1, 1-14.
[6] Kojima, Fuhito and Parag Pathak, "Incentives and Stability in Large Two-Sided Matching Markets," American Economic Review, 2009, 99, 608-627.
[7] Kojima, Fuhito, Parag Pathak and Alvin E. Roth, "Matching with Couples: Stability and Incentives in Large Markets," NBER Working Paper, 2010.
[8] Roth, Alvin E. and Elliott Peranson, "The Redesign of the Matching Market for American Physicians: Some Engineering Aspects of Economic Design," American Economic Review, 1999, 89, 748-780.
[9] Roth, Alvin E. and Marilda A. O. Sotomayor, Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis, Cambridge: Econometric Society Monographs, 1990.
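Supplementary sketch for Section 5. The two parts of the bound can be evaluated numerically with a few lines of Python. This is a hedged illustration, not the simulation code behind the paper's figures: the function names `bound_parts` and `permissible_k` are hypothetical, the two terms are taken to be $16k(1-\lambda)/\ln(n)$ and $3\ln(n)/(4k(1-\lambda)\sqrt{n})$ as read from the final inequality of Section 4, and the table of permissible $k$ is assumed to round $10/(1-\lambda)$ to the nearest integer.

```python
import math
from typing import List, Tuple


def bound_parts(n: float, k: float, lam: float) -> Tuple[float, float]:
    """Return (first part, second part) of the bound on c_k(n)/n.

    Both parts depend on k and lam only through k_eff = k * (1 - lam).
    """
    k_eff = k * (1.0 - lam)
    first = 16.0 * k_eff / math.log(n)                          # top g* women
    second = 3.0 * math.log(n) / (4.0 * k_eff * math.sqrt(n))   # remaining women
    return first, second


def permissible_k(lam: float, k_eff: float = 10.0) -> int:
    """List size k that keeps k * (1 - lam) equal to k_eff (rounding assumed)."""
    return int(round(k_eff / (1.0 - lam)))


# Second part with k(1 - lam) = 10, for the market sizes quoted in Section 5.
for n in (100, 10_000):
    first, second = bound_parts(n, k=10, lam=0.0)
    print(f"n={n}: first part={first:.2f}, second part={second:.4f}")

# Permissible k as the correlation parameter grows.
lams: List[float] = [0.0, 0.25, 0.5, 0.75, 0.9]
print([permissible_k(lam) for lam in lams])
```

With $k(1-\lambda) = 10$ the second part evaluates to roughly 3.5% at $n = 100$ and 0.7% at $n = 10{,}000$, matching the values quoted in Section 5, while the first part stays far above 1 for any market size seen in practice.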