The National Mortgage Database (NMDB)

Robert Avery, Ken Brevoort, Theresa DiVenti, Carla Inclan, Ian Keith, Jessica Lee, Lexian Liu, Ismail Mohamed, Forrest Pafenberg, Jay Schultz, Cynthia Waldron, Xun Wang, Claudia Wood, Peter Zorn

Federal Housing Finance Agency
Consumer Financial Protection Bureau
Freddie Mac
Urban Institute

June 11, 2013

The views expressed are those of the authors and do not necessarily represent those of the Consumer Financial Protection Bureau, the Federal Housing Finance Agency, Freddie Mac or their staff.

What is the NMDB?

A new, nationally representative, loan-level mortgage database jointly funded and managed by the FHFA and CFPB, based on a prototype developed by Freddie Mac.
» First-lien mortgages reported to the credit bureaus are used as both the sampling frame and the source of performance data. No new data is collected; the NMDB will make better use of data that already exists.
» The database is a 1/20 sample (not a registry of loans).
» Because the credit bureaus archive their data, the NMDB recovers data that would have been available had the project been started years ago. The initial 1/20 sample is representative of all mortgages open at any time from January 1998 to June 2012 and (with weights) of any borrower who had at least one mortgage during that period.
» Going forward, a 1/20 representative sample of newly originated mortgages will be added each quarter, and terminated mortgages will exit the sample.
» 10.1 million mortgages are in the initial historic database. In the future the database will track about 3.5 million active mortgages.

Credit bureau data are comprehensive. However, they are raw servicing data that require significant cleaning to be useful. Data from other sources also need to be added.
» Major commitment of government staff to do this. Never done before.
» Working with the active cooperation of credit bureau staff.

NMDB will also have a survey component.
Each quarter a representative subset of borrowers associated with loans newly added to the database will be sent a mail survey soliciting information on their mortgage shopping and origination experience.

National Mortgage Database 2

Four Overlapping Databases

The basic unit of observation is the mortgage.
» The database will contain full credit information for all borrowers associated with the sampled mortgages.
» Borrower data will be gathered from one year prior to sampled mortgage origination to one year after termination and tracked quarterly.
» Performance on the sample mortgages will be collected monthly.

The NMDB will also make available a historic database containing full credit data (including scores) from 1998 to 2012 for a representative sample of borrowers associated with an active mortgage during the 1998 to 2012 period.
» The database will contain information on all mortgages taken out by these borrowers during the 1998 to 2012 period.
» Data will also be gathered on all other credit obligations active during this period.
» Performance for each mortgage will be tracked from 2000 to 2012.

The NMDB will maintain a separate database of a representative 1-in-20 sample of individuals who have ever had an active mortgage from 1998 onward.
» Quarterly information will be maintained on these individuals from one year prior to taking out their first mortgage (or 1998) until they die.
» Persons will be added to the database when they take out their first mortgage.

The NMDB origination survey data will also be maintained as a separate database.

Why is the NMDB Needed?

• HMDA:
– Not fully representative: does not include HMDA non-reporters.
– Lacks detailed borrower, loan or performance data.
– Available only 9 to 21 months after mortgages are originated.
• LPS McDash and/or CoreLogic:
– Servicing files from 26 large servicers versus 2,000 servicers in credit bureaus.
– Not representative: poor coverage of portfolio loans.
– Same problems as the underlying NMDB data (duplication, hanging performance and servicing sales), but the data are not cleaned as the NMDB will be, so users may be unaware of these problems.
– No information on other obligations, previous or subsequent mortgages, or borrowers.
• Problems with NY Fed Equifax:
– Similar source as NMDB, but the unit of observation is borrowers, not loans.
– Same problems as the underlying NMDB data (duplication, hanging performance and servicing sales), but the data are not cleaned as the NMDB will be, so users may be unaware of these problems.
– Little supplementation with other data, and files are difficult to link over time.

What is Missing in Bureau Data?

Key items missing are property value (LTV) and characteristics, borrower characteristics (e.g. age, race, income, gender), and some mortgage characteristics (e.g. ARM status, PMI, origination channel).

The database is being supplemented with information obtained from matching to existing external sources (some still under negotiation):
» Home Mortgage Disclosure Act (HMDA) data (70% match rate gives income/race).
» Property transaction (deed/title) data (55% match rate).
» MLS data (useful for purchase price in non-disclosure states).
» Property appraisal data.
» Household moving/address information on last three addresses.
» Third party servicing data (e.g. LP, LPS). Private label MBS data. Maybe securities data as well (e.g. Ginnie Mae, GSEs).
» Administrative files (FHA, VA, RHS, GSEs, home loan banks and possibly large banks). 47% of sample loans are government-backed. An additional 17% of borrowers have a non-sampled government-backed loan.
» Data on age, gender and marital status from public records collected by the credit bureau.

What Specific Fields will NMDB Have?

For each sample mortgage the database will have:
» Monthly: Performance (delinquent, current, foreclosure); balance; scheduled payment; actual payment; escrow payment; amortizing contract rate.
» Fixed Characteristics: Date opened; term; amount borrowed; number of borrowers; mortgage purpose (home purchase, refinance, new mortgage on free and clear property); owner occupancy status; type of mortgage (FHA/VA/RHS/home improvement/manufactured housing/other); GSE (Fannie/Freddie/Ginnie/Private MBS); servicer type; balloon amount and date; appraised property value, APR, CLTV, LTV and DTI used in underwriting; ARM status; PMI; date closed, payoff amount, and termination form (if closed).
» Modification/foreclosure status: Date entered modification/foreclosure; change in terms; special program (HARP/HAMP); part of bankruptcy; charge-off amount.

For each sample mortgage co-signer the database will have:
» Age (date of birth); gender; marital status; deceased indicator; race/ethnicity (from HMDA); income at the time of origination (from HMDA).
» Quarterly Vantage credit score, bankruptcy, and income estimator.
» Whether they live in the property associated with the mortgage; first-time homebuyer; census tract/zip code and timing of last three addresses.

Specific Fields (continued)

For the property associated with each sample mortgage the database will have:
» Quarterly: LTV; CLTV; and value (from AVM model).
» Fixed Characteristics: Date purchased; purchase amount; location (census tract, MSA and zip code); type of property (e.g. single family); age of structure; square footage; assessed value; owner-occupied.

For all concurrent 2nd liens on the property associated with each sample mortgage the database will have:
» Monthly: Performance (delinquent, current, foreclosure); balance; scheduled payment; actual payment; escrow payment; amortizing contract rate; credit limit (if a HELOC).
» Fixed Characteristics: Date opened (piggyback or not); term; open- or closed-end; amount borrowed (or credit limit); number of borrowers; same servicer as 1st; date closed, payoff amount, and termination form (if closed).
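The quarterly LTV and CLTV fields listed above combine lien balances with the AVM property value. The following is a minimal sketch of the standard ratios, not the NMDB's own calculation: the `LienSnapshot` record and its field name are hypothetical, and it assumes that CLTV uses the outstanding balance of each second lien (conventions differ for HELOCs, where the credit limit is sometimes used instead).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LienSnapshot:
    """Balance of one lien at a quarter end (hypothetical record layout)."""
    balance: float


def ltv(first: LienSnapshot, property_value: float) -> float:
    """Loan-to-value: first-lien balance divided by property value."""
    if property_value <= 0:
        raise ValueError("property value must be positive")
    return first.balance / property_value


def cltv(first: LienSnapshot, seconds: Sequence[LienSnapshot],
         property_value: float) -> float:
    """Combined LTV: all lien balances divided by property value."""
    if property_value <= 0:
        raise ValueError("property value must be positive")
    total_balance = first.balance + sum(s.balance for s in seconds)
    return total_balance / property_value


# Toy numbers, not taken from the NMDB.
first_lien = LienSnapshot(balance=160_000.0)
second_liens = [LienSnapshot(balance=20_000.0)]
avm_value = 200_000.0

toy_ltv = ltv(first_lien, avm_value)
toy_cltv = cltv(first_lien, second_liens, avm_value)
print(f"LTV={toy_ltv:.2f} CLTV={toy_cltv:.2f}")
```

With no second liens the two ratios coincide, which is why CLTV only adds information for the firsts-with-seconds group.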
Specific Fields (continued)

For all other mortgages, credit cards, installment loans, student loans, auto loans, lines of credit, and other consumer loans associated with sample mortgage co-borrowers the database will have:
» Monthly: Performance (delinquent, current, foreclosure); balance; scheduled payment; actual payment; escrow payment; amortizing contract rate; credit limit (if open-ended).
» Fixed Characteristics: Type of credit; date opened; term; open- or closed-end; amount borrowed (or credit limit); number of borrowers; same servicer/property as sample mortgage; date closed, payoff amount, and termination form (if closed).

Information on inquiries and public records for all borrowers associated with sample mortgages will also be gathered.

An origination survey will be mailed to a representative subset of new mortgage borrowers in the database each quarter. The survey has been pre-tested three times, with response rates of 60 and 45 percent for the last two pilots. The survey is designed to pick up information on issues like loan shopping and suitability that are not available from any other source.

Timeline

Contract signed with Experian on September 27, 2012.

Initial data delivery took place in December 2012: a 1/20 sample of all loans in existence between January 1998 and June 2012 (10.1 million loans and 14.7 million borrowers after preliminary cleaning).

An analytic group at FHFA, Freddie Mac and CFPB is processing and cleaning the data and will match it to external sources, impute data for loans that cannot be matched, and develop a series of regular reports and queries to facilitate use of the NMDB.
» It will likely take until next spring to finish cleaning the data.
» 8 FTEs working on the project, a major commitment by FHFA.
» Many challenges in following people and mortgages (e.g. servicing is sold; people die or are added to mortgages).
An existing pilot prototype dataset has been in development for 2½ years, funded by Freddie Mac (1/500 sample of loans outstanding since 2003).
» Prototype will be maintained and updated until at least summer 2013.
» Already used in FHFA’s 2012 HERA-mandated report.
» Pilot testing of an additional Origination Survey and a Delinquency Survey.

Access and the Future

NMDB is being set up as a public good. We believe that the contract signed with Experian is a model for data access.

The challenge is to (1) protect borrower/lender personally identifiable information and (2) provide useful data. Local geography is critical for mortgage analysis.

Our solution: Data is physically housed only on an FHFA/CFPB server. Access, however, is allowed for any federal government/reserve bank/GSE employee who goes through the access process:
» Must sign an agreement not to reverse engineer the identity of a borrower or lender. Severe penalties for violations of the agreement.
» All work is done behind a firewall; data can’t be removed.
» NMDB software must support a variety of purposes, from simple queries (number of new mortgages in California) to complex research projects.
» We are working to allow broader academic/research public access via Census-style programs.
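The simple-query case mentioned above (number of new mortgages in California) amounts to counting sample records and scaling by the sampling weight. The sketch below is illustrative only: the record fields `state`, `open_year` and `weight` are hypothetical, and the default weight of 20 simply inverts the 1/20 sampling fraction. The slides do not describe how the actual NMDB weights are constructed.

```python
from typing import Iterable, Mapping

# Inverse of the 1/20 sampling fraction; used when a record has no weight.
DEFAULT_WEIGHT = 20.0


def estimate_new_mortgages(sample_loans: Iterable[Mapping],
                           state: str, year: int) -> float:
    """Weighted count of sampled loans opened in `state` during `year`."""
    total = 0.0
    for loan in sample_loans:
        if loan["state"] == state and loan["open_year"] == year:
            total += loan.get("weight", DEFAULT_WEIGHT)
    return total


# Toy sample records, not NMDB data.
toy_sample = [
    {"state": "CA", "open_year": 2010},
    {"state": "CA", "open_year": 2010},
    {"state": "TX", "open_year": 2010},
    {"state": "CA", "open_year": 2009},
]

ca_2010_estimate = estimate_new_mortgages(toy_sample, "CA", 2010)
print(ca_2010_estimate)
```

Two matching sample records scale to an estimate of 40 loans in the population under the equal-weight assumption.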
Examples of how NMDB can be used

Example 1: Second liens
Example 2: Loan performance transition matrix
Example 3: Credit tightening
Example 4: Market Comparisons
All examples with 2010 data using the Prototype

Example 1: Second liens

NMDB coverage is more extensive than HMDA’s

[Figure: millions of mortgages (0.0 to 1.5) by open date, 2004 to 2010. Series: HMDA First Lien; HMDA Second Lien; NMDB First Lien; NMDB First with Second; NMDB Second by Second Open Date.]

Example 1: Second liens (continued)

Default rates are higher for firsts with seconds

[Figure, two panels: default rate (90d or worse), 0 to 40%, by first lien open date, 2004 to 2009. First panel series: First with Second; First without Second. Second panel series: First with Concurrent non HELOC; First with Concurrent HELOC; First with Subsequent non HELOC; First with Subsequent HELOC; First without Second.]

Example 1 continued

Performance of firsts w/ different types of seconds

                                             Concurrent  Concurrent  Subsequent  Subsequent
                                        ALL       HELOC   non HELOC       HELOC   non HELOC
Seconds and Firsts perform similarly    87%         89%         87%         88%         78%
Seconds perform better                   8%          7%          8%          7%         15%
Seconds perform worse                    5%          4%          5%          4%          7%

88% of firsts and their associated seconds perform similarly.
GSE firsts and their associated seconds perform better than non-GSE loans.
Firsts with piggyback non-HELOC (closed end) seconds have the highest default rates.
When performance diverges, seconds tend to out-perform their associated firsts.
Example 2: Loan performance transition matrix

60D+ loans tend to worsen in performance

Row percent. Rows: April 2010 performance. Columns: May 2010 performance.

April 2010    Current      30D      60D      90D   120+ D      FCL  No hist   Closed    Total
Current        95.36%    0.96%    0.06%    0.02%    0.04%    0.02%    1.83%    1.72%     100%
30D            26.36%   41.84%   25.33%    0.42%    0.20%    0.18%    3.92%    1.75%     100%
60D             7.23%   12.83%   35.13%   38.33%    0.27%    0.97%    3.11%    2.13%     100%
90D             6.20%    1.71%    5.40%   25.27%   46.50%    8.31%    5.18%    1.45%     100%
120+ D          2.91%    0.75%    0.48%    1.04%   75.56%    9.90%    6.83%    2.53%     100%
FCL             1.59%    0.06%    0.03%    0.00%    2.53%   85.27%    3.82%    6.70%     100%
No hist        10.43%    0.11%    0.23%    0.04%    0.79%    0.28%   86.51%    1.60%     100%
Total          82.12%    1.72%    0.86%    0.48%    2.24%    2.15%    8.59%    1.84%     100%

95% of current mortgages remain current the next month.
Slightly over 40% of 30-day delinquent loans remain 30-day delinquent the next month (the mode), with roughly equal percentages transitioning into current and 60-day delinquent.
A disproportionate share of loans delinquent 60 or more days transitions into an even worse performing state the next month.
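A row-percent transition matrix like the one above can be built by tabulating each loan's status in consecutive months and normalizing each row to 100. This is a minimal sketch under stated assumptions, not the authors' code: the input format (a list of `(previous_status, next_status)` pairs, one per loan) is hypothetical, and the toy counts are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Status labels follow the column headings of the slide's table.
STATES = ["Current", "30D", "60D", "90D", "120+ D", "FCL", "No hist", "Closed"]


def transition_row_percents(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Return {previous_status: {next_status: percent of the row}}."""
    counts = defaultdict(Counter)
    for prev_status, next_status in pairs:
        if prev_status not in STATES or next_status not in STATES:
            raise ValueError(f"unknown status in {(prev_status, next_status)}")
        counts[prev_status][next_status] += 1

    matrix = {}
    for prev_status, row in counts.items():
        row_total = sum(row.values())
        # Each row is normalized by its own total, so rows sum to 100.
        matrix[prev_status] = {
            s: 100.0 * row[s] / row_total for s in STATES
        }
    return matrix


# Toy month-to-month observations, not NMDB data.
toy_pairs = (
    [("Current", "Current")] * 19
    + [("Current", "30D")]
    + [("30D", "Current"), ("30D", "30D"), ("30D", "30D"), ("30D", "60D")]
)

toy_matrix = transition_row_percents(toy_pairs)
print(toy_matrix["Current"]["Current"], toy_matrix["30D"]["30D"])
```

Because each row is scaled separately, the matrix describes where loans in a given state go next and says nothing about how common that starting state is; the Total row in the slide's table carries that information.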
Example 2: Loan performance transition matrix (continued)

Firsts without seconds cure more frequently

[Figure: grid of panels labeled by performance state (Current, 30 D, 60 D, 90 D, 120+ D); horizontal axis Jun 06 to Jun 09. Series: Firsts no Second; Firsts with Second.]

Example 3: Credit quality of originations

[Figure: Vantage Score Distribution for Purchase and Refinance Originations. Vertical axis: Vantage Score, 500 to 1000. Horizontal axis: Origination Date, 2003 H1 to 2010 H2. Series: Purchase; Refinance.]

Note: The data are weighted values from the NMDB and include jumbo loans. Purpose is identified using credit bureau and HMDA data. The box represents the middle 50% of the observations, the median is marked by the white line in the box and the dotted lines extend to the 5th and 95th percentiles. The widths of the boxes are proportionate to the volume of loans.

Score distributions are the tightest (lowest risk) since 2003.
» The Vantage score cutoff, as measured by the 5th percentile of the score distribution, is currently set higher for both purchase and refinance loans than at any point since 2003. Refinance mortgages appear to face an especially high score cutoff.
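The box-plot note describes weighted 5th, 25th, 50th, 75th and 95th percentiles of the score distribution. A rough sketch of one way to compute them follows. It assumes a particular convention (the smallest score whose cumulative weight share reaches the requested percentile); the slides do not say which convention the authors used, and the function name and toy data are hypothetical.

```python
from typing import Dict, Sequence


def weighted_percentile(values: Sequence[float],
                        weights: Sequence[float],
                        percentile: float) -> float:
    """Smallest value whose cumulative weight share is >= percentile."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    pairs = sorted(zip(values, weights))
    total_weight = sum(w for _, w in pairs)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")

    threshold = total_weight * percentile / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guards against floating-point shortfall


def box_summary(values: Sequence[float],
                weights: Sequence[float]) -> Dict[str, float]:
    """The five statistics drawn in each box of the slide's plots."""
    return {
        "p5": weighted_percentile(values, weights, 5),
        "p25": weighted_percentile(values, weights, 25),
        "median": weighted_percentile(values, weights, 50),
        "p75": weighted_percentile(values, weights, 75),
        "p95": weighted_percentile(values, weights, 95),
    }


# Toy scores with equal sampling weights, not NMDB data.
toy_scores = [600, 650, 700, 750, 800]
toy_weights = [20, 20, 20, 20, 20]
toy_box = box_summary(toy_scores, toy_weights)
print(toy_box)
```

Under this reading, the "score cutoff" discussed above corresponds to the `p5` entry computed separately for each origination half-year and loan purpose.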
Example 3 continued

Credit quality of originations: GSE comparison

[Figure, two panels: Vantage Score Distribution for Conventional, Conforming GSE and non-GSE Purchase and Refinance Originations. Vertical axis: Vantage Score, 500 to 1000. Horizontal axis: Origination Date, 2003 H1 to 2010 H2. Series: GSE Purchase; non-GSE Purchase; GSE Refinance; non-GSE Refinance.]

Note: The market data are weighted values from the NMDB and exclude jumbo and FHA/VA loans. Purpose is identified using credit bureau and HMDA data. The box represents the middle 50% of the observations, the median is marked by the white line in the box and the dotted lines extend to the 5th and 95th percentiles. The widths of the boxes are proportionate to the volume of loans.

The credit quality of GSE loans significantly exceeds that of non-GSE loans, especially for purchase money mortgages.

Example 4: Market comparisons

Comparison of FHA Originations

[Figure: Vantage Score Distribution for Non-FHA Loans and FHA Loans. Vertical axis: Vantage Score (rescaled), 600 to 800. Horizontal axis: Origination Date, 2000 H1 to 2011 H2. Series: Non-FHA/VA; FHA/VA.]

Note: The market data are weighted values from the NMDB and include jumbo loans. The box represents the middle 50% of the observations, the median is marked by the white line in the box and the lines extend to the 5th and 95th percentiles. The widths of the boxes are proportionate to the volume of loans.

The credit quality of FHA/VA market originations is consistently lower than that of non-FHA/VA market originations.
This difference in quality diminished somewhat during the height of the boom (2004 through 2006), and has increased since 2007.

Example 4: Market comparisons (continued)

Monitoring and benchmarking FHA

• Monitoring: As of June 2010 (for loans originated since 2003):
– 13.4% of FHA loans were either in a state of delinquency or were closed with a loss.
– 11.5% of all open FHA loans were in a state of delinquency.
– Comparable figures for VA were 8.5% and 6.9%, respectively.
– Comparable figures for RHS were 6.9% and 6.5%, respectively.
• Benchmarking: Controlling for loan size, geography (state) and cohort:
– FHA is underperforming. The average delinquency rate of loans with FHA’s mix of loan size, state and cohort is 7.9% and 6.2% on the two measures above, respectively.
– FHA’s worst performing book year is 2007, with an “excess delinquency rate” of 12.4% above average.
– Newly eligible FHA loans (above old limits) are performing worse than the market by about 4 percentage points. However, this may be a market effect: FHA loans in the same markets but below old limits have about the same excess delinquency.
– VA and RHS are performing as predicted (+/- 0.5 percentage points).
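The benchmarking bullets compare a program's delinquency rate with the rate of market loans that share its mix of loan size, state and cohort. One plausible reading is direct standardization: weight the market delinquency rate in each cell by the program's loan count in that cell, then subtract the result from the program's own rate. The sketch below illustrates that reading only. The slides do not spell out the authors' method, and the cell keys, counts and rates are invented toy values.

```python
from typing import Dict, Hashable


def benchmark_rate(program_counts: Dict[Hashable, int],
                   market_rates: Dict[Hashable, float]) -> float:
    """Market delinquency rate reweighted to the program's mix of cells."""
    total_loans = sum(program_counts.values())
    if total_loans <= 0:
        raise ValueError("program must have at least one loan")
    weighted_sum = sum(
        count * market_rates[cell] for cell, count in program_counts.items()
    )
    return weighted_sum / total_loans


# Cells keyed by (loan size band, state, cohort year); toy values only.
toy_program_counts = {
    ("small", "CA", 2007): 300,
    ("small", "TX", 2007): 100,
}
toy_market_rates = {
    ("small", "CA", 2007): 0.10,
    ("small", "TX", 2007): 0.06,
}
toy_program_rate = 0.12  # the program's own observed delinquency rate

toy_benchmark = benchmark_rate(toy_program_counts, toy_market_rates)
toy_excess = toy_program_rate - toy_benchmark
print(f"benchmark={toy_benchmark:.3f} excess={toy_excess:.3f}")
```

A positive excess means the program is performing worse than comparable market loans, and a value near zero corresponds to "performing as predicted" in the bullets above.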