Gridlock: The Impact of Income on Commutes to Work

December 2022

In this paper, I examine the relationship between incomes and commute times to work in the United States. Using data from the 2018 American Community Survey, I estimate a coefficient of 2.26 on log wages: a 1 percent increase in wages is associated with an increase of about 0.02 minutes in the expected commute, and a doubling of wages with about 1.6 minutes. These results imply that wealthier individuals live further away from their place of work, likely in nonmetropolitan areas.

In 2021, Americans spent 91 billion hours on the road (American Driving Survey, 2022). Commute times to work, which have risen steadily since 2015, were also at an all-time high. With traffic more prevalent than ever, this paper examines whether income affects the length of an individual’s commute to work.

Some studies have already examined the relationship between commuting and wages. However, few have used data from the United States, and of those, many failed to account for geographical effects on commuting behavior. Urbanization and traffic conditions are much more likely to affect commuters in metropolitan cities than in rural areas. Past studies have either neglected these differences or confined their samples to a few large cities. My findings contribute results on commute times that account for both location and income.

I used cross-sectional data from the 2018 American Community Survey (ACS) to examine a large sample of individuals in the labor force between the ages of 18 and 65 who reported their wages and commute times. The ACS assigns respondents to geographic areas identified by Public Use Microdata Area (PUMA) codes. I used a PUMA-code fixed effect in a multiple regression, controlling for race, gender, marital status, age, class of worker, and education. The fixed effect is needed because the US is geographically diverse, and commuting terrains and behaviors vary from one area to another across the country.
Further, controlling for demographic variables that are correlated with individuals’ wages helps isolate the effect of income on commute length. Because the income distribution was heavily skewed, I applied a logarithmic transformation to income.

The estimates of my regression give a coefficient of 2.26 on log income: a one-unit increase in log income is associated with an expected 2.26-minute increase in an individual’s commute to work, and a 1 percent increase in income with about 0.02 minutes. Given that most of the US population lives in urbanized areas (US Census, 2010), these results imply that wealthier individuals choose to live further from their jobs, in more suburban and rural locations. Conversely, lower-income households likely live within large cities. In a country with one of the highest median incomes and numbers of vehicles per capita, my findings provide insights into housing demographics that could help improve public transit systems and traffic reduction policies.

The remainder of the paper is structured as follows. Section I provides context for the paper and the economic theory of change. Section II details the data and provides the economic specification. Section III describes the empirical analysis. Section IV concludes the paper.

I. Context, Theory of Change

I examine this relationship in response to the growing number of individuals returning to commuting after the pandemic and the historically high traffic density in large cities across the country. In a given area, traffic conditions are likely similar for each commuter. Thus, the main source of differences in commute times is how far individuals live from their jobs.

Many factors can dictate where one lives, but one of the main housing preferences for those in the labor force is proximity to work. Most individuals prefer to live near their job or search for employment close to where they live. Thus, when choosing between two jobs with different commute lengths, most would need to be compensated with a higher income to accept the longer commute.
One’s willingness to travel further to work is greater when wages are higher. Therefore, higher wages can reasonably be associated with longer commutes.

Socioeconomic status may also play a role in housing locations. Lower-income individuals have fewer affordable living options, which often leads them to the smaller housing commonly found in urban areas, such as condos or apartments. Wealthier individuals likely prefer more rural areas with larger and more secluded homes. In a country where most jobs are in larger cities, I hypothesize that higher incomes are associated with longer commutes to work, based on both the income effect on commuting and the sorting of housing by socioeconomic status.

The related literature on income and commutes follows two strategies. French et al. (2020) and Dargay and Van Ommeren (2005) test this relationship while controlling for demographic characteristics that affect income. French et al. (2020) use a multiple regression that includes race, age, marital status, education, and household occupancy as explanatory variables. However, their study estimates the effect of commuting time on earnings, the reverse of the relationship examined in this paper. Their data is also limited in that all the respondents own cars and are similar in age, so the sample is not representative of the US population. Dargay and Van Ommeren (2005) use 11 years of panel data in a fixed effects regression that controls for gender, race, and education, though their data comes from the British Household Panel Survey. They also do not account for the potential effects of geography on commute times, which their fixed effect does not control for. This paper improves on the works of French et al. (2020) and Dargay and Van Ommeren (2005) because I account for geographical commuting conditions and use a dataset that is representative of the US population.
It is worth noting that both studies similarly found a positive relationship between incomes and commutes, though their coefficient estimates are smaller than mine.

Bogomolov et al. (2021) and Johnston (2019) examine the relationship between income and commutes based on the effect of living in metropolitan and micropolitan areas. Bogomolov et al. (2021) used a gravity model to measure this relationship in 12 highly populated US cities. Their findings provide no information on commuting outside urban areas and do not identify a pattern that applies to all large cities. Moreover, they group incomes into three large ranges, reducing the precision of their results. Johnston (2019) uses the 2012-2016 ACS 5-year estimates to regress mean commute times on income and population density in the respondent’s city. I conduct a similar study with more recent data while also considering the demographic effects on income that could explain commute lengths. This paper combines the income-related and geography-related strategies by using both sets of controls to explain the relationship between wages and commute times.

II. Data and Economic Specification

A. Data

The data used in this paper comes from the American Community Survey, an annual survey conducted by the US Census Bureau covering a wide range of social, economic, occupational, and demographic variables. Specifically, I used the 2018 cross-sectional data from the ACS, which collected information from 3.3 million households. The ACS selects a random sample of addresses representative of the geographical population. The dataset contains all the necessary variables relating to income and commute times.

In the data, the average commute time to work is 27.29 minutes. The mean adjusted income of the respondents is $52,481 (see table I). The income distribution has a wide range and is heavily right-skewed (see figure II).
Thus, after conducting a skew test, I determined it would be appropriate to apply a logarithmic transformation to income when estimating its effect on commute times.

My estimates may be affected by the fact that survey respondents likely round their income and commute times to convenient multiples. Rounding is likely why the commuting distribution appears multimodal (see figure III). Thus, the precision of my estimates may suffer because most respondents do not know or do not report exact values for income and commutes.

Mean wages differ across the subgroups of many demographic variables in the data (see table IV for values). The average income for males is $20,735 more than for females, a reflection of the gender pay gap in the United States. The mean wages for White and Asian individuals are higher than the averages for Black and Hispanic individuals, and the average income for married individuals is $30,096 more than for unmarried individuals. One’s class of work also appears to influence mean wages. Individuals who work for the federal government or are self-employed in an incorporated business have the highest average incomes. There is a positive relationship between education and mean wage, as more educated individuals are more likely to obtain higher-paying jobs. There is also a roughly positive relationship between age and average income: higher-status positions are commonly held by individuals with more work experience, who are therefore older than entry-level workers. From these differences in mean wages, I determined that these variables are all factors that affect one’s income. I examine the extent of their effects on wages and commute times in my analysis.

The data also provides variables on commuting behaviors (see table V). The average vehicle occupancy on the trip to work is 1.158 people. Individuals who do not carpool to work have a mean commute time that is 5.5 minutes shorter than those who do carpool.
Of the respondents who reported their mode of transportation, 4.43 percent take public transport to work (bus, train, ferry, or subway), which on average takes 26.4 minutes longer. The data also reports one’s typical departure time for work, recorded in hour intervals (24-hour time). The average person left for work around 9:00 am, with most (83 percent) departing before 10:00 am.

The ACS also classifies where its respondents live using Public Use Microdata Area (PUMA) codes. These areas partition the US into statistical, non-overlapping regions of roughly 100,000 people (US Census, 2010). Like zip codes, PUMA codes classify respondents by geography, which allows me to control for location when relating incomes to commute times.

The ACS is a mandatory survey that all selected households must answer. The US Census Bureau mails its surveys to selected addresses and conducts phone or in-person follow-ups if surveys are not adequately completed or returned promptly. However, not all questions of the ACS are mandatory, so many respondents likely declined to answer some of them, and the variables for commute times and income both have missing values. About 203,000 respondents did not report their commute times to work but did report their wages. Of these respondents, about 146,000 reported earning income even though they listed themselves as unemployed or not in the labor force. The other missing observations may come from individuals who do not commute because they work from home. About 53,000 individuals did not report their wages but did report their commute times. I acknowledge that these missing values may slightly affect my results. However, I did not identify a systematic or nonrandom explanation for the missing data that would cause sampling bias.
I proceeded to discard these missing values by restricting the sample to respondents between ages 18 and 65 who are in the labor force and who answered both the commute time and income questions (the ages of all respondents ranged from 0 to 96). Roughly 1.2 million individuals in the dataset fit this restriction.

Overall, the American Community Survey is well suited to studying income and commutes because of its large sample size, which allows the relationship to be estimated with small standard errors. Given that the sample is designed to represent the US population, the data likely captures the differences in traffic by location. Panel data would be preferable for controlling for unobserved, time-invariant variables, but the cross-sectional data is still adequate for my research.

B. Economic Specification

I assess the relationship between income and commute times to work using a multiple regression with a fixed effect, and my results are intended to represent the US population. I restricted the regression sample to individuals between 18 and 65 in the labor force who reported both their wages and commute times to work. My economic specification is:

$$C_i = \beta_0 + \beta_1 \ln(I_i) + S_d + \beta_2 F_i + \beta_3 E_{si} + \beta_4 R_{mi} + \beta_5 W_{ni} + \beta_6 M_i + \beta_7 A_i + \beta_8 A_i^2 + \varepsilon_i$$

where $C_i$ is the commute time to work and $\ln(I_i)$ is the logarithmic transformation of income. $S_d$ is the PUMA effect, the PUMA code-specific intercept generated by the fixed effect. PUMA codes are assigned within each state, so the same code can appear in different states. Before adding the fixed effect, I created a new identifier for each PUMA-designated area that combines the state and PUMA code, so that every area has a distinct value. $F_i$ is the dummy variable for gender (female = 1), $M_i$ is the dummy variable for marital status (married = 1), and $A_i$ represents the respondent’s age.
$E_{si}$ represents the categorical variable for the highest level of education achieved, $R_{mi}$ is the respondent’s race (White, Black, Hispanic, Asian, other), and $W_{ni}$ is the categorical variable for the class of worker (see table IV). $\varepsilon_i$ is the error term.

I use $F_i$, $E_{si}$, $R_{mi}$, $W_{ni}$, and $M_i$ as explanatory variables after finding that mean wages differ across their subgroups (table IV). These variables are a slightly modified version of the regressors in French et al. (2020), with a fixed effect added. The PUMA fixed effect controls for the different commuting conditions likely associated with population density, such as traffic, fuel prices, or other road conditions. I did not include explanatory variables for commuting behaviors because they are likely influenced by geography. Those who carpool or take public transit to work likely live in metropolitan areas or densely populated cities, which the fixed effect controls for.

I determined that age more closely fits a nonlinear relationship with wages according to the data (table VI). Previous empirical work also documents the nonlinear relationship between age and income (Rosenzweig, 1976). Therefore, I included both age and age squared in my specification. From this specification, I expect to find a positive $\beta_1$ on $\ln(I_i)$ given the included controls.

III. Empirical Analysis

I controlled for confounding factors in the relationship between income and commutes in two ways. Firstly, I examined the effects of geography and determined that mean commute times vary greatly across PUMA codes. I then regressed commute times on log incomes using these PUMA codes as a fixed effect (see Table VII). This changed the coefficient on $\ln(I_i)$ from 3.078 to 2.732 and increased the adjusted $R^2$ from .02 to .08. With the fixed effect, a one-unit increase in log wages is associated with an expected 2.73-minute increase in commute time to work, which is equivalent to 0.12 standard deviations in commute times. A 1 percent increase in wages therefore corresponds to about 0.027 minutes.
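
As a rough check on the magnitudes in this paragraph, the sketch below converts a coefficient on log income into minutes and into standard deviations of commute time. It assumes the standard reading of a level-log specification, in which a p percent change in income shifts the outcome by the coefficient times ln(1 + p/100). The function names are illustrative and are not taken from the paper’s own code; the inputs 2.732 and 23.29 come from Tables VII and I.

```python
import math

def minutes_per_percent(beta: float, pct: float = 1.0) -> float:
    """Change in commute minutes for a pct percent change in income (level-log model)."""
    return beta * math.log(1.0 + pct / 100.0)

def sd_units_per_log_point(beta: float, sd_outcome: float) -> float:
    """Change in the outcome, in standard deviations, for a one-unit change in log income."""
    return beta / sd_outcome

beta_fe = 2.732      # coefficient on log(income) with the PUMA fixed effect (Table VII, column 2)
sd_commute = 23.29   # standard deviation of commute time in minutes (Table I)

print(round(minutes_per_percent(beta_fe), 3))                 # minutes per 1 percent of income
print(round(sd_units_per_log_point(beta_fe, sd_commute), 2))  # SDs per one log point
```

Under these assumptions the first value is about 0.027 minutes and the second about 0.12 standard deviations, so the 0.12 figure corresponds to a one-unit change in log wages rather than a 1 percent change.
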
Secondly, I identified demographic characteristics that could affect income by examining the difference in mean wages among subgroups of variables (see Table IV). I ran simple regressions of wages on these demographics to determine the extent of their effects. After finding significant coefficients for each, I combined these variables with the fixed effect to produce the multiple regression outlined in my framework (see Table VII). The coefficient on $\ln(I_i)$ changes as the variables are added, which suggests that all the chosen demographic variables are correlated with income. In the final regression the coefficient is 2.26: a one-unit increase in log wages is associated with a 2.26-minute increase in commutes to work, or about a 0.10 standard deviation increase in commute time, and a 1 percent increase in wages with about 0.02 minutes. The standard deviation of log wages is 1.3, so a one-standard-deviation increase in log wages is associated with roughly a 0.13 standard deviation increase in commute time. The coefficient on $\ln(I_i)$ is statistically significant at the one percent level, and the adjusted $R^2$ is 0.088.

The positive correlation between income and commutes matches my hypothesis and the theory outlined in Section I. Individuals prefer higher-paying jobs and are willing to broaden the radius of their job search to earn a higher income. I tested all the included coefficients and determined that all were significant at the 5 percent level except for the education categories “some college” and “advanced degree.” However, education is strongly and positively correlated with both wages and commute times, which justifies its inclusion in my regression analysis.

My study consisted of individuals in the labor force between 18 and 65 who answered both survey questions about their income and commutes. I believe that selection on observables is a reasonable assumption for estimating the relationship in the true population. Between the fixed effect and demographic variables, there is reason to believe that the Ordinary Least Squares estimator is unbiased.
The use of the absorbed fixed effect based on PUMA codes controls for many commuting behaviors and unobserved variables. Decisions such as the use of public transportation, carpooling, or leaving earlier in the morning are all influenced by where one lives. Large cities or more highly populated areas likely lead more individuals to carpool, leave earlier, or take public transit because of traffic, all of which is absorbed by my fixed effect. Therefore, omitted variables surrounding commuting behaviors or tendencies are unlikely to directly affect the relationship of interest. The sample selection was also conducted through an unbiased process, which limits the possibility of sampling bias. The ACS selected a random sample of addresses to complete a mandatory survey sent through the mail and followed up to ensure it was completed satisfactorily. Therefore, non-response and interviewer bias are likely small, given the credibility of the US Census Bureau and the limited use of in-person surveying.

The positive correlation between income and commute times to work conveys economic information about the housing demographic. The results indicate that higher-income individuals generally commute further to work. Given that most jobs are in metropolitan areas, this means that wealthier individuals who commute further to work often live outside of the city. This pattern is commonly observed in the US, as wealthier individuals seem to prefer quieter lifestyles with more open, rural homes on larger plots. In a society where neighborhoods are largely clustered by economic status, the estimated results are consistent with the view that commutes differ by social class, with wealthier individuals tending to commute to work from outside metropolitan areas.

The main underlying threat to the internal validity of my results is the possibility of reverse causality. Some studies claim that commute times to work have a causal effect on incomes.
It is also plausible that simultaneity exists in the true relationship. Still, it seems more plausible that individuals choosing between two jobs at different distances decide how far to commute based on which job pays more. Only an instrumental variable could eliminate this threat.

IV. Conclusion

Few studies specifically examine the impact of income on commute times to work in great depth. This paper provides more detailed information using comprehensive data from the American Community Survey. I found a coefficient of 2.26 on log wages: a one-unit increase in log wages is associated with an expected 2.26-minute increase in commute time to work, and a 1 percent increase in wages with about 0.02 minutes. This positive correlation reflects the housing demographic in the US. Wealthier individuals live further away from their place of work, and individuals appear more willing to commute further to work if it means earning a higher income. My findings echo the positive relationship found in past studies while estimating the magnitude of the income effect on housing and commuting using the necessary control variables.

These results offer relevant applications to individuals’ health and city commuting plans. Many individuals prefer to drive further for higher wages, which may harm their mental and physical health because of the extended periods they spend in cars each day (Ding et al., 2014). Ding et al. (2014) found that individuals who spend over 120 minutes driving each day experienced higher rates of obesity, sleep deprivation, and poor mental health. My results indicate that there may be implicit costs to travelling further to work for a higher income. Firms that offer higher wages, in turn, have a larger radius of potential labor recruits.

Poorer individuals tend to live closer to their place of work. With most jobs being in metropolitan areas, this suggests that lower-income workers also live in these areas. This information could help improve public transportation and route plans in cities where traffic is often the heaviest.
Understanding where workers are commuting from could provide relief for busy city streets.

This study does have some limitations that I was not able to address. Because many respondents tend to round their responses for income and commutes, the precision and magnitude of my estimates may be slightly off. However, I believe the positive trend in my results holds, given that commutes are likely only rounded to the nearest five minutes and incomes to the nearest thousand dollars. The ACS is the dataset that most closely covers the entire US population. The large sample size and relatively low nonresponse rate yield estimates close to the true relationship. This study does not account for unmeasurable variables, such as personality traits, which may explain why individuals choose where to live, or innate ability, which directly influences income. An instrumental variable would be useful to account for these and other possible confounding variables. In the end, despite these caveats and limitations in the data, this paper supports claims about the general trend of housing demographics and commuting behaviors based on income.

References

Bogomolov, Yuri, Mingyi He, Devashish Khulbe, and Stanislav Sobolevsky. (2021). Impact of income on urban commute across major cities in the US. Procedia Computer Science 193: 325-332.

Ding, Ding, et al. (2014). Driving: a road to unhealthy lifestyles and poor health outcomes. PLoS ONE 9 (6).

French, Michael T., Ioana Popovici, and Andrew R. Timming. (2020). Analysing the effect of commuting time on earnings among young adults. Applied Economics 52 (48): 5282-5297.

Johnston, Ahren. (2019). A note on commute times and average income levels. The Open Transportation Journal 13: 151-153.

Dargay, Joyce M., and Jos Van Ommeren. (2005). The effect of income on commuting time - an analysis based on panel data.
45th Conference of the European Regional Science Association: “Land Use and Water Management in a Sustainable Network Society.”

Rosenzweig, Mark R. (1976). Nonlinear Earnings Functions, Age, and Experience: A Nondogmatic Reply and Some Additional Evidence. The Journal of Human Resources 11 (1): 23-27.

Tefft, Brian C. (2018). American Driving Survey, 2015-2016 (Research Brief). AAA Foundation for Traffic Safety.

Tefft, Brian C. (2022). American Driving Survey, 2020-2021 (Research Brief). AAA Foundation for Traffic Safety.

US Census. (2021). Urban Area Facts.

TABLE I - Baseline Summary Statistics for Incomes and Commutes

VARIABLES                        (1) N        (2) Mean    (3) Standard Deviation    (4) 25th Percentile    (5) 75th Percentile
Commute Time to Work, minutes    1.395e+06    27.29       23.29                     12                     35
Adjusted Income, USD             1.574e+06    52,481      66,325                    16,210                 65,851

Notes: This table presents the summary statistics for the two main variables of interest: commute times to work and adjusted incomes. All statistics are calculated from the 2018 American Community Survey. Column 1 presents the total number of observations for each variable from the dataset. The total number of observations for commute time to work is 1,395,191. The total number of observations for adjusted income is 1,574,313. Commute times are measured in minutes, ranging from 1 minute to 188 minutes. Wages are measured in US dollars, ranging from $4 to $727,404.

FIGURE II

Notes: This graph presents the kernel density distribution for incomes from the 2018 American Community Survey. The horizontal axis represents incomes of respondents measured in USD (2018 adjusted). The vertical axis represents the probability density at each income level, smoothed using a kernel density estimate. The right-skewed nature of the distribution motivates my use of the logarithmic transformation to explain how income affects commute times to work.
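
The effect of the logarithmic transformation on a right-skewed variable can be illustrated with a minimal sketch. It uses synthetic log-normal draws as a stand-in for incomes, not the ACS data, and the parameter values and the helper `sample_skewness` are assumptions chosen only for illustration.

```python
import numpy as np

def sample_skewness(x: np.ndarray) -> float:
    """Third standardized moment (population form) of a 1-D sample."""
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)

rng = np.random.default_rng(0)
# Synthetic stand-in for incomes: log-normal draws, NOT the ACS sample.
income = rng.lognormal(mean=10.4, sigma=1.0, size=100_000)

skew_raw = sample_skewness(income)
skew_log = sample_skewness(np.log(income))
print(f"skewness of income:      {skew_raw:.2f}")
print(f"skewness of log(income): {skew_log:.2f}")
```

In this synthetic case the raw variable is strongly right-skewed while its logarithm is close to symmetric, which is the property the transformation relies on. Real income data need not be exactly log-normal, so the improvement in the ACS sample may be less clean.
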
FIGURE III

Notes: This graph presents the kernel density distribution for commute times from the 2018 American Community Survey. The horizontal axis represents commute times to work measured in minutes. The vertical axis represents the probability density for commutes, smoothed using a kernel density estimate. The distribution is clearly multimodal, likely because responses are rounded to the nearest five or ten minutes.

TABLE IV - Summary Demographic Statistics and Relationship with Wages

VARIABLES                         (1) Frequency   (2) Percentage   (3) Mean Wage (USD)   (4) SD Wage (USD)
Female                            1,639,921       51.02            41,778                49,733
Male                              1,574,618       49.98            62,513                77,433
Race
  White                           2,160,997       67.23            56,555                70,695
  Black                           327,955         10.20            37,874                45,060
  Hispanic                        466,261         14.50            37,776                43,688
  AAPI                            206,707         6.43             64,648                78,302
  Other                           52,619          1.64
Married                           1,375,888       42.80            66,412                77,102
Not married                       1,838,651       57.20            36,315                46,029
Class of Worker (if in LF)
  Private, for profit             992,986         66.27            51,171                66,915
  Federal Government              36,756          2.45             66,505                49,955
  State Government                72,003          4.81             50,175                45,546
  Local Government                107,396         7.17             47,379                39,820
  Self-Employed, incorporated     60,587          4.04             81,283                108,995
  Self-Employed, unincorporated   94,793          6.33             40,752                70,722
  Without pay                     3,314           0.22
  Private, not for profit         130,532         8.71             51,710                63,695
Education Level
  Less Than High School           834,548         26.75            23,861                33,297
  High School                     705,274         22.61            34,608                36,750
  Some College                    785,220         25.17            40,104                42,860
  College                         486,531         15.60            70,131                75,557
  Advanced Degree                 307,819         9.87             101,245               105,803

Notes: This table presents the summary statistics for demographic characteristics. The data comes from the 2018 American Community Survey. Column 1 presents the number of observations in each demographic’s subgroup. Column 2 presents each subgroup’s relative frequency, separated by each demographic. Columns 3 and 4 represent the two-way summary statistics between each subgroup and their relative income.
The two-way summary statistics are rounded to the nearest whole number. Key patterns and differences in mean wages within each category are presented in Section II. Not all respondents included in the survey answered these demographic questions, so missing observations exist.

TABLE V - Descriptive Statistics for Commuting Behaviors

VARIABLES                              (1) N       (2) Frequency   (3) Relative Frequency   (4) Mean   (5) Standard Deviation   (6) 25th Percentile   (7) 75th Percentile
Vehicle Occupancy to Work              1.263e+06                                            1.158      0.594                    1                     1
Takes Public Transportation                        65,474          4.43
No Pub. Trans.                                     1,413,447       95.57
Hour of Departure for Work, 24-hours   1.395e+06                                            9.027      3.544                    7                     9

Notes: This table presents the summary statistics for variables associated with commuting behaviors. Vehicle Occupancy and Hour of Departure are continuous variables, and Public Transportation is a binary variable. Columns 1, 4, 5, 6, and 7 present the summary statistics for the continuous variables. Columns 2 and 3 present the frequency and relative frequency distribution for the public transportation variable. A total of 1,478,921 individuals answered whether they used public transportation to work. The ACS defines public transit as any of the following modes of transportation: bus, trolley bus, streetcar/trolley car, subway, railroad, or ferryboat.

TABLE VI - Comparing Linear and Nonlinear Regressions Between Wages and Age

VARIABLES       (1) Linear Regression    (2) Quadratic Regression
                Wages (USD)              Wages (USD)
Age             836.2***                 5,843***
                (3.387)                  (18.66)
Age2                                     -57.17***
                                         (0.210)
Constant        16,739***                -79,394***
                (153.8)                  (383.3)
Observations    1,574,313                1,574,313
R-squared       0.037                    0.081

*** significant at the 1 percent level, ** significant at the 5 percent level, * significant at the 10 percent level

Notes: This table presents two types of regression to determine if age has a linear or non-linear relationship with wages. Standard errors for the coefficients are given in parentheses. All coefficients are significant at the 1 percent level.
Based on previous studies and these results, wages can be more closely explained by age using a quadratic regression. The R-squared value for the quadratic regression is .081, greater than that for the linear regression (.037).

TABLE VII - Income and Commutes Including PUMA Fixed Effect and Demographics

VARIABLES                  (1) Single Regression   (2) With Fixed Effects   (3) Final Regression   (4) Standard Errors for final regression
                           Commute Time            Commute Time             Commute Time
log(income)                3.078***                2.732***                 2.259***               (.0238)
                           (0.0195)                (0.0195)
Female                                                                      -2.529***              (0.0413)
High School                                                                 -0.409***              (0.0939)
Some College                                                                -0.121                 (0.0922)
College                                                                     0.684***               (0.0985)
Advanced Degree                                                             0.0427                 (0.108)
Black                                                                       1.851***               (0.0807)
Hispanic                                                                    0.328***               (0.0707)
AAPI                                                                        0.662***               (0.0872)
Other                                                                       1.193***               (0.196)
Federal Government                                                          1.473***               (0.134)
State Government                                                            -1.799***              (0.0900)
Local Government                                                            -4.647***              (0.0720)
Self-Employed, Inc.                                                         -4.706***              (0.119)
Self-Employed, not Inc.                                                     -0.649**               (0.302)
Without Pay                                                                 -1.740***              (0.526)
Non-profit                                                                  -1.771***              (0.0684)
Married                                                                     0.175***               (0.0461)
Age                                                                         0.320***               (0.0117)
Age2                                                                        -0.00351***            (0.000137)
Constant                   4.612***                -0.978***                -1.096***              (0.273)
                           (0.205)                 (0.203)
Observations               1,233,67                1,233,670                1,233,670
R-squared                  0.020                   0.080                    0.088
PUMA Fixed Effect?         NO                      YES                      YES

*** significant at the 1 percent level, ** significant at the 5 percent level, * significant at the 10 percent level

Notes: This table presents the changes in the coefficient for log(income) when fixed effects and explanatory variables are added given the restricted dataset outlined in Section II. Standard errors are given in parentheses below columns 1 and 2, and standard errors for column 3 are given in parentheses in column 4. Column 1 presents the single regression without controls. Column 2 presents a regression using the PUMA fixed effect. Column 3 presents the final regression used for analysis with PUMA fixed effects and demographic explanatory variables.
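
The step from column 1 to column 2 of Table VII can be illustrated with a minimal sketch of a one-regressor fixed-effects (within) estimator on synthetic data. Everything here is an assumption made for illustration: the data are simulated rather than drawn from the ACS, the state and PUMA codes are made up, the rule `state * 100_000 + puma` is only one possible way to build a unique area identifier, and the function names are not the paper’s own code.

```python
import numpy as np

def pooled_slope(y, x):
    """OLS slope of y on x with a single common intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def within_slope(y, x, groups):
    """OLS slope of y on x with one intercept per group, via within-group demeaning."""
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    x_dm = x - (np.bincount(idx, weights=x) / counts)[idx]
    return float(x_dm @ y_dm / (x_dm @ x_dm))

rng = np.random.default_rng(0)
n_areas, n_per_area = 200, 50

# Hypothetical codes: the same PUMA codes repeat across states,
# so state and PUMA are combined into one distinct area identifier.
state = np.arange(n_areas) // 40 + 1      # 5 made-up states
puma = np.arange(n_areas) % 40 + 100      # the same 40 codes reused in every state
area_id = np.repeat(state * 100_000 + puma, n_per_area)

# Area-level commuting conditions that also shift local incomes (the confounder).
area_effect = np.repeat(rng.normal(0.0, 5.0, n_areas), n_per_area)
log_income = 10.5 + 0.1 * area_effect + rng.normal(0.0, 1.0, area_id.size)
true_beta = 2.0
commute = 20.0 + true_beta * log_income + area_effect + rng.normal(0.0, 5.0, area_id.size)

b_pooled = pooled_slope(commute, log_income)
b_within = within_slope(commute, log_income, area_id)
print(f"pooled slope: {b_pooled:.2f}, within-area slope: {b_within:.2f}")
```

In this simulation the pooled slope is pulled away from the true value because area conditions affect both income and commute time, while the within-area slope stays close to it. The coefficient falls once area effects are absorbed, the same direction as in Table VII, though the size of the gap here is an artifact of the made-up parameters.
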