Kenny Weiss
CMSC838S
March 1, 2005
Application Assignment

Exploration of IRS Tax Refunds from 1987-2003 by Quarter

Introduction:
With April 15th just around the corner, I thought it would be interesting to look at the Internal Revenue Service (IRS) website (http://www.irs.gov/taxstats/) for useful information about taxes levied by the government. Since it is always more fun to receive a check than to write one, I have focused on tax refunds. These are funds the government returns when it determines that an individual (or company) has paid too much tax.

Data:
The IRS offers many different reports and data sets on taxes and the demographics of taxpayers. The dataset I chose covers quarterly tax refunds between 1987 and 2003 (http://www.irs.gov/pub/irs-soi/04tr19fy.xls). The data splits the fiscal year into four quarters (inclusive):
October – December
January – March
April – June
July – September

The refunds are broken down into six dimensions by type of return:
Individual – the tax that individuals file (i.e. "income tax"). This is typically paid through Form 1040, but it also includes other sources such as capital gains.
Corporation – taxes paid by corporations on profits.
Excise – tax paid on purchases of specific goods like fuel, alcohol and tobacco.
Employment – taxes employers pay on their employees' wages.
Estate and Gift – taxes levied on gifts and estates. In 2002, these taxes applied only to gifts greater than $11,000 and estates greater than $1,000,000.
Total – the sum of all the preceding dimensions.

Hypothesis:
IRS paperwork is confusing, so many people pay too much in taxes or have too much deducted from their salaries. They are then owed the difference. I expect most of these refunds to be issued during the period around "tax day", April 15.
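The quarter boundaries listed under Data follow the federal fiscal year, which begins on October 1. Fiscal quarter 1 is therefore October – December. The Python sketch below shows that mapping and can be used to check which quarter a calendar date such as April 15 falls in. The names `fiscal_quarter` and `fiscal_year` are hypothetical helpers written for illustration. They are not part of the IRS spreadsheet or of TimeSearcher.

```python
def fiscal_quarter(month: int) -> int:
    """Return the federal fiscal quarter (1-4) for a calendar month (1-12).

    Assumes the fiscal year starts on October 1, so
    Q1 = Oct-Dec, Q2 = Jan-Mar, Q3 = Apr-Jun, Q4 = Jul-Sep.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    # Shift so that October is month 0 of the fiscal year, then group months in threes.
    return (month - 10) % 12 // 3 + 1


def fiscal_year(calendar_year: int, month: int) -> int:
    """Return the fiscal year that a calendar month belongs to.

    October-December count toward the next fiscal year
    (e.g. October 2002 is part of fiscal year 2003).
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return calendar_year + 1 if month >= 10 else calendar_year


if __name__ == "__main__":
    # Tax day (April 15, 2003) falls in the third quarter of fiscal year 2003.
    print(fiscal_year(2003, 4), fiscal_quarter(4))  # prints: 2003 3
```

For example, `fiscal_quarter(4)` returns 3, which places tax day at the start of the April – June quarter. By the same mapping, July – September 2001 is the fourth quarter of fiscal 2001.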
Taxes are due by April 15th, near the beginning of the third fiscal quarter (April – June), and the IRS takes a few weeks to process tax forms. I therefore expect most individual tax refunds to be issued in the third quarter. Some people always file early and others file late, so I assume that most of the remaining refunds occur during the 'January – March' and 'July – September' quarters. Estate and gift taxes seem unrelated to individual taxes, but they are probably filed at the same time. I therefore assume that their refunds are distributed alongside the individual tax refunds.

I expect Excise and Corporate taxes to behave differently. Excise taxes are usually included in the price of an item and paid throughout the year. External factors such as war might also increase the amount of taxes paid, and hence the amount refunded. Corporations also have different tax deadlines and regulations to meet, so I expect their refund patterns to differ from those of individual taxpayers.

Analysis:
Since the data depends on time and on the temporal relationships between events, I used TimeSearcher.[1] To get a better understanding of the data, I manually converted it from the Excel sheet into a form that TimeSearcher could load.[2] Figure 1 shows an initial view of the data.

Figure 1: View of data in TimeSearcher

After interacting with the data, I was able to confirm my hypothesis that most Individual refunds occur around April 15th, as Figure 2 below shows. However, the data indicates that refunds are concentrated in both the 2nd and 3rd quarters. This suggests a correlation between filing early and receiving a refund.

[1] I used version 1.3.7 of TimeSearcher. Available at http://www.cs.umd.edu/hcil/timesearcher/
[2] My converted dataset can be found here

Figure 2: Individual tax refunds (highlighted) given during different fiscal quarters from 1987-2003.
Notice that the 2nd and 3rd quarters have significantly higher refunds than the 1st and 4th quarters.

I then wanted to see how the other categories fit this profile. I used the "Display all variables" option of the View menu, together with the Normalized and Deviations views, to get a holistic feel for the entire dataset. The Corporate and 'Estate and Gift' refunds closely matched the pattern of Individual refunds and were quite steady over time. By contrast, the Employment and Excise refunds seemed somewhat erratic.

Figure 3: Each variable is shown separately, with all four quarters displayed. The data is shown in 'Deviations' (left) and 'Normalized' (right) mode to highlight differences. Individual, Corporate and Estate refunds appear highly correlated and regular, whereas Employment and Excise refunds vary more sporadically.

During this exploration, I noticed two outliers/anomalies in the data. First, Excise refunds dip around 1993 and then spike in 1995 (see Figure 4). I could not find evidence to support this, but I suspect that it can be attributed to gas prices in the years following the Gulf War (1990-1991). Gas prices were high, and in 1993 taxes were increased to dissuade people from purchasing gas. By 1995, once the situation had stabilized, the government may have offered incentives (refunds) or removed taxes to stimulate the economy and encourage gasoline purchases. A separate tax on crude oil was also removed in 1995. Taxpayers might not have accounted for this when paying their taxes, and would then have been owed a refund.[3]

[3] http://www.unclefed.com/Tax-Bulls/1996/ANN96-9.PDF

Figure 4: Dip in Excise refunds in 1993 and spike in 1995

Lastly, I noticed a spike in Individual refunds in the fourth quarter (July – September) of fiscal 2001 (see Figure 5).
After investigating this outlier, I discovered the Economic Growth and Tax Relief Reconciliation Act of 2001.[4] Its refunds began going out in July 2001 and directly led to the increase in refunds during this period.

Figure 5: Spike in 4th-quarter 2001 Individual refunds, marking the Economic Growth and Tax Relief Reconciliation Act of 2001

[4] http://www.infoplease.com/spot/01taxrefund.html

Critique of tool:
I used HCIL's TimeSearcher tool, created by Harry Hochheiser and Ben Shneiderman. Most of its features were very intuitive, and it offered most of what I needed to accomplish my tasks.

My major critique of the tool concerns its data input functionality. TimeSearcher uses a custom file type, .tqd, and converting my Excel file to the .tqd format was nontrivial. In fact, it took me several hours before I could use my data in a way that made sense. Later, when I tried to think of my data differently, I was unable to convert it into that new format.[5] When I tried to load a malformed data file, TimeSearcher failed silently. It gave no clue about where the problem was located or what caused it. An error message describing the failure would be very beneficial to users.

Figure 6: In 'Display all variables' mode, the user is unable to select a specific query variable from the lower left panel.

The application also cannot select a query variable from the lower left panel while in "Display all variables" mode. This is frustrating because this mode lets the user spot differences in a specific variable by comparing it with the others. Yet the user cannot easily tell which variable is the one that differs.

Conclusion:
TimeSearcher was an invaluable tool for deciphering this cryptic IRS dataset. It let me spot outlying data points and relate them to the historical events that influenced the refund patterns.
Beyond these benefits, I also identified some areas for improvement, most notably data input.

[5] Rather than treating the quarters as my dynamic data, I wanted to use the various refund types as my query variables and have the quarters be the time points.
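Footnote 5 describes a different layout: one time series per refund type, with consecutive fiscal quarters as the time points. The Python sketch below shows that reshaping step, under several assumptions. It presumes the spreadsheet has already been read into (fiscal year, quarter, refund type, amount) tuples. The name `to_quarterly_series` is hypothetical. The sketch does not produce TimeSearcher's .tqd file format, whose details are not described here. The numbers in the usage example are placeholders, not IRS figures.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[int, int, str, float]  # (fiscal_year, quarter 1-4, refund_type, amount)


def to_quarterly_series(
    records: Iterable[Record],
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Reshape (fiscal_year, quarter, refund_type, amount) records into one
    series per refund type, ordered by fiscal year and then quarter.

    Returns the time-point labels and a dict mapping each refund type to its
    values; a (year, quarter) cell missing for a type is filled with None.
    """
    records = list(records)  # materialize, since the records are scanned several times
    periods = sorted({(year, quarter) for year, quarter, _, _ in records})
    cells = {(year, quarter, kind): amount for year, quarter, kind, amount in records}
    kinds = sorted({kind for _, _, kind, _ in records})

    labels = [f"FY{year} Q{quarter}" for year, quarter in periods]
    series = {
        kind: [cells.get((year, quarter, kind)) for year, quarter in periods]
        for kind in kinds
    }
    return labels, series


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not IRS figures.
    sample = [
        (1987, 1, "Individual", 10.0),
        (1987, 2, "Individual", 30.0),
        (1987, 1, "Excise", 1.0),
        (1987, 2, "Excise", 2.0),
    ]
    labels, series = to_quarterly_series(sample)
    print(labels)  # ['FY1987 Q1', 'FY1987 Q2']
    for kind, values in series.items():
        print(kind, values)
```

Sorting the periods by (fiscal year, quarter) places the 68 quarters from 1987 to 2003 on a single time axis. A seasonal pattern such as the April – June peak in Individual refunds would then repeat every four time points.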