Exploration of IRS Tax Refunds from 1987-2003 by Quarter (TimeSearcher)

Kenny Weiss
CMSC838S
March 1, 2005
Application Assignment
Exploration of IRS Tax Refunds from 1987-2003 by Quarter
Introduction:
With April 15th just around the corner, I thought it would be interesting to
look at the Internal Revenue Service (IRS) website (http://www.irs.gov/taxstats/)
to find some useful information about taxes levied by the government. Since it is
always more fun to receive a check than to write one, I have focused on tax
refunds, which are funds that the government returns when it determines that an
individual (or company) has paid too much tax.
Data:
The IRS offers many different reports and data sets dealing with taxes and
the demographics of taxpayers. The dataset that I chose deals with quarterly tax
refunds between 1987 and 2003 (http://www.irs.gov/pub/irs-soi/04tr19fy.xls). The
data splits the fiscal year into four quarters (inclusive):
• 1st quarter: October – December
• 2nd quarter: January – March
• 3rd quarter: April – June
• 4th quarter: July – September
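Because the fiscal quarters are offset from the calendar year, it is easy to mislabel a month. The following is a minimal sketch of the mapping, assuming the federal convention that a fiscal year is named for the calendar year in which it ends; the function names are my own and do not come from the IRS spreadsheet.

```python
def fiscal_quarter(month: int) -> int:
    """Return the fiscal quarter (1-4) for a calendar month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # October (10) opens quarter 1, so shift the calendar by three months.
    return ((month + 2) % 12) // 3 + 1


def fiscal_year(calendar_year: int, month: int) -> int:
    """Return the fiscal year, assuming it is named for the year in which it ends."""
    # October through December already belong to the next fiscal year.
    return calendar_year + 1 if month >= 10 else calendar_year
```

For example, `fiscal_quarter(4)` returns 3, which places April 15th at the start of the third quarter.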
There are six dimensions for the type of return given:
• Individual – the tax that individuals file (i.e. “income tax”). This is
typically paid through Form 1040, but includes other sources such as
capital gains.
• Corporation – taxes paid by corporations on profits.
• Excise – tax paid for purchases of specific goods like fuel, alcohol and
tobacco.
• Employment – taxes employers pay because of their employees.
• Estate and Gift – taxes levied on gifts and estates. In 2002, these taxes
applied only to gifts greater than $11,000 and estates greater than
$1,000,000.
• Total – the sum of all the preceding dimensions.
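Since Total is defined as the sum of the other five dimensions, a quick consistency check can catch transcription mistakes when the spreadsheet is copied by hand. This is only a sketch: the dictionary keys, the tolerance for rounding, and the example values are assumptions of mine, not figures from the IRS data.

```python
CATEGORIES = ("Individual", "Corporation", "Excise", "Employment", "Estate and Gift")


def total_matches(row, tolerance=0.5):
    """Return True if row['Total'] equals the sum of the five categories."""
    expected = sum(row[name] for name in CATEGORIES)
    # Allow a small difference because published figures are usually rounded.
    return abs(expected - row["Total"]) <= tolerance


# Hypothetical values for one quarter, not taken from the IRS spreadsheet.
example_row = {
    "Individual": 100.0,
    "Corporation": 20.0,
    "Excise": 1.5,
    "Employment": 3.0,
    "Estate and Gift": 0.5,
    "Total": 125.0,
}
```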
Hypothesis:
With all the confusing IRS paperwork to fill out, there are many people
who pay too much in taxes, or have too much deducted from their salaries. They
are then owed the difference. I expect most of these refunds to take place
during the period encompassing “tax day”, April 15.
Since taxes are due by April 15th, the beginning of the third quarter
(April-June), and it takes a few weeks for the IRS to process tax forms, I would
expect most individual tax refunds to be issued during the third quarter. Since
there are always those who file early and those who file late, I would assume
that most of the remaining refunds occur during the ‘January-March’ and
‘July-September’ quarters.
Estate and gift taxes seem to be unrelated to individual taxes, but are
probably filed at the same time as individual taxes, so I would assume that
refunds for these are distributed at the same time as the individual tax refunds.
I expect Excise and Corporate taxes to be different. Excise taxes are
usually included in the price of an item and paid throughout the year. Also, one
might expect external factors such as war to increase the amount of taxes paid
(and hence, the amount refunded). Additionally, since corporations have
different tax deadlines and regulations to meet, I would expect their refund
patterns to be different from those of the individual taxpayer.
Analysis:
Since the data depends on time and the temporal relationship between
events, I used TimeSearcher.1 I (manually) converted the data from the Excel
sheet into TimeSearcher's format to get a better understanding of the data.2 An
initial view of the data can be seen in Figure 1.
1 I used version 1.3.7 of TimeSearcher, available at http://www.cs.umd.edu/hcil/timesearcher/
2 My converted dataset can be found here
Figure 1: View of data in TimeSearcher
After interacting with the data, I was able to confirm my hypothesis that most
refunds for Individuals occur around April 15th, as seen in Figure 2 below.
However, the data indicates that refunds are concentrated in both the 2nd and
3rd quarters, which seems to imply that those who file their taxes early are
also likely to receive refunds.
Figure 2: Individual tax refunds (highlighted) given during different fiscal quarters from 1987-2003.
Notice that the 2nd and 3rd quarters have significantly higher refunds than the 1st and 4th quarters
I then wanted to see how the other categories fit the above profile. I used
the “Display all variables” option of the view menu in addition to the normalized
and deviations views to get a holistic feel for the entire dataset. I discovered that
while the patterns of Corporate and ‘Estate and Gift’ taxes closely matched those
of individual taxes, and were quite steady throughout the course of the year, the
pattern for Employment and Excise taxes seemed somewhat erratic.
Figure 3: Each variable is shown separately with all four quarters shown.
The data is shown in ‘Deviations’ (left) and ‘Normalized’ (right) mode to highlight differences.
Whereas Individual, Corporate and Estate seem to be highly correlated and regular, Employment and Excise seem to vary more sporadically.
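I do not know the exact formulas TimeSearcher uses for its ‘Deviations’ and ‘Normalized’ views. As a rough sketch of what such views could compute, the code below assumes that a deviation is the difference from the series mean and that normalization rescales a series to the range 0 to 1. Both definitions are assumptions for illustration only.

```python
def deviations(series):
    """Return each value minus the series mean (an assumed definition)."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


def normalized(series):
    """Rescale a series to the range 0..1 using min-max scaling (an assumed definition)."""
    low, high = min(series), max(series)
    if high == low:
        # A constant series has no spread, so map every point to 0.
        return [0.0 for _ in series]
    return [(value - low) / (high - low) for value in series]
```

Rescaling of this kind puts series of very different magnitudes, such as Individual and Excise refunds, on a common scale so that their shapes can be compared.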
During this previous experiment, I noticed two outliers/anomalies in the
data. First, I noticed a dip in the excise refund around 1993 followed by a spike
in 1995 (see Figure 4). Although I could not find evidence to support this claim, I
suspect that this can be attributed to gas prices in the years following the Gulf
War (1990-1991). My guess is that around 1993 gas prices were high and taxes
were increased to dissuade people from purchasing gas, and that by 1995, after
the situation had stabilized, the government offered incentives (refunds) or
removed taxes to stimulate the economy and encourage the purchase of gasoline.
Additionally, a separate tax was removed from crude oil in 1995 that taxpayers
might not have been aware of when paying taxes, and hence were owed a refund.3
3 http://www.unclefed.com/Tax-Bulls/1996/ANN96-9.PDF
Figure 4: Dip in excise refunds in 1993 and spike in 1995
Second, I noticed a spike in the individual tax refund in the fourth quarter
(July-September) of 2001 (see Figure 5). After investigating this outlier, I
discovered the Economic Growth and Tax Relief Reconciliation Act of 2001,4
under which refund checks began to go out in July 2001. This directly led to the
increase in refunds during this period.
Figure 5: Spike in 2001 4th quarter Individual refunds marks Economic Growth and Tax Relief Reconciliation Act of 2001
Critique of tool:
I used HCIL’s TimeSearcher tool, created by Harry Hochheiser and Ben
Shneiderman. I found most of the features to be very intuitive. Additionally, I
found TimeSearcher to have most of the features that I needed to accomplish my
tasks.
4 http://www.infoplease.com/spot/01taxrefund.html
My major criticism of the tool was its data input functionality.
TimeSearcher has a custom file type: tqd. I found conversion of my Excel file to
the tqd format to be nontrivial. In fact, it took me several hours before I could use
my data in a way that made sense. Later, when I tried to think of my data
differently, I was unable to convert the data into that new format.5
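The two ways of thinking about the data (see footnote 5) amount to grouping the same flat records differently. The sketch below shows one reading of each arrangement; it does not write the tqd format, and the record layout, function names and sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (fiscal_year, quarter, category, amount). Values are made up.
records = [
    (1987, 1, "Individual", 5.0), (1987, 2, "Individual", 30.0),
    (1987, 3, "Individual", 40.0), (1987, 4, "Individual", 8.0),
    (1988, 1, "Individual", 6.0), (1988, 2, "Individual", 32.0),
    (1988, 3, "Individual", 41.0), (1988, 4, "Individual", 9.0),
]


def series_per_quarter(records, category):
    """One series per quarter for a single category, with fiscal years as time points."""
    series = defaultdict(list)
    for year, quarter, cat, amount in sorted(records):
        if cat == category:
            series[quarter].append(amount)
    return dict(series)


def series_per_category(records):
    """One series per refund type, with consecutive quarters as time points."""
    series = defaultdict(list)
    for year, quarter, cat, amount in sorted(records):
        series[cat].append(amount)
    return dict(series)
```

Keeping the data as flat records and deriving each arrangement on demand avoids redoing the conversion by hand whenever the point of view changes.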
When I tried to load a malformed data file into TimeSearcher, the
application silently failed. I had no clue as to where the problem was located or
what the cause was. Some sort of response from the application about the
failure would be very beneficial to users.
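As an illustration of the kind of feedback that would have helped, here is a minimal sketch of a loader that reports the offending line instead of failing silently. The comma-separated ‘label,v1,v2,...’ layout is a hypothetical stand-in and is not the tqd format, and the function is not part of TimeSearcher.

```python
def parse_series_lines(lines, expected_points):
    """Parse 'label,v1,v2,...' lines; raise ValueError naming the bad line."""
    series = {}
    for number, line in enumerate(lines, start=1):
        fields = [field.strip() for field in line.split(",")]
        label, raw_values = fields[0], fields[1:]
        if not label:
            raise ValueError(f"line {number}: missing series label")
        if len(raw_values) != expected_points:
            raise ValueError(
                f"line {number}: expected {expected_points} values, "
                f"found {len(raw_values)}"
            )
        try:
            series[label] = [float(value) for value in raw_values]
        except ValueError as err:
            # Re-raise with the line number so the user knows where to look.
            raise ValueError(f"line {number}: non-numeric value ({err})") from None
    return series
```

A message such as "line 2: expected 2 values, found 1" tells the user both where the problem is and what caused it.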
Figure 6: In ‘Display all variables’ mode, the user is unable to select a specific query variable from the lower left panel.
Another feature that was missing from the application was the ability to
select a query variable from the lower left panel when in “Display all variables” mode.
This is somewhat frustrating, since this mode lets the user pick out details in a
specific variable by comparing it to the other variables, but the user cannot
easily tell which variable is the one that differs.
Conclusion:
TimeSearcher was an invaluable tool in deciphering this cryptic IRS
dataset. It enabled me to spot outlying data points and relate them to the
historical events that influenced the refund patterns.
Although TimeSearcher was beneficial, I also identified some areas for
possible improvement, most notably data input.
5 Rather than thinking of the quarters as my dynamic data, I wanted to use the various refund types as my
query variables, and have quarters be the time points.