Identifying and Understanding Dates and Times in Email

Mia K. Stern
Collaborative User Experience Group
IBM Research
1 Rogers Street
Cambridge, MA 02142
mia_stern@us.ibm.com

ABSTRACT
Email is one of the "killer applications" of the internet, yet it is a mixed blessing for many users. While it is vital for communication, email has also become the repository for much of a user's important information, overloading it with responsibilities beyond mail. One way users overload their email is by using it as a calendar and a to-do list. People keep reminders of meetings, events, and deadlines in their email, all of which contain dates and times. Unfortunately, these emails can get lost among all the others, and because of the numerous ways dates and times can be expressed in written language, traditional searches are often not effective. In this paper, we discuss a technique to help users extract such calendar and deadline information from their emails by identifying dates and times within an email message. We believe identifying dates and times will help users organize their schedules better and find lost information more easily. We describe the syntactic methods used to find dates and times and the semantic methods used to understand them. We then present the results of two user studies conducted to determine the accuracy of our technique.

Keywords
Date extraction, Time extraction, Email, Evaluation

INTRODUCTION
Increasingly, people are using their email inboxes as a way of organizing their lives (Ducheneaut & Bellotti, 2001; Whittaker & Sidner, 1996). Inboxes are no longer simply repositories of incoming mail; they are where the details of people's lives reside. People use their inboxes as calendars, to-do lists, and address books, among other things. People keep documents in their inboxes because those documents contain information they do not want to delete.
They also save documents to keep the information readily accessible. Unfortunately, the more documents that accumulate in the inbox, the harder it is to manage the information contained there. While email documents are semi-structured in that they contain well-defined fields, the bodies and subjects are unstructured. These unstructured parts may contain information that can help the user organize important information better, such as names of people and companies, URLs, phone numbers, and places where meetings take place. Email documents also contain the dates and times that allow users to use their inboxes as calendars and to-do lists. In this paper, we focus on extracting this date and time information from email documents to make such tasks easier.

We have developed an add-on to email (currently implemented in Lotus Notes) that can help users keep track of items in their inboxes by taking advantage of the semi-structured information provided by dates and times. This system identifies date and time phrases that appear in the bodies and subjects of email messages, and interprets these phrases into a canonical calendar format.

There are a number of potential applications for this technology. For example, the system can help the user more easily make calendar entries (such as appointments and meetings) and to-do items. When the user wants to make such an entry, she can choose from the dates and times found, and the entry will be made for that selection. Nardi et al. (1998) present a similar idea, but in their system the user must first select the text that contains the date to be parsed. Our system also allows this functionality, but it can additionally find all dates and times in an email message without requiring the intervention of the user.
This technology can also be used for smart reminders, indicating messages with an approaching due date, or even messages that have "expired." Furthermore, users can search through their email for a date, regardless of its textual format. This paper focuses primarily on the accuracy of the underlying text extraction techniques that will support a range of such applications.

The rest of the document is organized as follows. We begin by discussing our technique for locating dates and times within an email message. We then describe the methods for semantically processing a date or time. The results of our user studies are then discussed. We conclude with some thoughts and future work.

FINDING DATES AND TIMES
The goal of this project is to identify and understand dates and times that appear within email messages. Although there has been previous work on identifying dates/times in standard corpora (Mani et al., 2001; Grover et al., 2000; the Message Understanding Conferences), in historical documents (McKay, 2001), and in scheduling dialogs (Wiebe et al., 1998), there has not been a focus on the kinds of dates/times that appear in email messages. The Selection Recognition Agent (Pandit and Kalbag, 1997), an application-independent feature recognizer, could be applied to email, although that was not the main focus of that work. Our purpose was to build a lightweight date/time extractor as a foundation for applications specific to email.

The first step was constructing a grammar that would detect date and time phrases in email messages. For this, we constructed regular expressions to find the dates and times, since regular expressions are a simple way to represent most of the date and time expressions we have discovered in email messages. Some of the regular expressions we are using can be found in Figure 1. The seventh regular expression, MONTHDAYYEAR, can identify, for example, January 1st, 2002.
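As an illustration of the approach, patterns in the style of Figure 1 can be built by composing named sub-expressions into larger date patterns. This is our own simplified sketch, not the system's exact production rules:

```python
import re

# Simplified, Figure 1-style grammar: named sub-expressions are
# composed into a full month-day-year pattern.
SHORTMONTH = r"(?:Jan|Febr?|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)\.?"
LONGMONTH = (r"(?:January|February|March|April|May|June|July|"
             r"August|September|October|November|December)")
MONTH = rf"(?:{LONGMONTH}|{SHORTMONTH})"
DAY = r"(?:[0-2]?[0-9]|3[0-1])"
YEAR = r"\d{4}"
SUFFIXES = r"(?:st|rd|th|nd)"
MONTHDAYYEAR = rf"{MONTH}\s+{DAY}{SUFFIXES}?,?\s+{YEAR}"

m = re.search(MONTHDAYYEAR, "The workshop is on January 1st, 2002 in Boston.")
# m.group(0) == "January 1st, 2002"
```

Keeping each piece as its own named pattern makes the grammar easy to extend with new date formats as they are discovered in email.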
The last expression, TIME_AMPM, can identify times such as 10:03 a.m.

SHORTMONTH = (Jan|Febr?|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)\.?
LONGMONTH = January|February|March|April|May|June|July|August|September|October|November|December
MONTH = SHORTMONTH | LONGMONTH
DAY = [0-2]?[0-9]|3[0-1]
YEAR = \d{4}
SUFFIXES = st|rd|th|nd
MONTHDAYYEAR = MONTH\s+DAY(SUFFIXES)?,?\s+YEAR
HOUR_12_RE = 0?[1-9]|1[0-2]
HOUR_24_RE = [0-1]?[0-9]|2[0-3]
MINUTE_RE = [0-5][0-9]
AM_RE = (a|(\s*am)|(\s*a\.m\.))
PM_RE = (p|(\s*pm)|(\s*p\.m\.))
AM_OR_PM_RE = AM_RE | PM_RE
TIME_AMPM = HOUR_12_RE(\s*:\s*MINUTE_RE)?\s*AM_OR_PM_RE

Figure 1: Example regular expressions

One of the most challenging parts of this project was defining qualities of dates and times to test for that would anticipate the needs of a set of meaningful email and calendar applications. These included figuring out when to present a date to the user as a single date and when as a range of dates, and how to define certain ranges. Single dates occur on only one day, whereas date ranges cover multiple days. We have determined a similar classification for times, where single times refer to a moment on a calendar and time ranges have distinct start and end times. Furthermore, both dates and times can be classified as explicit or inexact. Explicit dates and times are those that are spelled out in full, while inexact dates and times are more vague in their references to a calendar. Inexact dates are typically in reference to a known date, and inexact times have fuzzier starts and ends than explicit times do. Table 1 shows some examples of dates and times that fall under these categories.

          | Single                   | Range
Dates
 Explicit | 8/16/2002; 4 August 2002 | August 8-12, 2002; June 12 – July 5
 Inexact  | Tomorrow; Next Thursday  | Next week; August
Times
 Explicit | 10am                     | 10am – noon; from 3 to 4pm
 Inexact  | At 4                     | Morning; lunchtime

Table 1: Examples of different kinds of dates and times

Our use of regular expressions has some limitations. While regular expressions can be very fast and reasonably accurate for date and time detection, they are not sufficient for finding all dates and times, as those expressions come in a large variety of formats. Similarly, they are not sufficient for linking a related date and time that are split by extraneous text. For example, in the phrase January 24, 2002, 1 Rogers Street Room 5003, 12:00-1:30pm, it is very hard to link January 24, 2002 with 12:00-1:30pm. Our goal, however, is not perfection with regard to feature detection, since heuristics are by their nature imperfect. Rather, we are hoping to be "close enough" that users will find this technology useful and usable.

Understanding dates and times
Once a date/time has been located, it must be semantically parsed so that it can be used by the applications we have mentioned. To do this, the system must convert each date/time found into a canonical format. In this section, we discuss the heuristics we use for this semantic analysis.

Kind of expression | Heuristic assumption                                    | Example                   | Interpretation (message received January 28, 2003)
No year given      | Check verb tense; if future, assume within next 12 months | We will meet on February 4 | February 4, 2003
This <day>         | Before the end of this week                             | This Thursday             | January 30, 2003
Next <day>         | During the week that starts on the upcoming Sunday      | Next Thursday             | February 6, 2003
Last <day>         | During the week that ended on the previous Saturday     | Last Thursday             | January 23, 2003
No a.m. or p.m.    | Assume during normal business hours                     | Let's meet at 1           | January 28, 2003 at 1pm

Table 2: Some heuristics used for filling in missing fields

Missing fields
Dates and times that are fully specified are easy to convert into this canonical format. A fully specified date is one that has its calendar date(s) and time(s) explicitly stated. An example of a fully specified date/time is Thursday, March 28, 2002 from 1:00pm to 2:00pm. These kinds of dates tend to appear in formal meeting announcements or talk announcements.

However, many emails, especially those written in a more "conversational" tone, do not contain such formal dates and times. Dates and times occurring in email messages are more informal, often assuming human readers can fill in the unspecified portions. For example, people can easily process and understand Let's meet at 4 on Thursday by using their background common sense knowledge and experience. In our system, heuristics are needed to fill in the missing fields of both dates and times.

When writing times, people will frequently omit a time of day indication (a.m. or p.m.). If this omission occurs within email, we assume the hour referred to occurs during the regular business day, i.e. 7am through 6pm. People will also write times without associated dates, such as Let's meet at 10am. In this case, we assume the date to be the reference point date (described in the next section).

With dates, different details can be left unspecified. A very commonly unspecified detail for a date is the year, e.g. March 29. In this case, we assume the date to be within the next twelve months of the reference point date (March 29 would be interpreted as March 29, 2003).

Inexact dates frequently need many of the date/time fields filled in. These kinds of dates can be confusing to interpret, for human readers as well as computers. For example, if today is Monday, and the document mentions next Thursday, is the Thursday in question the day 3 days from the reference date or 10 days from that date? We are making the assumption that "next" anything falls within the week that starts on the upcoming Sunday and ends the following Saturday. A sampling of some of the heuristics used can be found in Table 2.

Setting reference point dates for inexact dates
Inexact date phrases and time-only phrases need a reference point from which the date for the phrase can be calculated. Some systems, such as LookOut (Horvitz, 1999), use the date the message was sent as this reference point. However, this is not always correct. For example, in Figure 2, there are two instances of the word "tomorrow." If we were to use the date the email was sent as the reference point, the second "tomorrow" would be interpreted incorrectly.

Email messages, however, provide additional clues as to what these reference points should be, namely the dates that can be found in headers within the body of an email message. Messages that are replies or forwards frequently contain these kinds of textual headers. Reference points are determined by treating the email as a series of headers and bodies, with each header and body making a header block. The header of the message starts the first header block, and subsequent headers found within the body start their own header blocks. Each block is terminated by the subsequent header. Each header contains the date indicating when that part of the document was sent. The system uses this date as the reference point for any dates or times found within the header block. Using this heuristic, both instances of "tomorrow" in Figure 2 are interpreted correctly. The first is interpreted as May 18, 2002, while the second is interpreted as May 15, 2002.

    From: Mia Stern 05/17/2002 03:12 PM
    To: Derek Lam
    cc:
    Subject: Re: Meeting

    Sorry I couldn't make it then. How about tomorrow instead?
    - Mia

    From: Derek Lam 05/14/2002 01:38 PM
    To: Mia Stern
    cc:
    Subject: Meeting

    Can you make a meeting tomorrow?
    Derek

Figure 2: Why headers are used for reference point dates, rather than the sent date of the message

Part of speech tagging
We use another heuristic technique for determining the correct date being referred to in a document. We cannot always assume that dates are in the future. If we made such an assumption, we could not process phrases like As we said on Thursday and accurately determine the correct date. Therefore, we use part of speech tagging (Brill, 1992) to determine if the date is in the past or the future. We look at the verb closest to the match, and if it is in the past tense, we assume the date is also in the past. A similar approach to using parts of speech is presented in (Mani & Wilson, 2001).

    From: Mia Stern 01/15/2003 02:43 PM
    To: Daniel Gruen
    cc:
    Subject: website

    Hi Dan,
    In the meeting we had on Monday, we talked about setting up a new
    website. Can we meet on Friday to talk about this more?
    Thanks,
    - Mia

Figure 3: Using part of speech tagging to disambiguate the meaning of a date phrase

For example, consider the message given in Figure 3. Without part of speech tagging, Monday would have been interpreted as Monday, January 20, 2003. However, since the closest verb to that phrase is in the past tense (in this case "had"), the phrase is correctly interpreted as Monday, January 13, 2003. Similarly, since the verb closest to Friday, "meet", is not in the past tense, that phrase is interpreted to mean Friday, January 17, 2003.

Ambiguities
There is another difficulty with the semantic processing of dates and times: ambiguities arise that can be difficult to resolve without context. For example, is Thursday, 7-10 a date, as in July 10th, or is it a time, from 7pm to 10pm (or 7am – 10am)? Our current method is to assume in this case a time, rather than a date.
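The "this/next/last <day>" heuristics of Table 2 can be sketched with Python's datetime. This is our own illustrative code, not the system's implementation; the function and list names are invented, but the Sunday-start week convention follows the paper:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_relative_day(phrase, ref):
    """Resolve 'this/next/last <weekday>' against a reference-point date.

    this <day>: before the end of the current week.
    next <day>: within the week starting on the upcoming Sunday.
    last <day>: within the week that ended on the previous Saturday.
    """
    qualifier, day_name = phrase.lower().split()
    target = WEEKDAYS.index(day_name)          # Monday == 0, as in datetime
    if qualifier == "this":
        return ref + timedelta(days=(target - ref.weekday()) % 7)
    if qualifier == "next":
        sunday = ref + timedelta(days=(6 - ref.weekday()) % 7)
        return sunday + timedelta(days=(target - 6) % 7)
    if qualifier == "last":
        saturday = ref - timedelta(days=(ref.weekday() - 5) % 7)
        return saturday - timedelta(days=6) + timedelta(days=(target - 6) % 7)
    raise ValueError(phrase)

ref = date(2003, 1, 28)                        # the Tuesday used in Table 2
resolve_relative_day("this Thursday", ref)     # date(2003, 1, 30)
resolve_relative_day("next Thursday", ref)     # date(2003, 2, 6)
resolve_relative_day("last Thursday", ref)     # date(2003, 1, 23)
```

The three example calls reproduce the Table 2 interpretations for a message received on January 28, 2003.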
In the ideal case, both options and the context surrounding the match would be presented to the user, so she could disambiguate more easily.

EVALUATION
Before exploring whether any applications using this date/time understanding technology would be viable, we wanted to investigate whether the technology could accurately identify and interpret the specific dates and times that appear in email messages. We also wanted to discover whether there were any differences in how users rated the dates and times that were found and how they were interpreted.

Methodology
We conducted a user study with 9 participants in which each user processed about 20 of her own emails. Each user was presented with her email one message at a time, and for each message, she was asked about the dates and times that were found. Each date/time found was presented one at a time, and a series of questions appropriate to the kind of date/time was asked. The first question was, "Is this phrase a date/time related phrase?" If the user answered "yes", she was then asked whether the phrase was correctly classified as a single date or a date range, and whether the interpreted date(s) were correct. The user was then asked about the time classification and the time interpretation.

We present our results in terms of precision and recall. Precision is the number of date/time phrases correctly processed in category x divided by the total number the system processed in category x; the incorrectly processed phrases are false positives. Recall is the number of date/time phrases correctly processed in category x divided by the number of phrases that really should have been processed in category x. See Table 3 for an illustration of how precision and recall are calculated.
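With A, B, and C as in Table 3, these definitions reduce to two one-line functions. This is a sketch of our own; the counts in the comment are the overall figures reported later in this section:

```python
def precision(a, c):
    """Correctly processed phrases over everything the system proposed."""
    return a / (a + c)

def recall(a, b):
    """Correctly processed phrases over everything that should be found."""
    return a / (a + b)

# Overall counts reported in the Results section: 593 proposals,
# 546 of them genuine date/time phrases (so C = 47), and 39 misses.
round(100 * precision(546, 47), 2)   # 92.07
round(100 * recall(546, 39), 2)      # 93.33
```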
Is this a date/time related phrase? | System proposes as date/time | System does not propose
User identifies as date/time        | A (correct)                  | B (misses)
User rejects as date/time           | C (false positives)          | (cannot be calculated)

Precision = A/(A+C)    Recall = A/(A+B)

Table 3: How precision and recall are calculated. The grayed out (bottom-right) box cannot be calculated.

Results
We collected 150 email messages from the 9 users in our study. Our user population consisted of interns, developers, researchers, and one executive. Of the 150 collected email messages, 78% (117) had dates and/or times. Eleven of the 117 messages contained only machine-generated dates (e.g. dates that appear within header blocks). Within these 117 messages, our system proposed 593 date/time phrases, of which 546 were actually date/time phrases. Thus our precision is 92.07%. Our system missed an additional 39 date/time phrases that users identified, giving us a recall value of 93.33%. [1]

[1] It is possible that the documents we collected are skewed in the number of dates and times, since users analyzed only those documents they wished to share; we do not know the proportion of date and time phrases in documents users did not allow us to see.

These are just broad claims, however, indicating whether the system is focusing on the right types of phrases. To be effective, the system must also correctly classify the type of date/time phrase and accurately interpret the date or time found. In the sections that follow, we delve into more detail on how well the system classified and interpreted single dates and date ranges, and then how well it did on single times and time ranges.

Dates
Once a phrase has been identified as a date/time phrase, it must be classified as either a single date or a date range. On single date identification, the system achieved a precision of 89.65% and a recall of 92.92%. With date range identification, on the other hand, the precision is only 74.82% while the recall is only 83.2%. Clearly the system does not perform as well on identifying date ranges. However, there are almost 4 times as many single dates as date ranges, indicating that perhaps single dates are a more important focus for this kind of work.

Table 4 illustrates how well our system performs on both classifying and interpreting single dates. We have broken down the kinds of date phrases into whether they were an explicit date, an inexact date (e.g. tomorrow, next Tuesday, or this morning), a time without a date (e.g. 1pm or from 1-2pm), and all others. The rows in the table indicate the user responses: whether the date was classified and interpreted correctly, whether it was either misclassified (the system claimed the phrase was a single date when it was not) or interpreted incorrectly (the system knew it was a single date but got the date wrong), or whether it was not really a date at all.

Overall, the system achieved 81.06% precision and 84.02% recall on locating and interpreting single dates (the system missed 28 single dates and claimed that 3 actual single dates were date ranges). We expected the system to be able to interpret explicit dates relatively well but to have difficulties with other kinds of dates. As anticipated, the system had more difficulty classifying and interpreting inexact dates than explicit dates, since it needed to infer the date from context rather than just parse the phrase. It also had some difficulty associating the correct date with phrases that contained only a time. One possible reason for this difficulty is the heuristic we use for filling in the missing date in those cases. We are currently using the reference point date of the header block, which is frequently not the correct date.
However, if we change our heuristic to use the date closest to the time phrase in the sentence, we believe we can improve the accuracy on those phrases.

Extraction process: single dates       | Specific date | Inexact date | Time only | Other    | Total
Correctly classified and interpreted   | 196 (95.6%)   | 127 (74.27%) | 44 (64.71%) | 1 (10%) | 368 (81.06%)
Incorrectly classified or interpreted  | 7             | 35           | 18        | 9        | 69
Not a date phrase                      | 2             | 9            | 6         | 0        | 17
Total                                  | 205           | 171          | 68        | 10       | 454

Table 4: System's performance on identifying and interpreting single dates. The percentages given are precision values, calculated by dividing the value in the cell by the total for that column.

In Table 5, we see how well our system does on identifying and understanding date ranges. Similar to our analysis of single dates, we have broken down date ranges into various kinds of ranges: explicit ranges (e.g. November 6-8), months with years (e.g. June 2003), months without years (e.g. June), years or year ranges (e.g. 2002 or 2002-2003), and inexact date ranges (e.g. this week or next week). The rows are the same as in Table 4. Overall, the precision for interpreting date ranges is 69.06% and the recall is 82.05% (the system missed 10 ranges and claimed 11 actual ranges were single dates).

It is interesting to note how many kinds of date ranges there were, with most of them somewhat ill defined, and only 5 explicit date ranges. There were a large number of inexact ranges, and the system did relatively well on those kinds of dates. The largest number of date ranges were years or year ranges. However, one reason the system performed poorly in general on date ranges is the number of phrases the system classified as years that users did not consider date ranges at all. The cause is that our regular expression for detecting years is overly general; it detects every four-digit number. By limiting the range for years to between 1900 and 2099, we should be able to reduce the number of false positive year detections. This change alone would have avoided 21 false positive instances, increasing precision on date ranges from 69.06% to 81.36%.

Extraction process: date ranges        | Explicit date range | Month w/ year | Month w/o year | Year / year ranges | Inexact date range | Other    | Total
Correctly classified and interpreted   | 5 (100%)            | 11 (91.67%)   | 9 (81.82%)     | 44 (57.14%)        | 26 (78.79%)        | 1 (100%) | 96 (69.06%)
Incorrectly classified or interpreted  | 0                   | 1             | 0              | 5                  | 7                  | 0        | 13
Not a date phrase                      | 0                   | 0             | 2              | 28                 | 0                  | 0        | 30
Total                                  | 5                   | 12            | 11             | 77                 | 33                 | 1        | 139

Table 5: System's performance on identifying and interpreting date ranges. The percentages given are precision values, calculated by dividing the value in the cell by the total for that column.

Overall, on dates, the system achieved a precision of 78.25% and a recall of 79.32%. We believe that with some simple changes to some of our heuristics, we can potentially improve these results.

Times
In addition to dates, we also investigated how well the system could find and interpret times. Table 6 shows the system's results on finding times associated with dates. An interesting thing to note from this table is the frequency with which the system missed times. Many of these missed times occur when a date and a time appear within the same sentence but are located far apart, similar to the problem we discussed with incorrect dates being associated with time-only phrases. For example, let's meet tomorrow, say around 1pm contains two date/time phrases, tomorrow and 1pm. Our system misses the time for tomorrow and assigns the incorrect date to 1pm. By using the heuristic discussed in the last section, we hope to alleviate both problems.

For single times, the system was very accurate (precision of 91.5% and recall of 80.63%). There were 112 instances when the time was given with either a.m. or p.m.; the system correctly interpreted all 112. For the 17 instances in which no a.m. or p.m.
was given, the system interpreted all but 2 correctly. In both cases, the system mistook a day for a time (e.g. in 01 JUL 02, it interpreted the time as 1pm).

With time ranges, the system achieved a precision of 75.36% but a recall of only 71.25%. Of the time ranges the system found, it correctly interpreted all 40 explicit ranges. For inexact ranges (such as morning or afternoon), the system correctly interpreted only 12 out of 26 cases. We believe this is because our definitions of when these inexact ranges begin and end do not agree with users' opinions of these ranges.

Extraction process: time                    | No time   | Single time | Time range | Total
Correctly classified and interpreted        | 171 (83%) | 129 (91.5%) | 52 (75.4%) | 352 (84.6%)
Incorrectly classified and/or interpreted   | 35        | 9           | 10         | 54
Not a time phrase                           | 0         | 3           | 7          | 10
Total                                       | 206       | 141         | 69         | 416

Table 6: Results for finding times.

Individual differences
The results reported thus far come from pooling the data from all 9 participants. However, this kind of analysis does not tell us whether there are any individual differences in how users interpret dates and times. To determine if there are such differences, we performed an analysis on the accuracy of the date detection and understanding. We calculated the log odds [2] of the precision for each document a user evaluated, and then calculated the mean of these values and the standard error of the mean for each person and their documents (see Figure 4). From this analysis, we can see that there is no detectable difference between the means for the individuals and the mean for the population.

[2] Log odds is calculated using the formula ln((num_correct + 0.5) / (num_false_positive + 0.5)). The 0.5 correction is a Bayesian technique that may be used when the actual number of observations is small, e.g., when one of the numbers might be zero.

[Figure 4: Comparing means of log odds between users. Each point is the mean +/- 1 SE for one of USER1 through USER9. The dotted line shows the mean for the whole population and the surrounding box is one standard error of the mean.]

However, based on discussions with users, we are inclined to believe there are some individual differences that cannot be detected statistically. For example, some participants did not agree with our specifications for the beginning and end of a week (in phrases such as this week or next week). Three participants consistently challenged our definition of when a week starts and ends, while the other six agreed with our specifications. With respect to time, the only disagreements were over how times of day, such as morning, afternoon, and evening or night, should be interpreted. Two participants disagreed about the start and end time for afternoon, while one participant disagreed about morning. One possible approach is to model individual user preferences for these time ranges.

Second user study
We took the results from the first user study and analyzed where we could make improvements in our algorithm. There appeared to be some clear areas where we could make improvements without significantly changing our strategy. Some of the improvements were designed to increase precision, while others should help improve recall. We made these changes to the algorithm and ran a second user study to determine if the results in fact improved. Rather than running our algorithm over the data we collected in the first user study, we ran a second study to determine whether our results would generalize to a different data set.

Improvements
One improvement to increase precision is to restrict our definition of a year. In the first user study, we were detecting all 4-digit numbers as years. This led to a large number of false positive finds. To reduce the number of false positives, we are now only recognizing years between 1900 and 2099.
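The year restriction amounts to a small regular-expression change. This is our own sketch of the idea, not the system's exact pattern:

```python
import re

# First study: any four-digit number was treated as a year.
YEAR_ANY = r"\d{4}"
# Second study: only 1900-2099 are recognized, which rules out
# false positives such as room numbers.
YEAR_RESTRICTED = r"(?:19|20)\d{2}"

re.fullmatch(YEAR_RESTRICTED, "2002")   # matches
re.fullmatch(YEAR_RESTRICTED, "5003")   # None: a room number, not a year
```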
A second improvement to increase precision is a heuristic to link dates and times together. During the first user study, the system frequently did not find times associated with some dates when the participant indicated that there was in fact a time. Similarly, for many phrases that consisted of only a time or a time range, the system assumed an inappropriate date. To fix this problem, the system now looks within a sentence to see if there is a time to associate with a date, or a date to associate with a time. However, if there is more than one time in a sentence, the system will not choose one to match with a date, and vice versa.

Extraction process: single dates (second study) | Specific date | Inexact date | Time only   | Total
Correctly classified and interpreted            | 193 (88.53%)  | 128 (89.51%) | 26 (26.53%) | 347 (75.60%)
Incorrectly classified or interpreted           | 19            | 12           | 25          | 56
Not a date phrase                               | 6             | 3            | 47          | 56
Total                                           | 218           | 143          | 98          | 459

Table 7: Single date results from the second user study.

Extraction process: date ranges (second study) | Explicit date range | Month w/o year | Year or year range | Inexact date range | Deadline date | Total
Correctly classified and interpreted           | 3 (100%)            | 9 (69.23%)     | 24 (40%)           | 22 (91.67%)        | 42 (85.71%)   | 100 (67.11%)
Incorrectly classified or interpreted          | 0                   | 2              | 5                  | 2                  | 7             | 16
Not a date phrase                              | 0                   | 2              | 31                 | 0                  | 0             | 33
Total                                          | 3                   | 13             | 60                 | 24                 | 49            | 149

Table 8: Date range results from the second user study.

Extraction process: time (second study) | No time      | Single time  | Time range  | Deadline times | Total
Correctly classified and interpreted    | 185 (93.91%) | 129 (96.27%) | 60 (57.14%) | 46 (92%)       | 420 (86.42%)
Incorrectly classified or interpreted   | 12           | 3            | 2           | 3              | 20
Not a time phrase                       | 0            | 2            | 43          | 1              | 46
Total                                   | 197          | 134          | 105         | 50             | 486

Table 9: Time results from the second user study.

To improve recall, we are now also detecting date and time ranges that we call deadline dates and times. These phrases are distinguished by including keywords such as "by", "until", or even "through".
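A sketch of how such deadline phrases might be flagged follows; the keyword list is from the text, while the surrounding pattern and names are our own simplification:

```python
import re

# A date preceded by "by", "until", or "through" marks a deadline:
# a range from the received date through the stated date.
DEADLINE_RE = re.compile(r"\b(by|until|through)\s+([A-Z][a-z]+\s+\d{1,2})")

m = DEADLINE_RE.search("The paper is due by October 14")
# m.group(1) == "by", m.group(2) == "October 14"
```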
Previously, we found the dates and times in these phrases, but we did not recognize that they indicated a range rather than a single date or time. For example, in the phrase "The paper is due by October 14", we would find October 14, but we would not recognize that this date represented a deadline. Users indicated that the whole phrase should be identified as a date range, starting when the email was received and ending at the date in the phrase.

The last improvement we made, also to improve recall, is to expand our notion of time ranges by detecting mealtimes, such as "lunch(time)" or "dinner". We now also detect the phrases "AM" and "PM" as time ranges. A number of users indicated during the first user study that these were phrases that represented times.

Results

For the second user study, we had 7 users evaluate a total of 162 documents, 115 of which contained dates. Our system detected 608 date/time phrases, 519 of which were actually dates or times (precision = 85.36%). The system missed only 19 date/time phrases (recall = 96.47%). Our overall results for the second user study are fairly similar to those of the first (see Tables 7-9), with the only significant difference in precision being a lower precision on single dates in the second study (t=2 on an independent-samples t-test, p < 0.05). For recall, our results improved significantly for single dates (t=3.874, p < 0.005) and for times (t=3.566, p < 0.005). What we are really interested in discovering, however, is the effect each specific change had on these results.

The first improvement we made, reducing the range of years detected as dates, lowered the number of false positives. In the first user study, 59.6% of false positive date detections were years, while in the second user study, only 35.2% of false positives were years.

The second improvement we tried is to link dates and times that appear in the same sentence. Unfortunately, our heuristic did not apply in any instances in this dataset. However, there were still 23 cases in which a phrase contained only a time and the system could not correctly calculate the date. Similarly, there were 12 date phrases for which the system found no time but users indicated there was an associated time. In some of these cases, the dates and times were in adjacent sentences; in others, there were multiple times and/or multiple dates, which we specifically chose not to match up. Clearly, we need another heuristic to handle these cases.

Our third improvement was to look for "deadline" dates and times: phrases that start with "through", "until", or "by". Our system found 50 deadline dates and times, only one of which was considered not a date phrase, and it achieved a precision of 84% in identifying and interpreting these phrases. Thus, adding deadline dates and times appears to be a worthwhile improvement to our algorithm.

Finally, our last improvement, expanding our definition of time ranges, appeared to have a large negative impact on our results. Only 2 of the 27 instances (7.4%) in which "AM" and "PM" were detected were considered legitimate time phrases, whereas 28.4% (25 out of 88) of our false positives involved these phrases. With respect to mealtimes, 9 instances were considered not to be time phrases and 9 were considered to be time phrases. In these cases, then, the false positive rate is too high for identifying these time ranges to provide significant benefit.

CONCLUSIONS

In this paper, we have presented a technique to detect and interpret dates and times within email, and we have attempted to determine its accuracy through two user studies. We have demonstrated that we can achieve fairly reasonable accuracy (about 80%) in finding and interpreting dates and times that appear in email messages.
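Interpreting such phrases requires resolving them against a reference point, typically the date the email was received. As a hypothetical illustration (the system's actual interpretation rules are not given in this form), resolving a phrase like "next Tuesday" might look like:

```python
import datetime

# Hypothetical sketch: resolve an inexact weekday phrase such as "next
# Tuesday" against the date the message was received. One plausible
# convention (assumed here, not taken from the paper): "next <weekday>"
# means the first occurrence of that weekday strictly after the received date.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_next_weekday(received, weekday_name):
    """Return the first occurrence of the named weekday after `received`."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - received.weekday() - 1) % 7 + 1
    return received + datetime.timedelta(days=days_ahead)

# July 1, 2003 was itself a Tuesday, so under this convention
# "next Tuesday" resolves to July 8, 2003.
print(resolve_next_weekday(datetime.date(2003, 7, 1), "Tuesday"))
```

Whether such a convention matches users' intuitions (sent on a Monday, does "next Tuesday" mean tomorrow or eight days later?) is exactly the kind of question the user studies probe.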
We can detect and understand not only standard dates, such as "July 1, 2003", but also inexact dates and times, such as "next Tuesday morning". We are beginning to explore the use of this technology in some applications, including smart calendar entries and smart reminders. Additionally, we have built an application that lets users search for dates, regardless of the format of the search query or of the dates contained in the documents. However, we must determine whether the accuracy we have achieved is "good enough" for these applications.

There is one class of dates and times that we are currently unable to detect and understand: repeated dates. For example, "Let's meet every Thursday at 10" indicates that a meeting will occur every Thursday at 10am. Currently, however, we do not recognize that the word "every" indicates a repeated event rather than a one-time occurrence. We plan to include these kinds of dates and times in our next version.

ACKNOWLEDGEMENTS

I would like to thank John Patterson and Daniel Gruen for their help in designing the study and in analyzing the data. I would also like to thank all the study participants for their time and patience.

REFERENCES

Brill, E., A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy, 1992.

Duchenaut, N. and Bellotti, V., Email as habitat: An exploration of embedded personal information management. ACM Interactions, 8(1):30-38, September-October 2001.

Grover, C., Matheson, C., Mikheev, A., and Moens, M., LT TTT – A Flexible Tokenisation Tool. In Proceedings of the Second International Conference on Language Resources and Evaluation, 2000.

Horvitz, E., Principles of Mixed-Initiative User Interfaces. In Proceedings of CHI '99, ACM SIGCHI Conference on Human Factors in Computing Systems, 159-166, 1999.

Mani, I., Ferro, L., Sundheim, B., and Wilson, G., Guidelines for Annotating Temporal Information. In Proceedings of the Human Language Technology Conference, 2001.

Mani, I. and Wilson, G., Robust temporal processing of news. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000.

McKay, D., Mining dates in historical documents. In Fourth New Zealand Computer Science Research Students Conference, 2001.

Message Understanding Conference. www.itl.nist.gov/iaui/894.02/related_projects/muc

Nardi, B., Miller, J., and Wright, D., Collaborative programmable intelligent agents. Communications of the ACM, 41(3):96-104, March 1998.

Pandit, M. and Kalbag, S., The Selection Recognition Agent: Instant access to relevant information and operations. In Proceedings of Intelligent User Interfaces, 47-52, ACM, 1997.

Whittaker, S. and Sidner, C., Email overload: Exploring personal information management of email. In Conference Proceedings on Human Factors in Computing Systems, 276-283, 1996.

Wiebe, J., O'Hara, T., Ohrstrom-Sandgren, T., and McKeever, K., An empirical approach to temporal reference resolution. Journal of Artificial Intelligence Research, 9:247-293, 1998.