SUTime
JavaNLP time annotations

What does SUTime do?
  - Similar to GUTime
  - Recognizes time expressions using patterns
      - Deterministic, based on regular expression patterns
      - Greedy (picks the longest sequence of tokens that may represent a time expression)
  - Normalizes time expressions
      - Annotations follow the TimeML TIMEX3 standard
          http://www.timeml.org/site/publications/timeMLdocs/timeml_1.2.1.html#timex3
          XSD: http://www.timeml.org/timeMLdocs/TimeML.xsd
      - Extensions for time expressions that are not supported by the TIMEX3 standard
      - Resolves relative times with respect to a reference date

SUTime Time Representation
  Main temporal types
  - Time: an instant in time (2011-08-11); can be partially specified (Friday), with limited granularity
  - Duration: a length of time (3 days)
  - Range: a time interval with start and end points
  - Set: a set of temporals
      - Periodic sets: every Friday

SUTime Representation
  Time
  - Standard dates and times (in years, months, days, day of week, hours, minutes, seconds, milliseconds)
  - Common times: seasons (e.g. winter), time of day (e.g. morning), weekend
  - Partial times (June => XXXX-06)
  - Relative times (last week)
  Duration
  - Exact durations (specified in milliseconds or in fields)
  - Inexact durations (a few years => PXY)
  - Duration ranges (2 to 3 months => P2M/P3M)

SUTime Limitations
  - Holidays are not supported
  - Support for ranges is poor
      - "from 3 to 4 p.m." is identified as 15:57:00 (that is, three minutes before 4 p.m.)
      - "12-13 March 2011": the "12-13" is ignored
  - Resolving relative expressions with respect to the given reference date can be problematic
  - Handling of ambiguous phrases is poor
      - Some common words (e.g. spring/fall) are always identified as temporal expressions
  - Patterns are language-specific (English)
  - …

SUTime Usage
  TimeAnnotator
      TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", properties);
  - Properties: specifies SUTime options (prefixed by "sutime.")
  Pipeline
  - TimeAnnotator should come after the tokenizer, sentence splitter, and POS tagger
  - Optional (also before TimeAnnotator): NER or NumberAnnotator/QuantifiableEntityNormalizingAnnotator

SUTime Options
  Property / Description
  sutime.markTimeRanges
      Whether time ranges should be marked (e.g. if markTimeRanges is true, July to August => range).
      Default = false.
  sutime.includeNested
      Whether nested time expressions should be included (e.g. if markTimeRanges is true, July to August => range; if includeNested is also true, both July and August will be marked as time expressions as well).
      Default = false.
  sutime.teRelHeurLevel
      Heuristics for determining how to resolve relative times.
      NONE  = no heuristics (default) (refdate = 2011-08-01, Friday => 2011-08-05)
      BASIC = basic heuristics taking past tense into account (refdate = 2011-08-01, It happened Friday => 2011-07-29)
      MORE  = more heuristics, with since/until
  sutime.includeRange
      Whether range attributes should be included in the TIMEX3 XML output.
      Default = false.

SUTime input annotations
  DocDateAnnotation (String)
      If present, the string is interpreted as a date/time and used as the reference document date with respect to which other temporal expressions are resolved.
  SentencesAnnotation (List<CoreMap>)
      If present, time expressions are extracted from each sentence, and each sentence is annotated individually.
  TokensAnnotation (List<CoreLabel>)
      Required either at the level of the entire annotation or at the level of each sentence.
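The two worked cases given for sutime.teRelHeurLevel (NONE and BASIC) can be reproduced with plain date arithmetic. The sketch below is only an illustration of that arithmetic using java.time; it does not call SUTime and is not SUTime's implementation. The class name RelativeDaySketch, the method resolve, and the choice of "next or same" versus "strictly previous" occurrence are assumptions made for this example.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper, not part of SUTime: mimics the two documented
// examples for resolving a bare day of week against a reference date.
final class RelativeDaySketch {

    // Mirrors the NONE and BASIC option values; MORE is not modeled here.
    enum Heuristic { NONE, BASIC }

    // Assumption: without heuristics the day resolves to the next (or same)
    // occurrence; with BASIC and a past-tense context it resolves to the
    // most recent earlier occurrence. SUTime's actual rules may differ.
    public static LocalDate resolve(LocalDate refDate, DayOfWeek day,
                                    Heuristic level, boolean pastTense) {
        if (level == Heuristic.BASIC && pastTense) {
            return refDate.with(TemporalAdjusters.previous(day));
        }
        return refDate.with(TemporalAdjusters.nextOrSame(day));
    }

    public static void main(String[] args) {
        LocalDate refDate = LocalDate.of(2011, 8, 1); // a Monday

        // "Friday" with no heuristics
        System.out.println(resolve(refDate, DayOfWeek.FRIDAY, Heuristic.NONE, false));
        // prints 2011-08-05

        // "It happened Friday" with the BASIC heuristic (past tense)
        System.out.println(resolve(refDate, DayOfWeek.FRIDAY, Heuristic.BASIC, true));
        // prints 2011-07-29
    }
}
```

With the reference date 2011-08-01 this yields the same two values as the options table, which makes it easier to see what the past-tense heuristic changes: only the direction in which the day of week is searched.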
SUTime output annotations
  Timex.Annotations (List<CoreMap>)
  - List of time expressions (each a CoreMap)
  - Set on the entire annotation and also on each sentence

  Time annotations (for each time expression/CoreMap)
  Annotation / Description
  Timex.Annotation
      Timex object with TIMEX3 XML attributes. Use for exporting TIMEX3 information.
  TimeExpression.Annotation
      TimeExpression object. Use getTemporal() to get the internal temporal representation.
  TimeExpression.ChildrenAnnotation (List<CoreMap>)
      List of chunks forming this time expression (inner chunks can be tokens, nested time expressions, numeric expressions, etc.)

  Standard annotations (for each time expression)
  Annotation / Description
  TextAnnotation (String)
      Text of this time expression.
  TokensAnnotation (List<CoreLabel>)
      Tokens that make up this time expression.
  CharacterOffsetBeginAnnotation (Integer)
      The index of the first character of this time expression.
  CharacterOffsetEndAnnotation (Integer)
      The index of the first character after this time expression.
  TokenBeginAnnotation (Integer)
      The index of the first token of this time expression.
  TokenEndAnnotation (Integer)
      The index of the first token after this time expression.
  Note: Indices are 0-based and always relative to the original annotation. Begin indices are inclusive; end indices are exclusive.

Comparison with GUTime
  Language
      SUTime: Java
      GUTime: Perl
  Timex
      SUTime: TIMEX3 with extensions
      GUTime: TIMEX3 tag, but mostly follows ACE TIMEX2 (extension of TempEx)
  Demo
      SUTime: http://nlp.stanford.edu:8080/sutime
      GUTime: http://nlp.stanford.edu:8080/gutime
  Comments
      SUTime: No support for holidays. Limited support for ranges and ambiguous phrases.
      GUTime: Some support for holidays. No support for ranges; poor support for years that are written out.
  TempEval2 (English Test)
      SUTime: Time expression identification: P=0.89, R=0.94, F1=0.91
              Attribute accuracy: Type=0.94, Value=0.72
      GUTime: Time expression identification: P=0.89, R=0.79, F1=0.84
              Attribute accuracy: Type=0.95, Value=0.68

SUTime and GUTime examples
  Date
      SUTime: <TIMEX3 tid="t1" value="1963-10" type="DATE">October of 1963</TIMEX3>
      GUTime: <TIMEX3 tid="t1" TYPE="DATE" VAL="196310">October of 1963</TIMEX3>
  Duration
      SUTime: <TIMEX3 tid="t1" TYPE="DURATION" VAL="P56Y">fifty six years</TIMEX3>
      GUTime: <TIMEX3 tid="t1" TYPE="DURATION" VAL="P56Y">fifty six years</TIMEX3>
  Set
      SUTime: <TIMEX3 tid="t1" value="XXXXWXX-7" type="SET" quant="every third" periodicity="P3W">Every third Sunday</TIMEX3>
      GUTime: <TIMEX3 tid="t1" TYPE="DATE" SET="YES" VAL="XXXXWXX-0" PERIODICITY="F3W" GRANULARITY="G1D">Every third Sunday</TIMEX3>

Examples (GUTime unsupported)
  Reference date is 2011-08-01.
  Time
      SUTime: <TIMEX3 tid="t1" value="2011-08-01T17:05:00" type="TIME">5:05 in the afternoon</TIMEX3>
      GUTime: 5:05 in the afternoon
  Date (written-out year)
      SUTime: <TIMEX3 tid="t1" value="1994-WI" type="DATE">winter of nineteen ninety four</TIMEX3>
      GUTime: <TIMEX3 tid="t1" TYPE="DATE">winter</TIMEX3> of nineteen ninety four
  Duration range
      SUTime: <TIMEX3 tid="t1" alt_value="P2M/P3M" type="DURATION">two to three months</TIMEX3>
      GUTime: two to three months

Examples (SUTime unsupported)
  Reference date is 2011-08-01.
  Holiday
      SUTime: last Christmas
      GUTime: <TIMEX3 tid="t1" TYPE="DATE" ALT_VAL="20101225">last Christmas</TIMEX3>
  Ambiguous words
      SUTime: The <TIMEX3 tid="t1" value="2011-SP" type="DATE">spring</TIMEX3> water was cool and refreshing
      GUTime: The <TIMEX3 tid="t1" TYPE="DATE">spring</TIMEX3> water was cool and refreshing.
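Exact duration values such as P56Y and P3W, and the two endpoints of a duration range such as P2M/P3M, are written in ISO 8601 period notation, so a consumer of the TIMEX3 output can read them with the JDK alone. The following is a minimal sketch under that assumption; the class TimexDurationSketch and the method parseDurationValue are hypothetical names for this example and are not part of SUTime. Only date-based fields (years, months, weeks, days) are handled, and an inexact value such as PXY is reported as unparseable by returning an empty list.

```java
import java.time.Period;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SUTime: reads the duration strings that
// appear in TIMEX3 value/alt_value attributes into java.time.Period objects.
final class TimexDurationSketch {

    // Splits a value on "/" so that a range like "P2M/P3M" yields two periods.
    // Returns an empty list if any part is not an exact date-based period
    // (for example the inexact duration "PXY").
    public static List<Period> parseDurationValue(String value) {
        List<Period> periods = new ArrayList<>();
        for (String part : value.split("/")) {
            try {
                periods.add(Period.parse(part));
            } catch (DateTimeParseException e) {
                return new ArrayList<>(); // inexact or unsupported notation
            }
        }
        return periods;
    }

    public static void main(String[] args) {
        System.out.println(parseDurationValue("P56Y"));    // prints [P56Y]
        System.out.println(parseDurationValue("P2M/P3M")); // prints [P2M, P3M]
        System.out.println(parseDurationValue("PXY"));     // prints []
    }
}
```

Returning an empty list for PXY keeps the inexact case explicit for the caller instead of guessing a number of years; time-based durations (hours, minutes) would need java.time.Duration and are left out of this sketch.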