SUTime

JavaNLP time annotations
What does SUTime do?
 Similar to GUTime
 Recognizes time expressions using patterns
 Deterministic, based on regular expression patterns
  Greedy (picks the longest sequence of tokens that may represent a time expression)
 Normalizes time expressions
 Annotations follow TimeML TIMEX3 standard
 http://www.timeml.org/site/publications/timeMLdocs/timeml_1.2.1.html#timex3
 XSD: http://www.timeml.org/timeMLdocs/TimeML.xsd
  Extensions for time expressions that are not supported by the TIMEX3 standard
 Resolves relative times with respect to reference date
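The deterministic, greedy matching described above can be illustrated with a small sketch. This is not SUTime's rule set: the class name `GreedyTimeMatcher` and its three toy patterns are hypothetical, and it works on raw characters rather than tokens. It only shows the idea of preferring the longest candidate when several patterns match at the same position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of greedy, pattern-based recognition (not SUTime code).
final class GreedyTimeMatcher {
    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August|"
        + "September|October|November|December)";

    // Toy patterns chosen for this example only.
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("(?i)\\b" + MONTH + "\\b"),              // month alone
        Pattern.compile("(?i)\\b" + MONTH + "\\s+\\d{4}\\b"),    // month followed by a year
        Pattern.compile("\\b\\d{4}\\b"));                        // year alone

    // Scans left to right; at each step takes the earliest match,
    // and among matches with the same start, the longest one.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int bestStart = -1;
            int bestEnd = -1;
            for (Pattern p : PATTERNS) {
                Matcher m = p.matcher(text);
                if (m.find(pos)) {
                    boolean earlier = bestStart < 0 || m.start() < bestStart;
                    boolean longer = m.start() == bestStart && m.end() > bestEnd;
                    if (earlier || longer) {
                        bestStart = m.start();
                        bestEnd = m.end();
                    }
                }
            }
            if (bestStart < 0) {
                break; // no pattern matches in the rest of the text
            }
            found.add(text.substring(bestStart, bestEnd));
            pos = bestEnd; // continue after the chosen expression
        }
        return found;
    }
}
```

With these toy patterns, "October 1963" is returned as one expression rather than as "October" and "1963" separately, because the longer candidate wins.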
SUTime Time Representation
 Main Temporal types
  Time – An instance in time (2011-08-11), can be partially specified (Friday), with limited granularity
 Duration - A length of time (3 days)
 Range – Time interval with start and end points
 Set – A set of temporals
 Periodic sets: Every Friday
SUTime Representation
 Time
  Standard dates and times (in years, months, days, day of week, hours, minutes, seconds, milliseconds)
  Common times: Seasons (e.g. winter), Time of day (e.g. morning), Weekend
 Partial Times (June => XXXX-06)
 Relative Time (last week)
 Duration
 Exact durations (specified in milliseconds or in fields)
 Inexact durations (a few years => PXY)
 Duration ranges (2 to 3 months => P2M/P3M)
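The normalized values above use ISO 8601 style strings, so the exact ones can be inspected with the standard `java.time` classes. The sketch below is an assumption-laden illustration, not SUTime's internal representation: the class `TemporalValueSketch` and its methods are hypothetical. Inexact values such as PXY are SUTime/TIMEX3 conventions and are not parseable by `java.time.Period`.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical helpers for working with normalized values (not part of SUTime).
final class TemporalValueSketch {
    // Splits a duration range such as "P2M/P3M" into its lower and upper bounds.
    static Period[] parseDurationRange(String value) {
        String[] parts = value.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <min>/<max>, got: " + value);
        }
        return new Period[] { Period.parse(parts[0]), Period.parse(parts[1]) };
    }

    // Fills the unknown year of a partial time such as "XXXX-06".
    static String fillYear(String partial, int year) {
        return partial.replace("XXXX", String.format("%04d", year));
    }

    // Adds an exact duration such as "P3D" to a fully specified date.
    static LocalDate addDuration(String isoDate, String isoDuration) {
        return LocalDate.parse(isoDate).plus(Period.parse(isoDuration));
    }
}
```

For example, `parseDurationRange("P2M/P3M")` yields periods of two and three months, and `fillYear("XXXX-06", 2011)` yields `2011-06`.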
SUTime Limitations
 Holidays are not supported
 Support for ranges is poor
  "from 3 to 4 p.m." is identified as 15:57:00
  "12-13 March 2011": the "12-13" is ignored
  Resolving relative expressions with respect to the given reference date can be problematic
  Handling of ambiguous phrases is poor
  Some common words (e.g. spring/fall) are always identified as a temporal expression
 Patterns are language (English) specific
 …
SUTime Usage
 TimeAnnotator
  TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", properties);
  Properties:
  Specifies SUTime options (prefixed by "sutime.")
 Pipeline
  TimeAnnotator should come after the tokenizer, sentence splitter, and POS tagger
  Optional (also before): NER or NumberAnnotator/QuantifiableEntityNormalizingAnnotator
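As a minimal sketch of the Properties object mentioned above, the following uses only `java.util.Properties` and the option names listed under SUTime Options. The class name `SUTimeOptionsSketch` and the chosen values are illustrative assumptions; the resulting object is what would be passed as the second argument of the `TimeAnnotator` constructor shown in the usage line.

```java
import java.util.Properties;

// Hypothetical helper that collects SUTime options (values chosen for illustration).
final class SUTimeOptionsSketch {
    static Properties buildOptions() {
        Properties props = new Properties();
        // Every option key carries the "sutime." prefix.
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.includeNested", "true");
        props.setProperty("sutime.teRelHeurLevel", "BASIC");
        props.setProperty("sutime.includeRange", "true");
        return props;
    }
}
```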
SUTime Options
Property: description
  sutime.markTimeRanges
  Whether time ranges should be marked (e.g. if markTimeRanges is true, July to August => range). Default = false.
  sutime.includeNested
  Whether nested time expressions should be included (e.g. if markTimeRanges is true, July to August => range; if includeNested is true, both July and August will also be marked as time expressions). Default = false.
  sutime.teRelHeurLevel
  Heuristics for determining how to resolve relative times
  NONE = no heuristics (default) (refdate = 2011-08-01, Friday => 2011-08-05)
  BASIC = basic heuristics taking into account past tense (refdate = 2011-08-01, It happened Friday => 2011-07-29)
  MORE = more heuristics with since/until
  sutime.includeRange
  Whether range attributes should be included in the TIMEX3 XML output. Default = false.
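The two teRelHeurLevel examples can be reproduced with `java.time`. This is only a sketch of the date arithmetic, not SUTime's resolution logic: the class `RelativeDaySketch`, its `Heuristic` enum, and the explicit `pastTense` flag are hypothetical, and the choice of `nextOrSame` for the no-heuristics case is an assumption about what happens when the reference date already falls on the named day.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration of resolving a bare day of week against a reference date.
final class RelativeDaySketch {
    enum Heuristic { NONE, BASIC }

    static LocalDate resolveDayOfWeek(LocalDate refDate, DayOfWeek day,
                                      Heuristic level, boolean pastTense) {
        if (level == Heuristic.BASIC && pastTense) {
            // Past tense context: look backwards from the reference date.
            return refDate.with(TemporalAdjusters.previous(day));
        }
        // No heuristics: look forwards (assumption: the same day counts).
        return refDate.with(TemporalAdjusters.nextOrSame(day));
    }
}
```

With a reference date of 2011-08-01 (a Monday), "Friday" resolves to 2011-08-05 without heuristics and to 2011-07-29 when the past-tense heuristic applies, matching the examples in the options list.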
SUTime input annotations
  DocDateAnnotation (String)
  If present, the string is interpreted as a date/time and used as the reference document date with respect to which other temporal expressions are resolved.
  SentencesAnnotation (List<CoreMap>)
  If present, time expressions will be extracted from each sentence and each sentence will be annotated individually.
  TokensAnnotation (List<CoreLabel>)
  Required either at the entire annotation level or per sentence level.
SUTime output annotations
 Timex.Annotations (List<CoreMap>)
 List of time expressions (each a CoreMap)
 On the entire annotation and also for each sentence
 Time annotations (for each time expression/CoreMap)
  Timex.Annotation: Timex object with TIMEX3 XML attributes. Use for exporting TIMEX3 information.
  TimeExpression.Annotation: TimeExpression object. Use getTemporal() to get internal temporal representation.
  TimeExpression.ChildrenAnnotation (List<CoreMap>): List of chunks forming this time expression (inner chunks can be tokens, nested time expressions, numeric expressions, etc.)
SUTime output annotations
 Standard annotations (for each time expression)
  TextAnnotation (String): Text of this time expression.
  TokensAnnotation (List<CoreLabel>): Tokens that make up this time expression.
  CharacterOffsetBeginAnnotation (Integer): The index of the first character of this time expression.
  CharacterOffsetEndAnnotation (Integer): The index of the first character after this time expression.
  TokenBeginAnnotation (Integer): The index of the first token of this time expression.
  TokenEndAnnotation (Integer): The index of the first token after this time expression.
Note: Indices are 0-based and always relative to the original annotation. Begin indices are inclusive; end indices are exclusive.
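The inclusive-begin, exclusive-end convention matches Java's own `substring` and `subList`, so offsets can be used directly to recover a span. The sketch below uses plain strings and lists rather than CoreNLP annotation objects; the class `SpanSketch` and the example sentence are illustrative assumptions.

```java
import java.util.List;

// Hypothetical illustration of 0-based, half-open [begin, end) indices.
final class SpanSketch {
    // Character offsets: begin is the first character, end is one past the last.
    static String charSpan(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    // Token offsets follow the same convention.
    static List<String> tokenSpan(List<String> tokens, int begin, int end) {
        return tokens.subList(begin, end);
    }
}
```

For the text "It happened last week." the expression "last week" has character offsets 12 (begin) and 21 (end), and token offsets 2 and 4 when the tokens are It, happened, last, week, and the final period. The length of a span is always end minus begin.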
Comparison with GUTime
  Language
  SUTime: Java
  GUTime: Perl
  Timex
  SUTime: TIMEX3 with extensions
  GUTime: TIMEX3 tag, but mostly follows ACE TIMEX2 (extension of TempEx)
  Demo
  SUTime: http://nlp.stanford.edu:8080/sutime
  GUTime: http://nlp.stanford.edu:8080/gutime
  Comments
  SUTime: No support for holidays. Limited support for ranges and ambiguous phrases.
  GUTime: Some support for holidays. No support for ranges, poor support for years that are written out.
  TempEval2 (English Test)
  SUTime: Time Expression Identification: P=0.89, R=0.94, F1=0.91. Attribute Accuracy: Type=0.94, Value=0.72
  GUTime: Time Expression Identification: P=0.89, R=0.79, F1=0.84. Attribute Accuracy: Type=0.95, Value=0.68
SUTime and GUTime examples
  Date
  SUTime: <TIMEX3 tid="t1" value="1963-10" type="DATE">October of 1963</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" VAL="196310">October of 1963</TIMEX3>
  Duration
  SUTime: <TIMEX3 tid="t1" TYPE="DURATION" VAL="P56Y">fifty six years</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DURATION" VAL="P56Y">fifty six years</TIMEX3>
  Set
  SUTime: <TIMEX3 tid="t1" value="XXXX-WXX-7" type="SET" quant="every third" periodicity="P3W">Every third Sunday</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" SET="YES" VAL="XXXXWXX-0" PERIODICITY="F3W" GRANULARITY="G1D">Every third Sunday</TIMEX3>
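To make the SUTime-style tag layout in these examples concrete, here is a minimal sketch that assembles such a tag from its parts. It is not how SUTime or its Timex class produces XML: the class `Timex3Sketch` and its methods are hypothetical, and only the tid, value, and type attributes from the Date example are covered.

```java
// Hypothetical formatter for a TIMEX3-style tag (not SUTime's exporter).
final class Timex3Sketch {
    // Minimal XML escaping for attribute values and element text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toTimex3(String tid, String value, String type, String text) {
        return String.format("<TIMEX3 tid=\"%s\" value=\"%s\" type=\"%s\">%s</TIMEX3>",
            escape(tid), escape(value), escape(type), escape(text));
    }
}
```

Calling `toTimex3("t1", "1963-10", "DATE", "October of 1963")` reproduces the SUTime Date example above.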
Examples (GUTime unsupported)
  Time
  SUTime: <TIMEX3 tid="t1" value="2011-08-01T17:05:00" type="TIME">5:05 in the afternoon</TIMEX3>
  GUTime: 5:05 in the afternoon
  Date - Written out year
  SUTime: <TIMEX3 tid="t1" value="1994-WI" type="DATE">winter of nineteen ninety four</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE">winter</TIMEX3> of nineteen ninety four
  Duration Range
  SUTime: <TIMEX3 tid="t1" alt_value="P2M/P3M" type="DURATION">two to three months</TIMEX3>
  GUTime: two to three months
Reference Date is 2011-08-01
Examples (SUTime unsupported)
  Holiday
  SUTime: last Christmas
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" ALT_VAL="20101225">last Christmas</TIMEX3>
  Ambiguous words
  SUTime: The <TIMEX3 tid="t1" value="2011-SP" type="DATE">spring</TIMEX3> water was cool and refreshing
  GUTime: The <TIMEX3 tid="t1" TYPE="DATE">spring</TIMEX3> water was cool and refreshing.
Reference Date is 2011-08-01