Using Corpora

GATE Evaluation Tools
GATE Training Course
October 2006
Kalina Bontcheva
System development cycle
1. Collect a corpus of texts
2. Manually annotate a gold standard
3. Develop the system
4. Evaluate its performance
5. Go back to step 3 until the desired performance is reached
Corpora and System Development
• “Gold standard” data created by manual annotation
• Corpora are typically divided into a training and a testing portion
• Rules and/or learning algorithms are developed or trained on the training part
• Tuned on the testing portion in order to optimise
– Rule priorities, rule effectiveness, etc.
– Parameters of the learning algorithm and the features used (typical routine: 10-fold cross-validation; see the sketch below)
• Evaluation set – the best system configuration is run on this data and the final system performance is reported
• No further tuning once the evaluation set has been used!
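To make the tuning routine concrete, here is a minimal Java sketch of a 10-fold split over a document list. It is illustrative only: the file names are invented, and the train/score steps are placeholders, not GATE API calls.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TenFoldSplit {
    public static void main(String[] args) {
        // Hypothetical document names standing in for a real corpus
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) docs.add("doc" + i + ".xml");
        Collections.shuffle(docs, new Random(42)); // fixed seed: reproducible folds

        int k = 10;
        for (int fold = 0; fold < k; fold++) {
            List<String> train = new ArrayList<>();
            List<String> test = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++)
                (i % k == fold ? test : train).add(docs.get(i));
            // Placeholder: train/tune on 'train', score on 'test',
            // then average the k scores to compare configurations
            System.out.printf("fold %d: %d train, %d test%n",
                    fold, train.size(), test.size());
        }
    }
}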
Some NE Annotated Corpora
• MUC-6 and MUC-7 corpora - English
• CoNLL shared task corpora:
http://cnts.uia.ac.be/conll2003/ner/ - NEs in English and German
http://cnts.uia.ac.be/conll2002/ner/ - NEs in Spanish and Dutch
• TIDES surprise language exercise (NEs in Cebuano and Hindi)
• ACE – English - http://www.ldc.upenn.edu/Projects/ACE/
The MUC-7 corpus
• 100 documents in SGML
• News domain
Named Entities:
• 1880 Organizations (46%)
• 1324 Locations (32%)
• 887 Persons (22%)
• Inter-annotator agreement very high (~97%)
• http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/marsh_slides.pdf
The MUC-7 Corpus (2)
<ENAMEX TYPE="LOCATION">CAPE CANAVERAL</ENAMEX>, <ENAMEX TYPE="LOCATION">Fla.</ENAMEX> &MD; Working in chilly temperatures <TIMEX TYPE="DATE">Wednesday</TIMEX> <TIMEX TYPE="TIME">night</TIMEX>, <ENAMEX TYPE="ORGANIZATION">NASA</ENAMEX> ground crews readied the space shuttle Endeavour for launch on a Japanese satellite retrieval mission.
<p>
Endeavour, with an international crew of six, was set to blast off from the <ENAMEX TYPE="ORGANIZATION|LOCATION">Kennedy Space Center</ENAMEX> on <TIMEX TYPE="DATE">Thursday</TIMEX> at <TIMEX TYPE="TIME">4:18 a.m. EST</TIMEX>, the start of a 49-minute launching period. The <TIMEX TYPE="DATE">nine day</TIMEX> shuttle flight was to be the 12th launched in darkness.
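To illustrate how such inline SGML markup can be consumed, the following toy Java sketch (not part of GATE or the MUC tooling) pulls out the tagged spans with a regular expression:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnamexExtract {
    public static void main(String[] args) {
        // A fragment of the example above, escaped as a Java string
        String sgml = "<ENAMEX TYPE=\"LOCATION\">CAPE CANAVERAL</ENAMEX>, "
                + "<TIMEX TYPE=\"DATE\">Wednesday</TIMEX> "
                + "<ENAMEX TYPE=\"ORGANIZATION\">NASA</ENAMEX> ground crews";
        // Capture the element name, the TYPE attribute, and the tagged text;
        // the backreference \1 makes the closing tag match the opening one
        Pattern p = Pattern.compile("<(ENAMEX|TIMEX) TYPE=\"([^\"]+)\">([^<]+)</\\1>");
        Matcher m = p.matcher(sgml);
        while (m.find())
            System.out.println(m.group(2) + ": " + m.group(3));
    }
}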
ACE – Towards Semantic Tagging of Entities
• MUC NE tags segments of text whenever that text represents the name of an entity
• In ACE (Automated Content Extraction), these names are viewed as mentions of the underlying entities. The main task is to detect (or infer) in the text the mentions of the entities themselves
• Rolls together the NE and CO (coreference) tasks
• Domain- and genre-independent approaches
• The ACE corpus contains newswire, broadcast news (ASR output and cleaned), and newspaper reports (OCR output and cleaned)
ACE Entities
• Dealing with
– Proper names – e.g., England, Mr. Smith, IBM
– Pronouns – e.g., he, she, it
– Nominal mentions – the company, the spokesman
• Identify which mentions in the text refer to which entities, e.g.,
– Tony Blair, Mr. Blair, he, the prime minister, he
– Gordon Brown, he, Mr. Brown, the chancellor
ACE Example
<entity ID="ft-airlines-27-jul-2001-2"
        GENERIC="FALSE"
        entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME"
                  string="National Air Traffic Services"/>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"/>
  <entity_mention ID="M005" TYPE="PRO" string="its"/>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"/>
</entity>
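A minimal Java sketch (illustrative only, not ACE tooling) that reads such an entity record with the standard DOM parser and lists its mentions:

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AceMentions {
    public static void main(String[] args) throws Exception {
        // A shortened version of the record above
        String xml = "<entity ID=\"ft-airlines-27-jul-2001-2\" GENERIC=\"FALSE\""
                + " entity_type=\"ORGANIZATION\">"
                + "<entity_mention ID=\"M003\" TYPE=\"NAME\""
                + " string=\"National Air Traffic Services\"/>"
                + "<entity_mention ID=\"M005\" TYPE=\"PRO\" string=\"its\"/>"
                + "</entity>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element entity = doc.getDocumentElement();
        System.out.println("Entity " + entity.getAttribute("ID")
                + " (" + entity.getAttribute("entity_type") + ")");
        NodeList mentions = entity.getElementsByTagName("entity_mention");
        for (int i = 0; i < mentions.getLength(); i++) {
            Element m = (Element) mentions.item(i);
            System.out.println("  " + m.getAttribute("TYPE")
                    + " mention: " + m.getAttribute("string"));
        }
    }
}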
Annotate Gold Standard – Manual Annotation in the GATE GUI (screenshot)
Ontology-Based Annotation (coming in GATE 4.0) (screenshot)
Two GATE evaluation tools
• AnnotationDiff
• Corpus Benchmark Tool
AnnotationDiff
• Graphical comparison of 2 sets of annotations
• Visual diff representation, similar to tkdiff
• Compares one document at a time, one annotation type at a time
• Gives scores for precision, recall, F-measure, etc. (definitions sketched below)
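For reference, the standard definitions behind these scores, as a small Java sketch. The correct/spurious/missing counts here are invented inputs; how the tool arrives at them (e.g. whether partially overlapping annotations count) is configurable in GATE itself.

public class PrfScores {
    // Standard IE scoring: correct = annotations matching the gold standard,
    // spurious = produced by the system only, missing = in the gold standard only
    static double precision(int correct, int spurious) {
        return correct / (double) (correct + spurious);
    }
    static double recall(int correct, int missing) {
        return correct / (double) (correct + missing);
    }
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }
    public static void main(String[] args) {
        double p = precision(90, 10); // 90 / 100 = 0.90
        double r = recall(90, 30);    // 90 / 120 = 0.75
        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", p, r, fMeasure(p, r));
    }
}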
AnnotationDiff (screenshot)
Corpus Benchmark Tool
• Compares annotations at the corpus level
• Compares all annotation types at the same time, i.e. gives an overall score as well as a score for each annotation type
• Enables regression testing, i.e. comparison of 2 different versions against the gold standard
• Visual display, can be exported to HTML
• Granularity of results: the user can decide how much information to display
• Results in terms of precision, recall, and F-measure
Corpus structure
• The corpus benchmark tool requires a particular directory structure
• Each corpus must have a clean and a marked directory
• clean holds the unannotated versions, while marked holds the marked-up (gold standard) ones
• There may also be a processed subdirectory – this is a datastore (unlike the other two)
• Corresponding files in each subdirectory must have the same name (see the example layout below)
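A hypothetical layout (corpus and file names invented for illustration):

mycorpus/
  clean/      doc01.xml  doc02.xml  ...
  marked/     doc01.xml  doc02.xml  ...
  processed/  (a GATE datastore, not plain files)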
How it works
• Uses the clean, marked, and processed directories
• corpus_tool.properties – must be in the directory from which GATE is executed (an illustrative example is sketched below)
• Specifies configuration information about:
– Which annotation types are to be evaluated
– The threshold below which to print out debug information
– The input set name and key set name
• Modes:
– Default – regression testing
– Human-marked against already stored, processed results
– Human-marked against current processing results
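An illustrative corpus_tool.properties along these lines; the property names below are invented to show the shape of the file, so consult the GATE documentation for the exact keys:

# Hypothetical property names, for illustration only
annotTypes=Person;Organization;Location
threshold=0.5
keySetName=Key
responseSetName=Default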
Corpus Benchmark Tool (screenshot)