Uploaded by cordyceps96

Guidelines TextHitapp v4

Guidelines
There are 2 categories of judgements in this HitApp:
1) Judgements for “Description”
2) Judgements for “Fact”
Please note: new guidelines added as of 2022/09/09 are highlighted.
Description
The objective of this type of task is to assess the Description derived from the Wikipedia page. The HitApp layout for the
Description is shown below.
Select Exact extraction from 1 paragraph(s) if –
The Description is valid if and only if:
o The Description appears in the first two paragraphs of the Wikipedia page, in the same order (first paragraph first, second paragraph second).
o It is also OK for the Description to come only from the first paragraph.
o It has no missing words or lines. Sentences must be well formed, that is, not ending abruptly.
o It is OK for the Description to skip text that is inside brackets on the Wikipedia page. The only exceptions to this are when the bracketed text is also part of the “Name” of the Wikipedia page, or when the bracketed text is part of a mathematical expression or formula.
o The extraction has no unexpected notes or symbols (see below).
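The validity rules above can be roughly approximated in code. The sketch below is only an illustration, not part of the HitApp: the function names are hypothetical, it assumes the page has already been split into paragraphs, it treats "brackets" as round parentheses, and it does not handle the “Name” or formula exceptions, which still need a human judgement.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip()


def strip_brackets(text: str) -> str:
    """Drop non-nested (...) groups together with the space before them."""
    return normalize(re.sub(r"\s*\([^()]*\)", "", text))


def is_exact_extraction(description: str, paragraphs: Sequence[str]) -> bool:
    """Rough check: the description is a contiguous run of text from the
    first two paragraphs (optionally with bracketed text skipped) and
    ends on sentence-final punctuation."""
    source = normalize(" ".join(paragraphs[:2]))
    desc = normalize(description)
    if not desc or desc[-1] not in ".!?":
        return False  # empty, or the sentence ends abruptly
    return desc in source or desc in strip_brackets(source)


# Hypothetical page content, for illustration only.
paragraphs = [
    "Mount Example (pronounced ex-AM-pul) is a hypothetical peak. It is 3 km high.",
    "It was first climbed in 1900.",
    "Third paragraph text.",
]
```

With this hypothetical page, a description that skips the bracketed pronunciation passes, while one that drops a word, stops mid-sentence, or comes from the third paragraph fails.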
Select Description contains unexpected symbols or notes if –
The Description contains any cite-notes, audio text, superscripts, Wikidata sitelinks, etc.
A Wikipedia page can contain the information below, which should not be part of the derived Description (highlighted below in red
boxes):

Any text inside brackets on the Wikipedia page should not be part of the Description. For example, in the case below the
entire text inside brackets should be skipped.
There are some exceptions to this rule, though.
o If the text inside brackets is part of a mathematical expression or formula, it can be present.
o If the “Name” of the Wikipedia page includes part or all of the text inside brackets, it can be present.
Only in these two cases may the Description contain the bracketed text as it appears on Wikipedia.
Cite-Note.

Audio text to skip: pronunciation, pronounced, listen, play, audio

Cite-Note and Audio

Wiki Label

Wiki Note

Wiki Data Site Link

Superscript: Superscripts should appear exactly as they do in Wikipedia. For instance, the Description should show km² rather than km2.
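A first-pass screen for some of the unexpected items listed above could look like the sketch below. It is an assumption-laden illustration, not the HitApp's own logic: the cite-note pattern (bracketed numbers such as [1], single letters, or "citation needed") is an assumed shape for Wikipedia markers, the function name is hypothetical, and the audio word list is the one given above. Because words like "play" also occur in ordinary text, a hit only flags the Description for closer human review.

```python
import re

# Audio words to skip, as listed in the guidelines.
AUDIO_WORDS = {"pronunciation", "pronounced", "listen", "play", "audio"}

# Assumed shape of a cite-note marker: [1], [ 23 ], [a], [citation needed].
CITE_NOTE = re.compile(r"\[\s*(?:\d+|[a-z]|citation needed)\s*\]", re.IGNORECASE)


def find_unexpected(description: str) -> list:
    """Return labels for suspicious content found in the description."""
    issues = []
    if CITE_NOTE.search(description):
        issues.append("cite-note")
    # Compare whole lowercase words so "display" does not match "play".
    words = set(re.findall(r"[a-z]+", description.lower()))
    if words & AUDIO_WORDS:
        issues.append("audio")
    return issues
```

An empty list means nothing was flagged by these two checks; it does not by itself mean the Description is clean.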
Select Description is not from first two paragraphs if –
The Description, though extracted correctly, comes from paragraphs other than the first two.
Select Description does not match the original if –
The Description is missing words or lines compared with the paragraphs it comes from, or its text differs from
those paragraphs.
Select Other for any others.
Fact
The objective of this task is to assess the quality of facts, each represented by a pair of strings (Fact Label and
Fact Value). The HitApp layout for the Fact is shown below.



Step 1: For a given Wikipedia page shown, locate the infobox table (note that a page can have one or more
infoboxes, hence vertical scrolling might be needed).
Step 2: Refer to the table below for a pair of facts represented by a Fact Label and a Fact Value.
Step 3: Assess the quality of the Fact Label and the Fact Value by comparing the text to the text in the Wikipedia
infobox table. Please refer to the examples below. Look for grammatical, syntactical, and formatting anomalies.
Example: the infobox and the facts match, so select “Yes”.
Select Fact extracted accurately if –
Fact Label and Fact Value are correctly extracted from the infobox table.
Select Infobox not present on Wikipedia page if –
The Infobox table is not present in the reference Wikipedia page.
Select Fact value is partially missing if –
The Fact Value has missing text. Please refer to the following two examples. The first sample is missing “; 72 years ago”. The second
sample is missing “(age 29)”.
Select Extracted fact doesn't match the fact in infobox if –
The Fact Label or Fact Value does not correctly match the infobox table. For example, it contains excluded content or different
text.
Select Other for any others.
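The choice among the Fact options above can be pictured as a small decision procedure. The sketch below is illustrative only: `judge_fact` is a hypothetical helper, the infobox is assumed to be available as a label-to-value mapping (or `None` when the page has no infobox), and "partially missing" is approximated as the extracted value being a proper piece of the infobox value. Formatting anomalies and the “Other” option are left to human judgement.

```python
from typing import Mapping, Optional


def judge_fact(label: str, value: str,
               infobox: Optional[Mapping[str, str]]) -> str:
    """Map a (Fact Label, Fact Value) pair to one of the judgement options."""
    if infobox is None:
        return "Infobox not present on Wikipedia page"
    expected = infobox.get(label)
    if expected is not None and value == expected:
        return "Fact extracted accurately"
    # A non-empty value that is only a piece of the infobox text.
    if expected is not None and value and value in expected:
        return "Fact value is partially missing"
    return "Extracted fact doesn't match the fact in infobox"


# Hypothetical infobox row, for illustration only.
infobox = {"Born": "1 January 1990 (age 29)"}
```

For this hypothetical row, the value "1 January 1990" would be judged partially missing because “(age 29)” was dropped, mirroring the second example above.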