INDUSTRIAL PROJECT 234313
Automatic tagging tool for
Hebrew Wiki pages
User Manual
Supervisors: Dr. Miri Rabinovitz, Dr. Haim Mizrahi.
Students: Eyal Sharabi Horwitz, Shiran Cohen.
Table of Contents
1. Overview
2. Quick Start – Tagging Taxonomy items in a Text
3. Main Screen
4. Algorithm Selection
5. Location algorithm
6. Aggregate algorithm
7. Add New Tag/Accept Suggested Tag
8. Installation Guide
9. Miscellaneous
1. Overview
This program searches a text for entries in a given taxonomy and tags them.
It also suggests new items to add to the taxonomy, based on its analysis of
the current text and of the document corpus.
2. Quick Start – Tagging Taxonomy items in a Text
When beginning a new session, follow these steps:
a. Load Taxonomy
Load the taxonomy from a file or from the database. An up-to-date taxonomy
is essential for all of the tagging algorithms.
b. Load a Text File
Load the text you want the program to analyze and tag.
Note that the program does not load a plain text file. It loads an XML file
containing the results of the morphology analyzer's analysis of the text.
c. Click on Initiate Tagging
This begins the tagging operation.
d. Modify Algorithm Selection (optional)
Advanced users may want to select additional tagging algorithms or change
the weight of the currently selected algorithms. This can be done in the
Algorithm Selection screen.
Once the settings have been changed, repeat step c.
3. Main Screen
[Screenshot of the main screen. Its elements are labeled a through k and described below.]
a. File Menu
i. Open Text File
Displays an Open File dialog and loads the selected document.
The program opens XML files produced by running the morphology analyzer
on a text.
ii. Exit
Closes the program.
b. Taxonomy Menu
i. Load From File
Displays an Open File dialog and loads the selected taxonomy.
The program opens XML files produced by running the morphology analyzer
on the taxonomy.
Please note: duplicate entries are not loaded.
ii. Load From DB
Loads the taxonomy from the database.
Please note: duplicate entries are not loaded.
iii. Save To DB
Saves the current taxonomy to the database. Only taxonomy items that are
not already in the database are saved. If a new taxonomy analysis has been
loaded, clear the database before saving, so that outdated analysis is not
used in the future.
iv. Save To Excel
Opens a Save File dialog and saves the taxonomy to an Excel file in
the selected location.
v. Save New Entries to Excel
Opens a Save File dialog and saves any new taxonomy entries to an
Excel file in the selected location.
New entries are entries that were added manually by the user, as
opposed to loaded from a file.
New entries that were saved and then loaded from the DB are still
considered new.
vi. Clear Session Taxonomy
Removes all entries from the current taxonomy. The user must confirm
the choice before the action is completed.
It is recommended that you clear the taxonomy before loading a
new taxonomy analysis from file.
vii. Clear Taxonomy from DB
Clears the taxonomy from the database.
Since this action is irreversible and potentially damaging, the user
is required to confirm the choice twice.
It is recommended that you clear the DB before saving a new
taxonomy analysis from file.
c. Algorithm Menu
i. Algorithm Selection
Opens the Algorithm Selection configuration window.
ii. Location Tagging
Opens the Location Tagging configuration window.
iii. Aggregate Tagging
Opens the Aggregate Tagging configuration window.
d. Help
i. Quick Start – a short explanation of how to get started with the
automatic tagging tool.
ii. About – lists the creators of the automatic tagging tool and their
supervisors.
e. Tag Tabs
Switches between the different tag types. The number on each tab is the
number of tags of that type. When the tab is changed, the Tag List (f) for
that type is shown and all the relevant tags are highlighted.
i. Taxonomy
Taxonomy entries found in the text.
ii. Entities
Entities found in the text (as provided by the Entity analyzer).
iii. Suggested
New Taxonomy entries that the various algorithms are suggesting to
the user.
f. Tag Lists
A list containing all the tags of the currently selected type. The list is
sorted alphabetically and, in the case of taxonomy tags, also by
subject and category.
g. Tag Details
The details of the currently selected tag.
h. Tagging rationale
A list displaying which algorithms decided to tag the current phrase.
i. Accept Tag
Opens a window that adds the currently selected Suggested Tag into the
taxonomy.
j. Initiate Tagging
Begins the tagging operation with the currently selected settings. Clears all
previous results.
k. Maximum Suggested Tag Slider
Controls the maximum number of suggested tags to be displayed.
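The notes under the Taxonomy Menu (b), that duplicate entries are not loaded and that only items missing from the database are saved, can be illustrated with a minimal sketch. It is not the tool's actual code: the `TaxonomyEntry` class, its fields and both function names are hypothetical, and the sketch assumes that two entries are duplicates when their phrases match exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomyEntry:
    """Hypothetical taxonomy entry; the field names are assumptions."""
    phrase: str
    category: str = ""
    subcategory: str = ""
    is_new: bool = False  # True for entries added manually by the user


def load_entries(session, incoming):
    """Add incoming entries to the session list, skipping duplicate phrases."""
    known = {entry.phrase for entry in session}
    for entry in incoming:
        if entry.phrase not in known:
            session.append(entry)
            known.add(entry.phrase)
    return session


def entries_to_save(session, database_phrases):
    """Return only the session entries whose phrase is not yet in the database."""
    return [entry for entry in session if entry.phrase not in database_phrases]


# Example: the second "computer" entry is a duplicate and is skipped on load.
session = load_entries([], [
    TaxonomyEntry("computer", "Technology"),
    TaxonomyEntry("computer", "Technology"),
    TaxonomyEntry("network", "Technology"),
])
# Only "network" is missing from this (made-up) database content.
pending = entries_to_save(session, database_phrases={"computer"})
```

Under these assumptions, `session` ends up with two entries and `pending` contains only the "network" entry.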
4. Algorithm Selection
In this screen the user can choose which tagging algorithms to employ and what
weight to give their results.
The user can also configure the weight given to phrases of each length.
There are four algorithms available:
a. Foreign Language
This algorithm suggests new phrases based on the theory that in a Hebrew
text any phrase in a foreign language is significant.
b. Frequency
This algorithm suggests new phrases based on their frequency in the text
and their rarity in the document corpus.
Document corpus – all the documents analyzed so far by this algorithm.
c. Location
This algorithm suggests new phrases based on their location within the text.
d. Aggregate
This algorithm aggregates the results of the Frequency and Location tagging
algorithms and suggests new phrases based on their combined results.
e. Accept – modifies the algorithm weight according to the new values.
f. Cancel – cancels the current operation and closes the window.
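As an illustration of the Foreign Language and Frequency ideas described above, here is a minimal sketch. It is not the tool's actual implementation. The function names are hypothetical, treating runs of Latin letters as "foreign" is an assumption, and the score (count in the text multiplied by a smoothed inverse document frequency, in the style of TF-IDF) is a standard technique substituted for whatever formula the tool really uses.

```python
import math
import re
from collections import Counter


def foreign_phrases(text):
    """Return runs of Latin-script words found in a text.

    Assumption: in a Hebrew text, "foreign" means written in Latin letters.
    A run is one or more Latin words separated by single spaces.
    """
    return re.findall(r"[A-Za-z]+(?: [A-Za-z]+)*", text)


def frequency_scores(tokens, corpus):
    """Score each token by its count in the text times its rarity in the corpus.

    corpus is a list of token sets, one per previously analyzed document.
    The smoothed inverse document frequency is an assumed formula.
    """
    counts = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for token, count in counts.items():
        docs_with_token = sum(1 for doc in corpus if token in doc)
        rarity = math.log((1 + n_docs) / (1 + docs_with_token)) + 1.0
        scores[token] = count * rarity
    return scores


# Foreign Language: Latin-script phrases inside a Hebrew sentence.
found = foreign_phrases("המערכת משתמשת ב Open Source וגם ב XML")

# Frequency: "tag" is frequent in the text and absent from the corpus,
# so it scores higher than "wiki" and "the".
scores = frequency_scores(
    ["tag", "tag", "the", "wiki"],
    corpus=[{"the", "wiki"}, {"the"}],
)
```

In this sketch a token that appears in every corpus document still receives a small positive score, so very common words rank low rather than being dropped.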
5. Location algorithm
In this screen the user can manipulate the internal workings of the Location Tagging
algorithm.
The algorithm suggests new phrases to add to the taxonomy based on their location
in the text.
It uses three primary location indexes:
a. Distance from Start – the distance of the phrase from the beginning of the
text.
b. Life Span – the percentage of the text that the phrase encompasses
between its first and last appearance in the text.
c. Distance from End – the distance of the phrase from the end of the text.
d. Accept – modifies the Location algorithm according to the new values.
e. Clear – resets the fields to their initial values.
f. Cancel – cancels the current operation and closes the window.
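The three location indexes above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the function name is hypothetical, the phrase is a single token, positions are counted in tokens, and every index is expressed as a fraction of the text length.

```python
def location_indexes(tokens, phrase):
    """Compute the three location indexes for a single-token phrase.

    All values are fractions of the text length (an assumed normalization).
    Returns None if the phrase does not appear in the text.
    """
    positions = [i for i, token in enumerate(tokens) if token == phrase]
    if not positions:
        return None
    length = len(tokens)
    first, last = positions[0], positions[-1]
    return {
        # How far into the text the phrase first appears.
        "distance_from_start": first / length,
        # Share of the text spanned from first to last appearance.
        "life_span": (last - first + 1) / length,
        # How far the last appearance is from the end of the text.
        "distance_from_end": (length - 1 - last) / length,
    }


tokens = ["wiki", "is", "a", "wiki", "site", "with", "pages", "and", "tags", "here"]
indexes = location_indexes(tokens, "wiki")
```

For this ten-token text, "wiki" appears at positions 0 and 3, which gives a distance from start of 0.0, a life span of 0.4 and a distance from end of 0.6.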
6. Aggregate algorithm
In this screen the user can manipulate the internal workings of the Aggregate
Tagging algorithm.
The algorithm suggests new phrases to add to the taxonomy based on the
cross-referenced and analyzed results of other algorithms.
It uses three primary algorithms:
a. Location – suggests new phrases based on their location within the text.
b. Frequency – suggests new phrases based on their frequency in the
text and their rarity in the document corpus.
c. Noun – phrases that contain nouns are more likely to be selected by a user
and are given special treatment.
d. Accept – modifies the Aggregate algorithm according to the new values.
e. Clear – resets the fields to their initial values.
f. Cancel – cancels the current operation and closes the window.
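One way to picture the aggregation described above is a weighted sum of the Location and Frequency scores, with a boost for phrases that contain a noun. The manual does not give the actual formula, so everything in this sketch is an assumption: the function name, the weight keys, the multiplicative `noun_bonus` and the sample scores are all hypothetical.

```python
def aggregate_scores(location, frequency, nouns, weights, noun_bonus=1.5):
    """Rank phrases by a weighted combination of two score tables.

    location, frequency: dicts mapping phrase -> score
    nouns: set of phrases that contain a noun
    weights: dict with the keys "location" and "frequency"
    Returns the phrases sorted from highest to lowest combined score.
    """
    combined = {}
    for phrase in set(location) | set(frequency):
        score = (weights["location"] * location.get(phrase, 0.0)
                 + weights["frequency"] * frequency.get(phrase, 0.0))
        if phrase in nouns:
            score *= noun_bonus  # assumption: noun phrases get a multiplier
        combined[phrase] = score
    return sorted(combined, key=combined.get, reverse=True)


ranked = aggregate_scores(
    location={"wiki": 0.8, "page": 0.2},
    frequency={"wiki": 0.5, "tag": 0.9},
    nouns={"tag"},
    weights={"location": 0.5, "frequency": 0.5},
)
```

With these made-up inputs, "tag" ranks first because of the noun boost, followed by "wiki" and then "page".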
7. Add New Tag/Accept Suggested Tag
In this screen the user can add a new entry into the taxonomy, whether it be from a
suggested tag, a passage in the current text or free text written by the user.
a. Phrase – the taxonomy entry to add.
b. Category – the category of the phrase. The user can choose from the existing
categories or create a new one.
c. SubCategory – the subcategory of the phrase. The user can choose
from the existing subcategories or create a new one.
d. Accept – adds the entry to the taxonomy.
e. Clear – resets the fields to their initial values.
f. Cancel – cancels the current operation and closes the window.
8. Installation Guide
Copy the executable files and all related DLLs to the desired directory.
Also make sure that the Access file TagDB.accdb is in that directory.
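The installation steps above can be checked with a short script. This is only a sketch: the function name is hypothetical, and it assumes the executables carry an .exe extension and the libraries a .dll extension, since the manual does not name the individual files other than TagDB.accdb.

```python
import tempfile
from pathlib import Path


def missing_install_items(install_dir):
    """Return a list describing what is missing from the install directory."""
    directory = Path(install_dir)
    missing = []
    if not (directory / "TagDB.accdb").is_file():
        missing.append("TagDB.accdb")
    if not any(directory.glob("*.exe")):  # assumed executable extension
        missing.append("executable (*.exe)")
    if not any(directory.glob("*.dll")):
        missing.append("DLLs (*.dll)")
    return missing


# Demonstration on a temporary directory that contains only the database file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "TagDB.accdb").touch()
    report = missing_install_items(tmp)
```

An empty list means that all three kinds of file were found in the directory.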
9. Miscellaneous