INDUSTRIAL PROJECT 234313
Automatic tagging tool for Hebrew Wiki pages
User Manual

Supervisors: Dr. Miri Rabinovitz, Dr. Haim Mizrahi.
Students: Eyal Sharabi Horwitz, Shiran Cohen.

Table of Contents
1. Overview
2. Quick Start – Tagging Taxonomy items in a Text
3. Main Screen
4. Algorithm Selection
5. Location algorithm
6. Aggregate algorithm
7. Add New Tag/Accept Suggested Tag
8. Installation Guide
9. Miscellaneous

1. Overview
This program searches a text for entries in a given Taxonomy and tags them. Additionally, it suggests new items to add to the taxonomy, based on its analysis of the current text and of the document corpus.

2. Quick Start – Tagging Taxonomy items in a Text
When beginning a new session, follow these steps:
a. Load Taxonomy
   Load the taxonomy from a file or from the Database. An up-to-date Taxonomy is essential for all of the tagging algorithms.
b. Load a Text File
   Load the text you want the program to analyze and tag. The program does not actually load a plain text file, but rather an XML file containing the morphology analyzer's analysis of the text.
c. Click on Initiate Tagging
   This begins the tagging operation.
d.
Modify Algorithm Selection (optional)
   More advanced users may want to select additional tagging algorithms and/or change the weights of the currently selected algorithms. This can be done in the Algorithm Selection screen. Once the settings have been changed, repeat step c.

3. Main Screen
[Figure: the main screen, with its areas labeled a through k as described below.]
a. File Menu
   i. Open Text file
      Displays an Open File dialog and loads the selected document. The program opens XML files produced by running the morphology analyzer on a text.
   ii. Exit
      Closes the program.
b. Taxonomy Menu
   i. Load From File
      Displays an Open File dialog and loads the selected taxonomy. The program opens XML files produced by running the morphology analyzer on the taxonomy.
      Please note: duplicate entries are not loaded.
   ii. Load From DB
      Loads the Taxonomy from the Database.
      Please note: duplicate entries are not loaded.
   iii. Save To DB
      Saves the current Taxonomy to the Database. Only taxonomy items that are not already in the Database are saved. If a new Taxonomy analysis has been loaded, it is advised to clear the database before saving, to avoid using outdated analysis in the future.
   iv. Save To Excel
      Opens a Save File dialog and saves the taxonomy to an Excel file in the selected location.
   v. Save New Entries to Excel
      Opens a Save File dialog and saves any new taxonomy entries to an Excel file in the selected location. New entries are entries that were added manually by the user, as opposed to loaded from a file. New entries that were saved and then loaded from the DB are still considered new.
   vi. Clear Session Taxonomy
      Clears all entries from the current taxonomy. The user must confirm the choice before the action is completed. It is recommended that you clear the taxonomy before loading a new taxonomy analysis from file.
   vii. Clear Taxonomy from DB
      Clears the taxonomy from the database. Since this action is irreversible and potentially damaging, the user is required to confirm their choice twice.
      It is recommended that you clear the DB before saving a new taxonomy analysis from file.
c. Algorithm Menu
   i. Algorithm Selection
      Opens the Algorithm configuration window.
   ii. Location Tagging
      Opens the Location Tagging configuration window.
   iii. Aggregate Tagging
      Opens the Aggregate Tagging configuration window.
d. Help
   i. Quick start – a short explanation of how to get started with the automatic tagging tool.
   ii. About – the creators of the automatic tagging tool and the supervisors.
e. Tag Tabs
   Switches between the different tag types. The number on each tab is the number of tags of that type. When the tab is changed, the Tag List (f) is shown and all the relevant tags are highlighted.
   i. Taxonomy
      Taxonomy entries found in the text.
   ii. Entities
      Entities found in the text (as provided by the Entity analyzer).
   iii. Suggested
      New Taxonomy entries that the various algorithms suggest to the user.
f. Tag Lists
   A list containing all the tags of the currently selected type. The list is sorted alphabetically and, for taxonomy tags, also by subject and category.
g. Tag Details
   The details of the currently selected tag.
h. Tagging Rationale
   A list showing which algorithms decided to tag the current phrase.
i. Accept Tag
   Opens a window that adds the currently selected Suggested Tag to the taxonomy.
j. Initiate Tagging
   Begins the tagging operation with the currently selected settings. Clears all previous results.
k. Maximum Suggested Tag Slider
   Controls the maximum number of suggested tags to be displayed.

4. Algorithm Selection
In this screen the user can choose which tagging algorithms to employ and what weight to give to their results. Additionally, the user can configure the weight of the various phrase lengths.
There are four algorithms available:
a. Foreign Language
   This algorithm suggests new phrases based on the assumption that, in a Hebrew text, any phrase in a foreign language is significant.
b.
Frequency
   This algorithm suggests new phrases based on their frequency in the text and their rarity in the document corpus.
   Document corpus – all the documents analyzed so far by this algorithm.
c. Location
   This algorithm suggests new phrases based on their location within the text.
d. Aggregate
   This algorithm aggregates the results of the Frequency and Location tagging algorithms and suggests new phrases based on their combined results.
e. Accept – updates the algorithm weights according to the new values.
f. Cancel – cancels the current operation and closes the window.

5. Location algorithm
In this screen the user can adjust the internal workings of the Location Tagging algorithm. The algorithm suggests new phrases to add to the taxonomy based on their location in the text. It uses three primary location indexes:
a. Distance from Start – the distance of the phrase from the beginning of the text.
b. Life Span – the percentage of the text spanned between the first and last appearance of the phrase.
c. Distance from End – the distance of the phrase from the end of the text.
d. Accept – updates the Location algorithm according to the new values.
e. Clear – resets the fields to their initial values.
f. Cancel – cancels the current operation and closes the window.

6. Aggregate algorithm
In this screen the user can adjust the internal workings of the Aggregate Tagging algorithm. The algorithm suggests new phrases to add to the taxonomy based on the cross-referenced and analyzed results of other algorithms. It uses three primary algorithms:
a. Location – suggests new phrases based on their location within the text.
b. Frequency – suggests new phrases based on their frequency in the text and their rarity in the document corpus.
c. Noun – phrases containing nouns are more likely to be selected by a user and are given special treatment.
d. Accept – updates the Aggregate algorithm according to the new values.
e.
Clear – resets the fields to their initial values.
f. Cancel – cancels the current operation and closes the window.

7. Add New Tag/Accept Suggested Tag
In this screen the user can add a new entry to the taxonomy, whether it comes from a suggested tag, a passage in the current text, or free text written by the user.
a. Phrase – the Taxonomy entry to add.
b. Category – the category of the phrase. The user can choose from the existing categories or create a new one.
c. SubCategory – the subcategory of the current phrase. The user can choose from the existing subcategories or create a new one.
d. Accept – adds the entry to the taxonomy.
e. Clear – resets the fields to their initial values.
f. Cancel – cancels the current operation and closes the window.

8. Installation Guide
Copy the executable files and all related DLLs to the desired directory. Additionally, make sure that the Access file TagDB.accdb is in the same directory.

9. Miscellaneous
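
Illustrative sketch of the location indexes and the frequency score
The three location indexes from section 5 and the frequency/rarity idea from section 4 can be made concrete with a short Python sketch. This is not the tool's implementation, which this manual does not describe. The names location_indexes and frequency_score are hypothetical. Treating a phrase as a single token, normalizing distances by the text length, and using a smoothed TF-IDF-style formula for "frequent in the text, rare in the corpus" are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LocationIndexes:
    distance_from_start: float  # fraction of the text before the first appearance
    life_span: float            # fraction of the text from first to last appearance
    distance_from_end: float    # fraction of the text after the last appearance


def location_indexes(tokens, phrase):
    """Return the three location indexes of `phrase`, or None if it is absent.

    Assumption: a phrase is a single token and positions are counted in tokens.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == phrase]
    if not positions:
        return None
    n = len(tokens)
    first, last = positions[0], positions[-1]
    return LocationIndexes(
        distance_from_start=first / n,
        life_span=(last - first + 1) / n,
        distance_from_end=(n - 1 - last) / n,
    )


def frequency_score(tokens, phrase, corpus):
    """Score a phrase higher when it is frequent in the text and rare in the corpus.

    Assumption: a smoothed TF-IDF-style weight stands in for the tool's
    Frequency algorithm. `corpus` is a list of token lists, one per
    previously analyzed document.
    """
    if not tokens:
        return 0.0
    term_frequency = tokens.count(phrase) / len(tokens)
    docs_with_phrase = sum(1 for doc in corpus if phrase in doc)
    rarity = math.log((1 + len(corpus)) / (1 + docs_with_phrase)) + 1.0
    return term_frequency * rarity


if __name__ == "__main__":
    text = ["a", "b", "a", "c", "d", "a", "e", "f", "g", "h"]
    print(location_indexes(text, "a"))
    print(frequency_score(text, "a", [["a", "x"], ["y"], ["z"]]))
```

With this normalization the three indexes of a phrase always add up to 1, so a phrase that appears early and keeps recurring has a small Distance from Start and a large Life Span.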