Chapter 11
STATEMENT File Syntax

This chapter describes the two formats you can use to enter thesaurus terms in a STATEMENT file. For more information, see “Entering Thesaurus Terms.”

Directives, Etc.

Directives are keywords that you can use in a file of thesaurus term transactions to convey processing instructions to the TM utility. These directives are available:

BEGIN_LAYOUT - Indicates the start of a TEMPLATE or FREE format definition.

END_LAYOUT - Indicates the end of a format definition.

BEGIN_REL - Indicates the start of a group of term transactions.

END_REL - Indicates the end of a group of term transactions.

BEGIN_HIER - Indicates the start of a group of hierarchical terms entered using the Indentation method. In this method, you indicate relations by indenting subordinate terms.

END_HIER - Indicates the end of a group of hierarchical terms entered using the Indentation method.

OLD_RELATION - Used with the EDIT or EDIT_Informative action to indicate the term or portion of the term to be changed.

NEW_RELATION - Used with OLD_RELATION to indicate the replacement term.

BEGIN_REL Parameters

With a BEGIN_REL directive you can specify optional parameters that determine the action code and thesaurus for a group of term transactions. The THES_NAME parameter can be specified on the same line as the BEGIN_REL directive, but the ACTION_CODE parameter must be on its own line; for example:

    BEGIN_REL THES_NAME=DE
    ACTION_CODE=ADD

ACTION_CODE. Indicates the action the system will perform on a term or a group of terms. If you are using FREE format, the action code applies to all succeeding transactions until either a new ACTION_CODE parameter or an END_REL directive is encountered. Specify one of these action codes:

Action Code    Description

ADD | + | blank
    Adds terms plus reciprocals. Default.

ADD_Only
    Adds new terms plus reciprocals. If any of the terms already exist in the thesaurus, the system will treat the transaction as an error.
DELete | -
    Deletes terms plus reciprocals. Be aware of these cases with deletion:
    - If you specify a lead term by itself, the system will delete that term plus all its relations and their reciprocals. If these deletions result in a lead term without relations, the system will delete that lead term.
    - If you specify a lead term with relations, and if the deletion causes singleton lead term(s) to remain, the system will delete the remaining lead term(s). If you want to have the singleton lead term kept, use the action DELete_Keep.

DEL_KLEAD
DEL_KEEP_LEAD
    Deletes terms and reciprocals, but keeps any singleton lead terms resulting from the deletion.

EDIT
    Changes an existing term or terms by replacing the old with the new. Use the OLD_RELATION and NEW_RELATION keywords to indicate the term to be changed and the new term. To make a global change (i.e., to edit every occurrence of a specified term), specify the term as a lead term (not as a relation). If you specify a relation as the OLD_RELATION, then the system will change only that relation and its reciprocal as indicated in the NEW_RELATION statement.

EDIT_KLEAD
EDIT_KEEP_LEAD
    Changes terms and reciprocals, but keeps any singleton lead terms resulting from the edit.

EDIT_INFormative
    Changes the text in INFORMATIVE relation-type messages. Use the OLD_RELATION and NEW_RELATION keywords to indicate how you want the message changed.

THES_NAME. Indicates the name of the thesaurus into which you want the terms entered. In addition to a thesaurus name, you can specify an alias. Do not specify a thesaurus list name. The thesaurus name applies to all succeeding transactions until either a new THES_NAME parameter or an END_REL directive is encountered.

Comments

You can include comments or other information that you want the system to ignore by preceding the comment with <<<.
The comment characters must begin on a new line; for example:

    WRONG:    LT = Canada <<< This is a comment.

    CORRECT:  LT = Canada
              <<< This is a comment.

Line Continuation

Typically, you will specify one term transaction per line in the STATEMENT file. However, for long thesaurus terms or INFORMATIVE messages, you may need to continue a transaction to the next line. To continue a transaction, type a plus sign (+) at the end of the line; for example:

    LT = Kuwait
    HN = A sheikdom before 1961 and under British protection. +
    Kuwait is an important oil producing nation.

The system will concatenate the data in the current line to the next line, replacing the + with the first character of the next line.

Blank Lines and Spaces

The system will eliminate all leading and trailing blanks in a term or message. Multiple internal blanks are squeezed to one. If you want to include blank lines in a STATEMENT file for readability, use the comment characters; for example:

    BEGIN_LAYOUT
    FORMAT=FREE
    END_LAYOUT
    <<<
    THES_NAME=MD
    <<<
    ACTION_CODE=EDIT;
    OLD_RELATION: LT=ANIMAL; NT=DOG
    NEW_RELATION: LT=ANIMAL; NT=CANINE

FREE Format

The FREE format lets you use a “keyword=” type of approach to specify thesaurus term transactions. With FREE format you don’t have to enter term data in any particular columns. A typical term transaction consists of a relation-type, the term, and optional qualifying parameters; for example:

    LT= Electron beam deflection tubes/TERM_TAG=MBT

Syntax

A FREE format definition is made up of these required and optional (shown in braces) parameters:

    BEGIN_LAYOUT
    FORMAT=FREE
    {,DATA_TERM_SEParator=sep_char}
    {,END_OF_DATA_STATement=stat_char}
    {,DATA_QUAL_SEParator=sep_char}
    END_LAYOUT

Following is a description of each parameter:

FORMAT - Defines the format you are using to enter term transactions. (TEMPLATE is the default.)

END_OF_DATA_STATement - Defines a special character to mark the end of a statement.
Enclose the character you are defining in single quotes. Default is a semicolon (;). For example, here the percent sign (%) is defined as the end-of-statement marker:

    END_OF_DATA_STAT='%'
    LT=Canada % NT=Ontario %
    LT=Ontario % NT=Stratford

DATA_QUAL_SEParator - Defines a special character to separate a term from a term qualifier (e.g., ANNOTATE). Enclose the character you are defining in single quotes. Default is a slash (/). For example, here the ampersand (&) is defined as the data qualifier marker:

    DATA_QUAL_SEP='&'
    LT=Canada & ANNOTATE=*

DATA_TERM_SEParator - Defines a special character to separate terms typed on the same line. Enclose the character you are defining in single quotes. Default is a comma (,). For example, here the pound sign (#) is defined as the data term separator:

    DATA_TERM_SEPARATOR='#'
    NT = Ontario# Quebec# British Columbia# New Brunswick

Term Qualifiers

You can specify optional term information using these qualifier parameters:

ANNOTATE - Specifies an annotation character that you want the system to include with a subordinate term in displays and prints of the thesaurus. Use only with non-lead terms; for example:

    LT=Kuwait
    BT=Middle East/ANNOTATE=#

RECIP_ANNOTate - Specifies an annotation character that you want the system to include with the reciprocal of a subordinate term in displays and prints of the thesaurus. Use only with non-lead terms; for example:

    LT=TNT
    BT=Explosives/RECIP_ANNOT=*

TERM_TAG - Specifies any information that you want the system to carry along with the lead term. For example, you might specify a facet (level) number or subject category:

    LT=Europe/TERM_TAG=2
    NT=France
    LT=France/TERM_TAG=2.3
    NT=Paris

Examples

1. DATA_TERM_SEParator and END_OF_DATA_STATement in this layout definition could have been omitted since they define the defaults.
    BEGIN_LAYOUT
    FORMAT=FREE
    DATA_TERM_SEP=','
    END_OF_DATA_STAT=';'
    END_LAYOUT
    <<<
    <<< STILL CAMERAS has four narrower terms added
    <<< and three related terms
    <<<
    BEGIN_REL THES_NAME=MD
    LT = Still Cameras
    NT = Instant picture cameras, Miniature cameras, +
    Reflex cameras, View Cameras
    RT = Cameras, Television Cameras, Television
    END_REL

2. The following action will delete a lead term plus all its subordinate terms. If any singleton lead terms result, the system will preserve them.

    BEGIN_LAYOUT
    FORMAT=FREE
    END_LAYOUT
    BEGIN_REL
    ACTION_CODE=DELETE_KEEP
    LT=Land Cameras, Stereo Cameras, Underwater Cameras
    END_REL

3. These lead terms include term tag qualifiers:

    BEGIN_LAYOUT
    FORMAT=FREE
    END_LAYOUT
    BEGIN_REL
    ACTION_CODE=A
    THES_NAME=MD
    LT= Electron beam deflection tubes/TERM_TAG=MBT
    NT =Indicator tubes (tuning), Trochotrons
    THES_NAME=TDK
    LT= Tetrodes/TERM_TAG=MDA
    NT= Dynatrons, Resnatrons
    END_REL

4. If a statement is very long, you can continue it to the next line using a plus sign (+).

    LT= Television Camera Tubes; UF= Camera tubes (television), +
    Emitrons, Iconoscopes,Image orthicons;RT= Phototubes

5. If you want to edit only one relation (not all occurrences of the term), then specify the lead term and the subordinate term.

    BEGIN_LAYOUT
    FORMAT=FREE
    END_LAYOUT
    <<<
    <<< The relation ANIMAL NT DOG is modified to
    <<< ANIMAL NT CANINE
    <<< The reciprocal modification includes
    <<< CANINE BT ANIMAL
    <<< Note: If DOG contains other relations, the system will
    <<< not change the lead term DOG and its other
    <<< relations.
    <<<
    THES_NAME=MD
    ACTION_CODE=EDIT;
    OLD_RELATION: LT=ANIMAL; NT=DOG
    NEW_RELATION: LT=ANIMAL; NT=CANINE

6. To limit changes to the text of a specific INFORMATIVE message, specify the lead term to which the message belongs. Here the typo in the word “government” is corrected.
    BEGIN_LAYOUT
    FORMAT=FREE
    END_LAYOUT
    ACTION_CODE=EDIT_INFORMATIVE;
    THES_NAME=MD
    OLD_RELATION: LT=COMMUNES (China); SN=local governemnt
    NEW_RELATION: LT=COMMUNES (China); SN=local government

Old Message
    SN = Large-scale enterprise which includes collectivized agriculture, industry, social services and local governemnt functions.

New Message
    SN = Large-scale enterprise which includes collectivized agriculture, industry, social services and local government functions.

TEMPLATE Format

The TEMPLATE format lets you define the layout of thesaurus term transactions by character position in the STATEMENT file.

Syntax

A TEMPLATE format definition is made up of these required parameters. You must specify all of these parameters, even though you may not use them. (Default start and end character positions are shown.)

    BEGIN_LAYOUT
    FORMAT=TEMPLATE
    ACTION_CODE= (1:3)
    REL_TYPE= (9:11)
    TERM= (17:80)
    THES_NAME=(13:15)
    EDIT_SUB_ACTION= (5:7)
    ANNOTATE= (82:84)
    RECIP_ANNOT= (86:88)
    TERM_TAG= (90:100)
    END_LAYOUT

Following is a description of each parameter:

FORMAT - Indicates the type of format you are using. Default is TEMPLATE.

ACTION_CODE=(1:3) - Defines the location of the action code in a term transaction.

REL_TYPE=(9:11) - Defines the location of the relation-type in a term transaction.

TERM=(17:80) - Defines the location of the term in a term transaction.

EDIT_SUB_ACTION=(5:7) - Defines the location of the OLD and NEW keywords for an edit transaction.

THES_NAME=(13:15) - Defines the location of the thesaurus name.

ANNOTATE=(82:84) - Defines the location of an annotation character that you want the system to include with a term in displays and prints of the thesaurus. Use only with non-lead terms.

RECIP_ANNOT=(86:88) - Defines the location of an annotation character that you want the system to include with the reciprocal of a term in prints and displays of the thesaurus. Use only with non-lead terms.
TERM_TAG=(90:100) - Defines the location of any additional information that you want the system to carry along with the lead term. You might specify a facet (level) number or subject category, for example.

Examples

1. Since the following example uses the default TEMPLATE format, the file need not contain BEGIN_LAYOUT and END_LAYOUT directives. Also, the THES_NAME parameter saves keystrokes by indicating in one place the target thesaurus for these terms. (The line of numbers is a comment to aid in positioning the term data.)

    <<<45678901234567890123456789012345678901234567890123456789012345678901
    BEGIN_REL THES_NAME=MD
            LT      View Cameras
            UF      Land Cameras
            BT      Still Cameras
            SN      Cameras with through-the-lens focusing and a range of +
                    movements of the lens plane relative to the film plane.
            LT      Cine Cameras
            NT      Underwater cine cameras
    END_REL

2. Because the thesaurus is specified in the BEGIN_REL directive and the action code is left to default to ADD, the user only has to specify the relation-type and term:

    BEGIN_LAYOUT
    FORMAT=TEMPLATE
    ACTION_CODE=(1:3)
    REL_TYPE=(5:8)
    TERM=(10:80)
    EDIT_SUB_ACTION=(84:86)
    THES_NAME=(87:89)
    ANNOTATE=(90:92)
    RECIP_ANNOT=(93:95)
    TERM_TAG=(96:100)
    END_LAYOUT
    <<<
    BEGIN_REL THES_NAME=DE
    <<<456789012345678901234567890
        LT   Sodium Salts
        NT   Sodium Chlorate
        NT   Sodium Chloride
        NT   Sodium Bicarbonate
        NT   Sodium Phosphate
        LT   Sodium Chloride
        UF   Salt

3. The action code EDIT will change all the occurrences of a misspelled word to the corrected form, similar to a global search and replace.

    <<<45678901234567890123456789012345678901234567890123
    EDIT OLD LT MD DOG
    EDIT NEW LT MD CANINE

This transaction will change all occurrences of the term DOG to Canine.

    Original Thesaurus        Modified Thesaurus after change from DOG to Canine

    Animal                    Animal
       NT DOG                    NT Canine
    Boxer                     Boxer
       BT DOG                    BT Canine
    Collie                    Canine
       BT DOG                    BT Animal
    DOG                          NT Boxer
       BT Animal                 NT Collie
       NT Collie              Collie
       NT Boxer                  BT Canine

4. You can also use the EDIT action to edit optional information for a thesaurus term. These transactions edit an annotation symbol or classification number.

    <<<4567890123456789012345678901234567890...12345678901234567890123456789
    EDIT OLD LT MD DOG
    EDIT OLD NT MD COLLIE
    EDIT NEW LT MD DOG * > C1.252.40.552.846
    EDIT NEW NT MD COLLIE

Indentation Method of Entering Hierarchies

It’s often easier to enter hierarchies by simply typing the terms themselves, one per line, and indenting to indicate relations. This section describes the syntax you need for the indentation method.

Syntax

You can use the indentation method with TEMPLATE or FREE format definitions. Use these directives and parameters to indicate you are using the indentation method:

    BEGIN_HIER
    INDENT_LEVEL=integer
    ,REL_TYPE=rel_type
    ,THES_NAME=thesnam
    END_HIER

Following is a description of each parameter:

INDENT_LEVEL - Indicates the number of spaces you want to indent a subordinate relation. If the system finds more blanks than you specified with the INDENT_LEVEL parameter, it will treat this as an error and reject the tree. (Required)

REL_TYPE - Indicates the GENERAL or SPECIFIC relation-type you want to use in the hierarchy. If you specify this parameter with the BEGIN_HIER directive, the system will assume this relation-type for any terms without a relation-type. (Optional)

THES_NAME - Indicates the name of the thesaurus into which the terms are loaded. (Required)

Using the Indentation Method

The information between the BEGIN_HIER and END_HIER directives must contain one and only one entire hierarchical tree (or subtree). Keep these points in mind when using the indentation method:

- TM will add the new tree if it does not already exist, or it will replace the tree that already exists with the terms you specify.
- If you want to add only a few relations to a hierarchy, use the conventional input methods. Don’t use the indentation method for editing or deleting terms in a hierarchy. To change or delete terms in a hierarchy, use the BROWSE action with Update intent.

- You can use other relation-types within a hierarchy provided they are not defined as GENERAL or SPECIFIC. You must indent a non-hierarchical relation under the term it refers to; for example:

    Animals
       Mammals
          Dogs
             Poodle
             Great Dane
                SN Any breed of tall, massive, powerful +
                   smooth-coated dogs.
          Cats

Examples

1. Since the user specified REL_TYPE=NT, the system will assume this relation-type for all the terms within the hierarchy directive. The user also indicated each level by indenting three spaces.

    <<<45678901234567890123456789012345678901234567890123
    BEGIN_LAYOUT
    FORMAT=FREE
    END_LAYOUT
    BEGIN_HIER THES_NAME=MD, REL_TYPE=NT, INDENT_LEVEL=3
    Animals
       Mammals
          Dogs
             Poodle
             Great Dane
          Cats
       Fish
          Sharks
             Hammer Head
             Great White
       Birds
          Humming Bird
          Turkey
    END_HIER

2. You can use other relation-types within a hierarchy provided they are not defined as GENERAL or SPECIFIC.

    BEGIN_LAYOUT
    FORMAT=TEMPLATE
    ACTION_CODE=(1:3)
    REL_TYPE=(5:8)
    TERM=(9:80)
    EDIT_SUB_ACTION=(84:86)
    THES_NAME=(87:89)
    ANNOTATE=(90:92)
    RECIP_ANNOT=(93:95)
    TERM_TAG=(96:100)
    END_LAYOUT
    <<<
    BEGIN_HIER THES_NAME=MD, REL_TYPE=NT, INDENT_LEVEL=3
    <<<456789012345678901234567890123456789012345678901234567890
            TERM A
        SN     THIS CONTAINS TEXTUAL INFORMATION REGARDING THE +
        SN     USE OF TERM A - IT HAS IMPORTANT MEANING
               TERM B
        RT        TERM B.A
        RT        TERM B.B
                  TERM C
                  TERM D
               TERM E
        SN        THIS CONTAINS INFORMATION REGARDING TERM E
                  TERM F
                  TERM G
    END_HIER

3.
You can include facet (classification) numbers with the indentation method:

    BEGIN_LAYOUT
    FORMAT=TEMPLATE
    ACTION_CODE=(1:3)
    TERM_TAG=(5:11)
    TERM=(13:80)
    EDIT_SUB_ACTION=(84:86)
    THES_NAME=(87:89)
    ANNOTATE=(90:92)
    RECIP_ANNOT=(93:95)
    REL_TYPE=(96:100)
    END_LAYOUT
    <<<
    BEGIN_HIER INDENT_LEVEL=2,THES_NAME=MD, REL_TYPE=NT
    <<<456789012345678901234567890123456789012345678901234567890
        1       TERM A
        1.1       TERM B
        1.1.1       TERM C1
        1.1.1.1       TERM Z
        1.1.1.2       TERM Y
        1.1.1.3       TERM X
        1.1.2       TERM D
        1.2       TERM E
        1.2.1       TERM C
        1.2.1.1       TERM Z
        1.2.1.2       TERM Y
        1.2.1.3       TERM X
        1.2.2       TERM F
        1.2.3       TERM G
    END_HIER

4. This example further demonstrates non-hierarchical relation-types included in a hierarchy. The position of the non-hierarchical relation-types, RT and SN, determines the lead term to which they relate.

    BEGIN_LAYOUT
    ACTION_CODE=(1:3)
    REL_TYPE=(5:8)
    TERM_TAG=(10:14)
    TERM=(17:80)
    EDIT_SUB_ACTION=(84:86)
    THES_NAME=(87:89)
    ANNOTATE=(90:92)
    RECIP_ANNOT=(93:95)
    END_LAYOUT
    BEGIN_HIER INDENT_LEVEL=2,THES_NAME=MD
    <<<456789012345678901234567890123456789012345678901234567890
        RT   1      TERM A
        SN            THIS CONTAINS TEXTUAL INFORMATION +
        SN            REGARDING THE+ USE OF TERM A - IT HAS GREAT +
                      MEANING WERE BLAH
        NT   1.1      TERM B
        RT              TERM B.A
        RT              TERM B.B
        NT   1.1.1      TERM C
        NT   1.1.2      TERM D
        NT   1.2      TERM E
        SN              THIS CONTAINS INFORMATION +
                        REGARDING TERM E
        NT   1.2.1      TERM F
        NT   1.2.2      TERM G
    END_HIER
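
The indentation rules described in this chapter can be illustrated with a short Python sketch. This is a hypothetical illustration only, not part of the TM utility: the function name parse_hierarchy and its parameters are invented here. It handles plain hierarchical terms only (no SN or RT lines and no TEMPLATE columns). It also assumes that "more blanks than you specified" means a term indented more than one level deeper than the term above it, or by a number of blanks that is not a multiple of INDENT_LEVEL.

```python
def parse_hierarchy(lines, indent_level, rel_type="NT"):
    """Turn indented term lines into (parent, rel_type, child) tuples.

    Hypothetical helper: assumes one tree per call, as a BEGIN_HIER /
    END_HIER group is described as holding exactly one tree.
    """
    relations = []
    stack = []  # stack[d] holds the most recent term seen at depth d
    for number, raw in enumerate(lines, start=1):
        # Skip blank lines and <<< comment lines.
        if not raw.strip() or raw.lstrip().startswith("<<<"):
            continue
        blanks = len(raw) - len(raw.lstrip(" "))
        depth, remainder = divmod(blanks, indent_level)
        # Assumed error rule: partial indent, or a jump of more than one level.
        if remainder or depth > len(stack):
            raise ValueError(f"line {number}: unexpected indentation ({blanks} blanks)")
        if depth == 0 and stack:
            raise ValueError(f"line {number}: a second root term starts a new tree")
        # Trim leading/trailing blanks and squeeze internal blanks to one.
        term = " ".join(raw.split())
        del stack[depth:]  # drop terms at this depth and deeper
        if stack:
            relations.append((stack[-1], rel_type, term))
        stack.append(term)
    return relations


tree = [
    "Animals",
    "   Mammals",
    "      Dogs",
    "         Poodle",
    "      Cats",
    "   Fish",
]
for parent, rel, child in parse_hierarchy(tree, indent_level=3):
    print(f"LT={parent}; {rel}={child}")
```

Run as written, the loop prints one FREE-style statement per parent and child pair, starting with LT=Animals; NT=Mammals and ending with LT=Animals; NT=Fish. A stack indexed by depth is used so that each term is attached to the nearest preceding term one level above it, which mirrors how the indented listing is read by eye.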