Chapter 11: STATEMENT File Syntax

This chapter describes the two formats you can use to enter thesaurus terms in a
STATEMENT file. For more information, see “Entering Thesaurus Terms.”
Directives, Etc.
Directives are keywords that you can use in a file of thesaurus term transactions to convey
processing instructions to the TM utility. These directives are available:
BEGIN_LAYOUT indicates the start of a TEMPLATE or FREE format definition.
END_LAYOUT indicates the end of a format definition.
BEGIN_REL indicates the start of a group of term transactions.
END_REL indicates the end of a group of term transactions.
BEGIN_HIER indicates the start of a group of hierarchical terms entered using the
Indentation method. In this method, you indicate relations by indenting subordinate
terms.
END_HIER indicates the end of a group of hierarchical terms entered using the
Indentation method.
OLD_RELATION is used with the EDIT or EDIT_INFormative action to indicate
the term or portion of the term to be changed.
NEW_RELATION is used with OLD_RELATION to indicate the replacement
term.
BEGIN_REL Parameters
With a BEGIN_REL directive you can specify optional parameters that determine the
action code and thesaurus for a group of term transactions. The THES_NAME parameter
can be specified on the same line as the BEGIN_REL directive, but the ACTION_CODE
parameter must be on its own line; for example:
BEGIN_REL THES_NAME=DE
ACTION_CODE=ADD
ACTION_CODE. Indicates the action the system will perform on a term or a group of
terms. If you are using FREE format, the action code applies to all succeeding
transactions until either a new ACTION_CODE parameter or END_REL directive is
encountered. Specify one of these action codes:
Action Code         Description
ADD | + | blank     Adds terms plus reciprocals. Default.
ADD_Only            Adds new terms plus reciprocals. If any of the terms
                    already exist in the thesaurus, the system will treat the
                    transaction as an error.
DELete | -          Deletes terms plus reciprocals. Be aware of these cases
                    with deletion:
                    If you specify a lead term by itself, the system will
                    delete that term plus all its relations and their
                    reciprocals. If these deletions result in a lead term
                    without relations, the system will delete that lead term.
                    If you specify a lead term with relations, and if the
                    deletion causes singleton lead term(s) to remain, the
                    system will delete the remaining lead term(s). If you want
                    to have the singleton lead term kept, use the action
                    DELete_Keep.
DEL_KLEAD           Deletes terms and reciprocals, but keeps any singleton lead
DEL_KEEP_LEAD       terms resulting from the deletion.
EDIT                Changes an existing term or terms by replacing the old
                    with the new. Use the OLD_RELATION and NEW_RELATION
                    keywords to indicate the term to be changed and the new
                    term.
                    To make a global change (i.e., to edit every occurrence of
                    a specified term), specify a term as a lead term (not as a
                    relation). If you specify a relation as the OLD_RELATION,
                    then the system will change only that relation and its
                    reciprocal as indicated in the NEW_RELATION statement.
EDIT_KLEAD          Changes terms and reciprocals, but keeps any singleton
EDIT_KEEP_LEAD      lead terms resulting from the edit.
EDIT_INFormative    Changes the text in INFORMATIVE relation-type messages.
                    Use the OLD_RELATION and NEW_RELATION keywords to
                    indicate how you want the message changed.
THES_NAME. Indicates the name of the thesaurus to which you want the terms entered.
In addition to a thesaurus name, you can specify an alias. Do not specify a thesaurus list
name. The thesaurus name applies to all succeeding transactions until either a new
THES_NAME parameter or END_REL directive is encountered.
Comments
You can include comments or other information that you want the system to ignore by
preceding the comment with <<<. The comment characters must begin on a new line; for
example:
WRONG:
LT = Canada <<< This is a comment.
CORRECT:
LT = Canada
<<< This is a comment.
Line Continuation
Typically, you will specify one term transaction per line in the STATEMENT file.
However, for long thesaurus terms or INFORMATIVE messages, you may need to
continue a transaction to the next line. To continue a transaction, type a plus sign (+) at
the end of the line; for example:
LT = Kuwait
HN = A sheikdom before 1961 and under British protection. +
Kuwait is an important oil producing nation.
The system will concatenate the data in the current line to the next line, replacing the +
with the first character of the next line.
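The continuation rule can be illustrated with a short Python sketch. It is not part of the TM utility; the function name join_continued is hypothetical, and the sketch assumes that leading blanks on the continued line are dropped so that its first character lands where the plus sign stood.

```python
def join_continued(lines):
    """Merge physical lines that end with '+' into single transactions."""
    merged = []
    pending = None  # text carried over from a line that ended with '+'
    for line in lines:
        text = line.rstrip()
        if pending is not None:
            # The first character of this line takes the place of the '+'.
            text = pending + text.lstrip()
            pending = None
        if text.endswith("+"):
            pending = text[:-1]  # drop the plus sign, keep what precedes it
        else:
            merged.append(text)
    if pending is not None:
        merged.append(pending)  # a dangling '+' on the last line
    return merged
```

Applied to the Kuwait example, the two physical HN lines become one transaction, and the LT line is passed through unchanged.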
Blank Lines and Spaces
The system will eliminate all leading and trailing blanks in a term or message. All
multiple internal blanks are squeezed to one.
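As an illustration of this rule only (the function name normalize_blanks is hypothetical, and treating tabs like blanks is an assumption), the same clean-up can be written in Python as:

```python
def normalize_blanks(text):
    """Strip leading and trailing blanks and squeeze internal runs to one."""
    # str.split() with no argument splits on any run of whitespace and
    # discards empty pieces, so joining with one space does both jobs.
    return " ".join(text.split())
```

Under this sketch, a term typed as "  Still    Cameras  " is stored the same way as "Still Cameras".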
If you want to include blank lines in a STATEMENT file for readability, use the comment
characters; for example:
BEGIN_LAYOUT
FORMAT=FREE
END_LAYOUT
<<<
THES_NAME=MD
<<<
ACTION_CODE=EDIT;
OLD_RELATION: LT=ANIMAL; NT=DOG
NEW_RELATION: LT=ANIMAL; NT=CANINE
FREE Format
The FREE format lets you use a “keyword=” type of approach to specify thesaurus term
transactions. With FREE format you don’t have to enter term data in any particular
columns. A typical term transaction consists of a relation-type, the term and optional
qualifying parameters; for example:
LT= Electron beam deflection tubes/TERM_TAG=MBT
Syntax
A FREE format definition is made up of these required and optional (shown in braces)
parameters:
BEGIN_LAYOUT
FORMAT=FREE
{,DATA_TERM_SEParator=sep_char}
{,END_OF_DATA_STATement=stat_char}
{,DATA_QUAL_SEParator=sep_char}
END_LAYOUT
Following is a description of each parameter:
FORMAT - Defines the format you are using to enter term transactions. (TEMPLATE is
the default.)
END_OF_DATA_STATement - Defines a special character to mark the end of a
statement. Enclose the character you are defining in single quotes. Default is a semicolon
(;). For example, the percent sign (%) is defined as the end of statement marker:
END_OF_DATA_STAT='%'
LT=Canada % NT=Ontario % LT=Ontario % NT=Stratford
DATA_QUAL_SEParator - Defines a special character to separate a term from a term
qualifier (e.g., ANNOTATE). Enclose the character you are defining in single quotes.
Default is a slash (/). For example, the ampersand (&) is defined as the data qualifier
marker:
DATA_QUAL_SEP='&'
LT=Canada & ANNOTATE=*
DATA_TERM_SEParator - Defines a special character to separate terms typed on the
same line. Enclose the character you are defining in single quotes. Default is a comma
(,). For example, the pound sign (#) is defined as the data_term separator:
DATA_TERM_SEPARATOR='#'
NT = Ontario# Quebec# British Columbia# New Brunswick
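To show how the three separators work together, here is a minimal Python sketch of splitting one FREE format line. It is an illustration, not the TM utility's parser: the function name parse_free and the returned dictionary keys are hypothetical, it handles only relation statements (not directives or OLD_RELATION/NEW_RELATION lines), and it assumes the separator characters never occur inside a term.

```python
def parse_free(line, term_sep=",", stat_end=";", qual_sep="/"):
    """Split one FREE format line into a list of statement dictionaries."""
    statements = []
    for chunk in line.split(stat_end):
        if not chunk.strip():
            continue  # nothing between two end-of-statement characters
        # "LT= term data/QUAL=value" -> relation-type, then the rest
        rel_type, _, data = chunk.partition("=")
        term_part, *qual_parts = data.split(qual_sep)
        qualifiers = {}
        for qual in qual_parts:
            name, _, value = qual.partition("=")
            qualifiers[name.strip()] = value.strip()
        # Several terms may share one relation-type on the same line.
        terms = [" ".join(t.split()) for t in term_part.split(term_sep) if t.strip()]
        statements.append(
            {"rel_type": rel_type.strip(), "terms": terms, "qualifiers": qualifiers}
        )
    return statements
```

The keyword arguments default to the documented defaults, so a layout that redefines a separator corresponds to a call such as parse_free(line, stat_end="%").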
Term Qualifiers
You can specify optional term information using these qualifier parameters:
ANNOTATE - Specifies an annotation character that you want the system to include
with a subordinate term in displays and prints of the thesaurus. Use only with non-lead
terms; for example:
LT=Kuwait
BT=Middle East/ANNOTATE=#
RECIP_ANNOTate - Specifies an annotation character that you want the system to
include with the reciprocal of a subordinate term in displays and prints of the thesaurus.
Use only with non-lead terms; for example:
LT=TNT
BT=Explosives/RECIP_ANNOT=*
TERM_TAG - Specifies any information that you want the system to carry along with the
lead term. For example, you might specify a facet (level) number or subject category:
LT=Europe/TERM_TAG=2
NT=France
LT=France/TERM_TAG=2.3
NT=Paris
Examples
1. DATA_TERM_SEParator and END_OF_DATA_STATement in this layout
definition could have been omitted since they define the defaults.
BEGIN_LAYOUT
FORMAT=FREE
DATA_TERM_SEP=','
END_OF_DATA_STAT=';'
END_LAYOUT
<<<
<<< STILL CAMERAS has four narrower terms added
<<< and 3 related terms
<<<
BEGIN_REL THES_NAME=MD
LT = Still Cameras
NT = Instant picture cameras, Miniature cameras, +
Reflex cameras, View Cameras
RT = Cameras, Television Cameras, Television
END_REL
2. The following action will delete a lead term plus all its subordinate terms. If any
singleton lead terms result, the system will preserve them.
BEGIN_LAYOUT
FORMAT=FREE
END_LAYOUT
BEGIN_REL
ACTION_CODE=DELETE_KEEP
LT=Land Cameras, Stereo Cameras, Underwater Cameras
END_REL
3. These lead terms include term tag qualifiers:
BEGIN_LAYOUT
FORMAT=FREE
END_LAYOUT
BEGIN_REL
ACTION_CODE=A
THES_NAME=MD
LT= Electron beam deflection tubes/TERM_TAG=MBT
NT =Indicator tubes (tuning), Trochotrons
THES_NAME=TDK
LT= Tetrodes/TERM_TAG=MDA
NT= Dynatrons, Resnatrons
END_REL
4. If a statement is very long, you can continue it to the next line using a plus sign (+).
LT= Television Camera Tubes; UF= Camera tubes (television), +
Emitrons, Iconoscopes,Image orthicons;RT= Phototubes
5. If you want to edit only one relation (not all occurrences of the term), then specify the
lead term and the subordinate term.
BEGIN_LAYOUT
FORMAT=FREE
END_LAYOUT
<<<
<<< The relation ANIMAL NT DOG is modified to
<<< ANIMAL NT CANINE
<<< The reciprocal modification includes
<<< CANINE BT ANIMAL
<<< Note: If DOG contains other relations, the system will
<<< not change the lead term DOG and its other
<<< relations.
<<<
THES_NAME=MD
ACTION_CODE=EDIT;
OLD_RELATION: LT=ANIMAL; NT=DOG
NEW_RELATION: LT=ANIMAL; NT=CANINE
6. To limit changes to the text of a specific INFORMATIVE message, specify the lead
term to which the message belongs. Here the typo in the word “government” is
corrected.
BEGIN_LAYOUT
FORMAT=FREE
END_LAYOUT
ACTION_CODE=EDIT_INFORMATIVE; THES_NAME=MD
OLD_RELATION: LT=COMMUNES (China); SN=local governemnt
NEW_RELATION: LT=COMMUNES (China); SN=local government
Old Message                      New Message
SN = Large-scale enterprise      SN = Large-scale enterprise
which includes collectivized     which includes collectivized
agriculture, industry, social    agriculture, industry, social
services and local governemnt    services and local government
functions.                       functions.
TEMPLATE Format
The TEMPLATE Format lets you define the layout of thesaurus term transactions by
character position in the STATEMENT file.
Syntax
A TEMPLATE format definition is made up of these required parameters. You must
specify all of these parameters, even though you may not use them. (Default start and end
character positions are shown.)
BEGIN_LAYOUT
FORMAT=TEMPLATE
ACTION_CODE=(1:3)
REL_TYPE=(9:11)
TERM=(17:80)
THES_NAME=(13:15)
EDIT_SUB_ACTION=(5:7)
ANNOTATE=(82:84)
RECIP_ANNOT=(86:88)
TERM_TAG=(90:100)
END_LAYOUT
Following is a description of each parameter:
FORMAT - Indicates the type of format you are using. Default is TEMPLATE.
ACTION_CODE=(1:3) - Defines the location of the action code in a term
transaction.
REL_TYPE=(9:11) - Defines the location of the relation-type in a term transaction.
TERM=(17:80) - Defines the location of the term in a term transaction.
EDIT_SUB_ACTION=(5:7) - Defines the location of the OLD and NEW keywords
for an edit transaction.
THES_NAME=(13:15) - Defines the location of the thesaurus name.
ANNOTATE=(82:84) - Defines the location of an annotation character that you
want the system to include with a term in displays and prints of the thesaurus. Use
only with non-lead terms.
RECIP_ANNOT=(86:88) - Defines the location of an annotation character that you
want the system to include with the reciprocal of a term in prints and displays of the
thesaurus. Use only with non-lead terms.
TERM_TAG=(90:100) - Defines the location of any additional information that you
want the system to carry along with the lead term. You might specify a facet (level)
number or subject category, for example.
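The way a layout maps character positions to fields can be sketched in Python as follows. This is an illustration under stated assumptions, not the TM utility's own code: the names DEFAULT_LAYOUT and parse_template are hypothetical, and positions are assumed to be 1-based and inclusive, as the ruler comments in the examples suggest.

```python
# Default start and end positions from the TEMPLATE syntax description.
DEFAULT_LAYOUT = {
    "ACTION_CODE": (1, 3),
    "EDIT_SUB_ACTION": (5, 7),
    "REL_TYPE": (9, 11),
    "THES_NAME": (13, 15),
    "TERM": (17, 80),
    "ANNOTATE": (82, 84),
    "RECIP_ANNOT": (86, 88),
    "TERM_TAG": (90, 100),
}


def parse_template(line, layout=DEFAULT_LAYOUT):
    """Cut a fixed-column transaction line into named fields."""
    fields = {}
    for name, (start, end) in layout.items():
        # Positions are 1-based and inclusive, so shift the start by one.
        raw = line[start - 1:end]
        fields[name] = " ".join(raw.split())  # trim and squeeze blanks
    return fields
```

A line shorter than a field's start position simply yields an empty string for that field, which matches the idea that unused parameters may be left blank.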
Examples
1. Since the following example uses the default TEMPLATE format, the file need not
contain BEGIN_LAYOUT and END_LAYOUT directives. Also, the
THES_NAME directive saves keystrokes by indicating in one place the target
thesaurus for these terms. (The line of numbers is a comment to aid in positioning
the term data.)
<<<45678901234567890123456789012345678901234567890123456789012345678901
BEGIN_REL THES_NAME=MD
        LT      View Cameras
        UF      Land Cameras
        BT      Still Cameras
        SN      Cameras with through-the lens focusing and a range of +
                movements of the lens plane relative to the film plane.
        LT      Cine Cameras
        NT      Underwater cine cameras
END_REL
2. Because the action code and thesaurus are specified in the BEGIN_REL directive,
the user only has to specify the relation-type and term:
BEGIN_LAYOUT
FORMAT=TEMPLATE
ACTION_CODE=(1:3)
REL_TYPE=(5:8)
TERM=(10:80)
EDIT_SUB_ACTION=(84:86)
THES_NAME=(87:89)
ANNOTATE=(90:92)
RECIP_ANNOT=(93:95)
TERM_TAG=(96:100)
END_LAYOUT
<<<
BEGIN_REL THES_NAME=DE
<<<456789012345678901234567890
    LT   Sodium Salts
    NT   Sodium Chlorate
    NT   Sodium Chloride
    NT   Sodium Bicarbonate
    NT   Sodium Phosphate
    LT   Sodium Chloride
    UF   Salt
3. The action code EDIT will change all the occurrences of a misspelled word to the
corrected form, similar to a global search and replace.
<<<45678901234567890123456789012345678901234567890123
EDIT OLD LT MD DOG
EDIT NEW LT MD CANINE
This transaction will change all occurrences of the term Dog to Canine.
Original Thesaurus      Modified Thesaurus after change from DOG to Canine
Animal                  Animal
   NT DOG                  NT Canine
Boxer                   Boxer
   BT DOG                  BT Canine
Collie                  Canine
   BT DOG                  BT Animal
DOG                        NT Boxer
   BT Animal               NT Collie
   NT Collie            Collie
   NT Boxer                BT Canine
4. You can also use the EDIT action to edit optional information for a thesaurus term.
These transactions edit an annotation symbol or classification number.
<<<4567890123456789012345678901234567890...12345678901234567890123456789
EDIT OLD LT MD DOG
EDIT OLD NT MD COLLIE
EDIT NEW LT MD DOG
*
> C1.252.40.552.846
EDIT NEW NT MD COLLIE
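The effect of the global edit in example 3 (DOG changed to Canine wherever it occurs, as a lead term and as a relation) can be pictured with a small Python sketch. The data structure and the function name rename_term are hypothetical and chosen only for illustration; the sketch does not handle the case where the new term already exists as a lead term.

```python
def rename_term(thesaurus, old, new):
    """Return a copy of the thesaurus with every occurrence of old renamed.

    thesaurus maps each lead term to a list of (relation-type, term) pairs.
    """
    renamed = {}
    for lead, relations in thesaurus.items():
        new_lead = new if lead == old else lead
        # Rename the term wherever it appears as a relation of another lead.
        renamed[new_lead] = [
            (rel_type, new if term == old else term)
            for rel_type, term in relations
        ]
    return renamed
```

Because the reciprocals are stored under the other lead terms (for instance BT DOG under Boxer), renaming both the key and the relation targets keeps every pair of reciprocal relations consistent.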
Indentation Method of Entering Hierarchies
It’s often easier to enter hierarchies by simply typing the terms themselves, one per line,
and indenting to indicate relations. This section describes the syntax you need for the
indentation method.
Syntax
You can use the indentation method with TEMPLATE or FREE format definitions. Use
these directives and parameters to indicate you are using the indentation method:
BEGIN_HIER
INDENT_LEVEL=integer
,REL_TYPE=rel_type
,THES_NAME=thesnam
END_HIER
Following is a description of each parameter:
INDENT_LEVEL - Indicates the number of spaces you want to indent a subordinate
relation. If the system finds more blanks than you specified with the
INDENT_LEVEL parameter, it will consider it an error and the tree will be rejected.
(Required)
REL_TYPE - Indicates the GENERAL or SPECIFIC relation-type you want to use
in the hierarchy. If you specify this parameter with the BEGIN_HIER directive, the
system will assume this relation-type for any terms without a relation-type.
(Optional)
THES_NAME - Indicates the name of the thesaurus to load the terms. (Required)
Using the Indentation Method
The idea behind the BEGIN_HIER, END_HIER directives is that the information within
the directives contains one and only one entire hierarchical tree (or a sub tree). Keep
these points in mind when using the indentation method:

TM will add the new tree if it does not already exist, or it will replace the tree that
already exists with the terms you specify. If you want to add only a few
relations to a hierarchy, use the conventional input methods.

Don’t use the indentation method for editing or deleting terms in a hierarchy. To
change or delete terms in a hierarchy, use the BROWSE action with Update intent.

You can use other relation-types within a hierarchy provided they are not defined as
GENERAL or SPECIFIC. You must indent a non-hierarchical relation under the
term it refers to; for example:
Animals
   Mammals
      Dogs
         Poodle
         Great Dane
            SN   Any breed of tall, massive powerful +
                 smooth-coated dogs.
      Cats
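How indentation translates into relations can be sketched in Python as below. This is an illustration, not the TM utility's algorithm: the function name parse_hierarchy is hypothetical, it handles only the hierarchical terms (not non-hierarchical relations such as SN), and it assumes a line is in error when its indentation is not a multiple of INDENT_LEVEL or is more than one level deeper than the line above.

```python
def parse_hierarchy(lines, indent_level):
    """Turn indented term lines into (broader term, narrower term) pairs."""
    pairs = []
    stack = []  # stack[d] holds the most recent term seen at depth d
    for line in lines:
        if not line.strip():
            continue
        blanks = len(line) - len(line.lstrip(" "))
        depth, extra = divmod(blanks, indent_level)
        if extra or depth > len(stack):
            # Too many blanks for the declared indent level: reject the tree.
            raise ValueError(f"bad indentation: {line!r}")
        term = " ".join(line.split())
        del stack[depth:]  # forget terms at this depth and deeper
        if stack:
            pairs.append((stack[-1], term))
        stack.append(term)
    return pairs
```

Raising an error on the first bad line mirrors the rule that the whole tree is rejected rather than partially loaded.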
Examples
1. Since the user specified REL_TYPE=NT, the system will assume this relation-type
for all the terms within the hierarchy directive. The user also indicated each level by
indenting three spaces.
<<<45678901234567890123456789012345678901234567890123
BEGIN_LAYOUT
FORMAT=FREE
END_LAYOUT
BEGIN_HIER THES_NAME=MD, REL_TYPE=NT, INDENT_LEVEL=3
Animals
   Mammals
      Dogs
         Poodle
         Great Dane
      Cats
   Fish
      Sharks
         Hammer Head
         Great White
   Birds
      Humming Bird
      Turkey
END_HIER
2. You can use other relation-types within a hierarchy provided they are not defined as
GENERAL or SPECIFIC.
BEGIN_LAYOUT
FORMAT=TEMPLATE
ACTION_CODE=(1:3)
REL_TYPE=(5:8)
TERM=(9:80)
EDIT_SUB_ACTION=(84:86)
THES_NAME=(87:89)
ANNOTATE=(90:92)
RECIP_ANNOT=(93:95)
TERM_TAG=(96:100)
END_LAYOUT
<<<
BEGIN_HIER THES_NAME=MD, REL_TYPE=NT, INDENT_LEVEL=3
<<<456789012345678901234567890123456789012345678901234567890
TERM A
SN
THIS CONTAINS TEXTUAL INFORMATION REGARDING THE +
SN
USE OF TERM A - IT HAS IMPORTANT MEANING
TERM B
RT
TERM B.A
RT
TERM B.B
TERM C
TERM D
TERM E
SN
THIS CONTAINS INFORMATION REGARDING TERM E
TERM F
TERM G
END_HIER
3. You can include facet (classification) numbers with the indentation method:
BEGIN_LAYOUT
FORMAT=TEMPLATE
ACTION_CODE=(1:3)
TERM_TAG=(5:11)
TERM=(13:80)
EDIT_SUB_ACTION=(84:86)
THES_NAME=(87:89)
ANNOTATE=(90:92)
RECIP_ANNOT=(93:95)
REL_TYPE=(96:100)
END_LAYOUT
<<<
BEGIN_HIER INDENT_LEVEL=2,THES_NAME=MD, REL_TYPE=NT
<<<456789012345678901234567890123456789012345678901234567890
    1       TERM A
    1.1       TERM B
    1.1.1       TERM C1
    1.1.1.1       TERM Z
    1.1.1.2       TERM Y
    1.1.1.3       TERM X
    1.1.2       TERM D
    1.2       TERM E
    1.2.1       TERM C
    1.2.1.1       TERM Z
    1.2.1.2       TERM Y
    1.2.1.3       TERM X
    1.2.2       TERM F
    1.2.3       TERM G
END_HIER
4. This example further demonstrates non-hierarchical relation-types included in a
hierarchy. The position of the non-hierarchical relation-types, RT and SN, determines
to which lead term they relate.
BEGIN_LAYOUT
ACTION_CODE=(1:3)
REL_TYPE=(5:8)
TERM_TAG=(10:14)
TERM=(17:80)
EDIT_SUB_ACTION=(84:86)
THES_NAME=(87:89)
ANNOTATE=(90:92)
RECIP_ANNOT=(93:95)
END_LAYOUT
BEGIN_HIER INDENT_LEVEL=2,THES_NAME=MD
<<<456789012345678901234567890123456789012345678901234567890
TERM A
1
RT
THIS CONTAINS TEXTUAL INFORMATION +
SN
REGARDING THE+
USE OF TERM A - IT HAS GREAT +
SN
MEANING WERE BLAH
TERM B
1.1
NT
TERM B.A
RT
TERM B.B
RT
TERM C
1.1.1
NT
TERM D
1.1.2
NT
TERM E
1.2
NT
THIS CONTAINS INFORMATION +
SN
REGARDING TERM E
TERM F
1.2.1
NT
TERM G
1.2.2
NT
END_HIER