input.doc

advertisement
HANZI INPUT in CXTERM Version 5.0
=================================
(C) 1994,1995 by YONGGUANG ZHANG
CXTERM provides an off-the-spot pre-editing style hanzi input.
When a user enters a hanzi, he/she types in a keystroke sequence in
the cxterm window. The keystroke sequence is a representation of the
hanzi according to the input method of the user's choice. Cxterm does
input conversion while processing the user's key strokes. When the
user types a key, cxterm will echo it in a dedicated input area below
the terminal window. At the same time, cxterm also translates what a
user types so far into a set of possible hanzi candidates. The list
of candidates is also shown in the input area with a label (usually a
number) preceding each candidate. By typing the number key, the user
picks one candidate to send it into the terminal and finish a hanzi
input. The cxterm will clear the input area and the keystroke buffer
to prepare for the next hanzi input.
1.
INPUT METHOD
Cxterm provides many popular hanzi input methods. The descriptions
of an input method (the mapping between keystroke sequences and hanzi)
along with a set of key bindings are stored in an external file and
loaded into cxterm in run time. Users can use their own input method
by writing input method specification file and load it into cxterm.
Many cxterm input features are "input method independent", which means
the user can choose whatever input method and get the same interface.
2.
SWITCHING INPUT METHODS
When cxterm starts up, it is usually in the "ASCII" input mode, i.e.,
what users type are all English. To input Chinese, cxterm must be
switched to one of the Chinese modes. Different Chinese modes are
different in its input method. Once in a Chinese mode, some keys
are interpreted as Chinese input keystrokes as explained above.
Users can freely switch from one input method to another (or from ASCII
or to ASCII) at any time after cxterm starts up. The input feature in
this text are applicable to any external input methods. Users can
switch an input method by one of the following three ways: 1) press a
predefined function key, 2) use the popup input configuration panel,
3) run the utility program "hzimctrl", or 4) write the corresponding
cxterm terminal escape sequence to the terminal (from the application
program).
The specification of the predefined function key for switching input
method is included in the "Translation" table in the user's X resources
for cxterm. Typical files to store the X resources are $HOME/.Xdefaults,
$HOME/.Xdefaults-hostname, or $HOME/.Xresources. However when the
script "CXterm" is used, the resource file is called "CXterm.ad" and
resides in the destination directory for cxterm input dictionary
(chosen by the users during cxterm installation).
The predefined function keys for switching input methods are usually
The <Fn> keys or <Shift>+<Fn> keys. Usually <F1> temporary disable
Chinese input function (<F1> again to resume) and <Shift>+<F1>
switches cxterm back to "ASCII" mode.
The input methods can also be changed by the configuration panel. To
pop up the panel, press (<Control> + Middle-Mouse-Button) then choose
the last item "HANZI configuration". (See Section 17.)
-------------------------------------------------------------------Ex 1. The default function keys used in CXterm (in GB encoding)
include:
<F4> TONEPY (pin-yin input with tone 1-5)
<F6> PY
(pin-yin input without tone)
<F7> WuBi (wu-bi-zi-xing (5-stroke) input)
<F8> English
(input Chinese words by English words)
After pressing <F4>, you will get
________________________________________________________
~{汉字输入∷带调拼音∷~}
After you type 'z' 'h' 'o' 'n' 'g' '1', you will get
________________________________________________________
~{汉字输入∷带调拼音∷~} zhong1
1.~{中~} 2.~{终~} 3.~{钟~} 4.~{忠~} 5~{.衷~} 6.~{锺~} >
Type a number, say '4', to choose the fourth Chinese character in
the list -- ~{忠~}.
-------------------------------------------------------------------3.
CHOICE LIST TRAVERSAL
If the current keystroke sequence yields more candidates than that can
be displayed in the input area, only a segment of the whole candidate
list is shown at a time. Conceptually it can be considered as a
"viewport" that reveals part of the list. The size of the viewport is
usually ten (10) candidates, which can be changed in the input method
specification file. The size of the viewport is also affected by
whether the viewport can fit into the input area. Initially the
viewport is at the beginning of the list. A '>' sign at the right
edge of the viewport indicates more choices to the right. A move-right
key (usually '>' or '.') will move the viewport to the right and reveal
the next ten choices. A '<' sign at the left edge of the viewport will
suggest more choices at the right. A move-left key (usually '<' or ',')
will move the viewport backward and reveal the previous ten choices.
The user can repeatly type the move-left or move-right key to traverse
the choice list in either direction.
4.
PHRASE INPUT
Cxterm provides two kinds of phrase input. First, cxterm supports the
keystroke sequence to hanzi phrase mapping. The cxterm input engine
does not distinguish single hanzi and phrase hanzi. In a phrase-based
input method, cxterm converts and displays phrases in the candidate
list; one can pick a phrase as a whole.
-------------------------------------------------------------------Ex 2.
After 't' 'a' 'b' 'l' 'e' keys are entered under the
"English" input method:
________________________________________________________
~{汉字输入∷英汉∷~} table
1.~{桌子~} 2.~{表~}
Choosing '1' will input ~{桌子~} -- two characters in a roll.
-------------------------------------------------------------------The second type of phrase input is composition by an association key
(usually the key '-'). If a keystroke sequence contains two or more
segments connected by '-', cxterm will match it against a predefined
list of phrases (called glossary, or the association list). The
candidate list only contains those phrases in which the first hanzi
matches the first segment in the keystroke sequence, and the second
hanzi matches the second segment, and so on.
-------------------------------------------------------------------Ex 3. In pinyin input, "jin4" matches all the following characters
~{仅进晋禁近浸尽劲~}
and "shi4" matches
~{式示士世柿事拭誓逝势是适仕侍释饰氏市恃室视试似~}
plus other infrequently-used ones.
The input "jin4-shi4" will matches any two-character word whose
first character is from the first list and the second character
from the second list: ~{近世~}, ~{进士~}, ~{尽是~}, ~{近视~}, etc.
Here is how the screen looks like when you input "jin4-shi4":
________________________________________________________
~{汉字输入∷带调拼音∷~} jin4-shi4
1.~{近似~} 2.~{近似值~} 3.~{近世~} 4.~{尽是~} 5~{.进士~} >
-------------------------------------------------------------------Although cxterm supports both kinds of phrase input, we highly recommend
the second one over the first, since the phrase composition based on a
glossary of phrases is "input method independent", i.e., a phrase in
the glossary is available regardless of the input method chosen.
5.
PREFIX INPUT
Hanzi input in cxterm requires only minimal keystroke. If a keystroke
sequence matches a hanzi, all its non-empty prefixes also match the
hanzi. The user can make a choice without finishing typing the whole
key sequence, although it is a common practice to type more keys to
reduce the number of candidates that one has to choose from.
-------------------------------------------------------------------Ex 4. You can input ~{中~} by only three keys 'z' 'h' 'o':
________________________________________________________
~{汉字输入∷带调拼音∷~} zho
1.~{中~} 2.~{终~} 3.~{钟~} 4.~{忠~} 5~{.衷~} 6.~{锺~} >
Then press key '1'.
--------------------------------------------------------------------------------------------------------------------------------------Ex 5. The prefix input is very useful in phrase input. For example,
"z-g" in pinyin input method yields all the phrases whose first
hanzi matches a pinyin beginning with 'z' and the second hanzi
matches a pinyin beginning with 'g':
________________________________________________________
~{汉字输入∷带调拼音∷~} z-g
1.~{这个~} 2.~{中国~} 3.~{整个~} 4.~{职工~} 5~{.最高~} >
-------------------------------------------------------------------6.
WILDCARD INPUT
Instead of typing the exact keystroke sequence for a hanzi, a user can
use the wildcard characters (usually '*' and '?'). The expansion is
similar to the "glob-style" file-name matching in many Unix shells.
A wildchar ('?') can substitute any single key, while a wildcard
(usually ``*'') can substitute any number (including zero) of keys.
It is to assist hanzi input when the user forgets the keystroke
sequence.
-------------------------------------------------------------------Ex 6. If you want to input the character ~{众~}, but forget whether
it is "zhong4" or "zong4", you may try "z*ong4":
________________________________________________________
~{汉字输入∷带调拼音∷~} z*ong4
1.~{中~} 2.~{种~} 3.~{重~} 4.~{众~} 5.~{仲~} 6.~{纵~} >
-------------------------------------------------------------------WARNING: unless you have a top-of-the-line computer, you should refrain
from using wildcards in the association phrase input, since it takes a
long time to exhaust the whole search space.
7.
AUTO SELECTION
Automatic selection refers to the automatic input of a candidate when
it is the only candidate matching the current input keystroke sequence.
Cxterm supports three kinds of automatic selection, "NEVER", "ALWAYS",
and "WHENNOMATCH". If it is set to "NEVER", there will be no automatic
selection; a user has to press the selection key to pick the candidate.
If the mode is "ALWAYS", cxterm will automatically pick the candidate,
if it is the only one, and sent it to the terminal. If the mode is
"WHENNOMATCH", automatic selection is done only when after the user
types another key and the new sequence on longer matches any hanzi.
The new input key will start a keystroke sequence for the next hanzi.
-------------------------------------------------------------------Ex 7. The mode is set to "WHENNOMATCH", and when "e3" is input,
There is only one choice:
________________________________________________________
~{汉字输入∷带调拼音∷~} e3
1.~{恶~}
Without doing any selection, the user immediately types 'x'.
Since there is no such input string as "e3x", cxterm picks and
inputs ~{恶~} and starts a new round of input for the unmatched
"x":
________________________________________________________
~{汉字输入∷带调拼音∷~} x
1.~{西~} 2.~{息~} 3.~{希~} 4.~{吸~} 5.~{惜~} 6.~{稀~} >
-------------------------------------------------------------------8.
AUTO SEGMENTATION AND CONTINUOUS INPUT
Continuous input in cxterm is based on automatic segmentation of input
keystrokes. If the current keystroke sequence matches some hanzi but
it no longer does when the user types a new input key, cxterm will
insert an association key (usually '-') before the new input key and
try to find the matches from the glossary list.
-------------------------------------------------------------------Ex 8. In pinyin input method, cxterm will turn "jisuanji" into
"ji-suan-ji". Here is how. After "ji" is typed, cxterm shows
________________________________________________________
~{汉字输入∷带调拼音∷~} ji
1.~{其~} 2.~{几~} 3.~{期~} 4.~{机~} 5.~{基~} 6.~{击~} >
When the user types the next 's', cxterm first tries the new sequence
"jis" and it matches nothing (since "jis" is not a valid pinyin).
It then tries to break the sequence by inserting a '-' before 's':
________________________________________________________
~{汉字输入∷带调拼音∷~} ji-s
1.~{就是~} 2.~{技术~} 3.~{建设~} 4.~{计算~} 5.~{精神~} >
Similarly, when the second 'j' is typed, cxterm breaks "suanj" by
another '-':
________________________________________________________
~{汉字输入∷带调拼音∷~} ji-suan
1.~{计算~} 2.~{计算机~} 3.~{就算~} 4.~{结算~} 5.~{强酸~} >
________________________________________________________
~{汉字输入∷带调拼音∷~} ji-suan-j
1.~{计算机~}
-------------------------------------------------------------------Automatic segmentation is based on the longest match of input keys
and it is input method independent. It can be used with prefix input.
-------------------------------------------------------------------Ex 9. "xyz" will be segmented into "x-y-z":
________________________________________________________
~{汉字输入∷带调拼音∷~} x-y-z
1.~{下意识~} 2.~{小业主~} 3.~{选言肢~} 4.~{信用证~}
-------------------------------------------------------------------Combining this auto segmentation with the "WHENNOMATCH" mode of autoselection, it is possible to achieve some degree of continuous input:
just keep typing input keys and hopefully cxterm will make the selection.
9.
ASSOCIATION INPUT
When the input of a hanzi is completed (by user selection or auto
selection), cxterm uses the glossary list to guess the possible
subsequent hanzi. If the word that the user intends to input is in
the list, he/she can simply pick it by a selection key, instead of
typing another keystroke sequence.
-------------------------------------------------------------------Ex 10. After "zhong" is input and ~{中~} is selected, a new choice
list will be presented:
________________________________________________________
~{汉字输入∷带调拼音∷~}
~{联想~}
1.~{国~} 2.~{心~} 3.~{央~} 4.~{间~} 5.~{华~} 6.~{学~} >
-------------------------------------------------------------------10.
ASSOCIATION LIST
Both phrase input by composition and (after selection) association
input require a predefined list of phrases. The list is stored as
an external file and it is loaded each time cxterm starts up. A user
can change the file to add new phrases and drop existing ones. The
list should be kept in frequency order, i.e., the more frequently used
words should be put earlier in the file. This is because both phrase
input and association will search for and display the phrase
candidates by the order of the list.
The file name for the associate list is also defined in the X resource
(See section 2). The default path as defined in the "CXterm.ad" is
the "dict/gb/simple.lx" for GB encoding.
11.
KEYSTROKE BUFFER EDITING
Cxterm provides some basic line editing for the input keystroke buffer.
The commands are in emacs-style, although the key binding can be
changed in the input method's specification file. The keys include:
^H, Del
delete the previous input key
^F
move cursor forward one key
^B
move cursor backward one key
^A
move cursor to start of keystroke buffer
^E
move cursor to end of keystroke buffer
^D
delete the input key at the cursor position
^U
delete all keys and clear the keystroke buffer
^P
fetch the keystrokes from previous input
After each editing action, cxterm will redo the conversion from the
current keystroke buffer and display the updated candidate list.
12.
MOUSE ACTION
Besides key actions, clicking the left mouse-button when the cursor
is inside the input area will trigger some input actions. A click at
the candidate or its label in the input area will input that choice.
A click at the '<' or '>' sign will move the display viewport to the
left or right of the choice list. A click at the input keystroke
buffer will place the cursor at the mouse position.
13.
TEMPORARILY DISABLE CHINESE INPUT
Very often when you are typing Chinese, you need to switch back to
ASCII once in a while. For example, you need to type some English
words in a Chinese text, or need to perform some editing task in
your editor. You may press Shift-<F1> to switch the input method
back to "ASCII". However, a better way is to temporarily disable
Chinese input conversion and save the current input context, then
later resume input from the previous save point. This is done by
pressing <F1>, which by default is bound to cxterm action
"set-HZ-parameter(input-conv=toggle)".
The action "set-HZ-parameter(input-conv=toggle)" alternatively
executes "set-HZ-parameter(input-conv=disable)" to freeze the input
and a third "set-HZ-parameter(input-conv=enable)" to unfreeze it.
The first time you press <F1>, the input area will have an foggy
look and become "insensitive". You can see the key buffer and the
choice list vaguely, but whatever you type is interpreted as ASCII
only. The next time you press <F1> the input area will become clear
and "sensitive" again, continuing the input from wherever you
stopped.
14.
INPUT DIRECTORIES
Input directories are where you put the external input methods. The
name of an input method is the same as the file name (minus the
suffix). Input methods included in this package have suffix ".tit"
in the filenames, which is textual input table format suitable for
human understanding and maintenance. The cxterm installation script
automatically compile these input methods into files with suffix
".cit", which is compiled input table format loadable by cxterm.
Cxterm will search the input directories for an input method. One
may specify the paths by command line options or environment variable
"HZINPUTDIR" before cxterm starts up (see cxterm(1)), or by control
program "hzimctrl", cxterm configuration panel after cxterm starts up.
More than one directories can be specified using ':' as delimiters.
The file that stores the association list must also reside in one
of the input directories.
15.
CUSTOMIZATION
Hanzi input in cxterm is built around a generic input engine. Most
input processing features are customizable:
Input processing:
a) encoding and fonts
b) key binding to invoke input methods (also see Section 2)
c) path to search for the input methods (also see Section 14)
d) association list (also see Section 10)
For each input method:
e) input conversion table (keys sequence to hanzi)
f) title of the input method (shown as the prompt)
g) the set of valid input keys
h) special keys -- wildcard keys, association keys, selection keys,
choice list traversal keys, input buffer pre-editing keys, etc.
i) default auto-selection mode (also see Section 7)
j) key prompts (displayed labels for each input key)
See manual page cxterm(1) on how to specify input processing values
(a-d). See manual page tit2cit(1) on how to change the values (e-j)
for each input method. Any change to the input method requires a
recompilation of the input method by tit2cit.
-------------------------------------------------------------------Ex 11. The input method WuBi provided in this package does not do
auto-selection; one has to press a selection key explicitly. It
can be turned on after the input method is loaded into cxterm
(see Section 17). Alternatively, you can change the default in
its method specification file: dict/gb/WuBi.tit (under cxterm
source).
The first few lines of the file used to look like:
# ....
# ....
ENCODE: GB
PROMPT: ....
AUTOSELECT: NEVER
....
Edit this file and change the auto-selection value from "NEVER"
to "ALWAYS":
# ....
# ....
ENCODE: GB
PROMPT: ....
AUTOSELECT: ALWAYS
....
Then recompile WuBi.cit:
tit2cit dict/gb/WuBi.tit > dict/gb/WuBi.cit
And if you store those .cit files elsewhere, copy WuBi.cit there.
-------------------------------------------------------------------16.
RUN-TIME CONFIGURATION
The following parameters can be changed after cxterm starts up.
Path to search for the input methods (Section 14)
Auto-selection mode (Section 7)
Auto-segmentation mode (Section 8)
Association mode (Section 9)
They can be changed through control program "hzimctrl" or the cxterm
configuration panel. Not all modes are available for all input methods.
For example, if an input method explicitly specifies no association
key, the auto-selection mode can only be "NEVER".
17.
POPUP CONFIGURATION PANEL
A popup panel for cxterm input configuration can be invoked by popping
up the cxterm main menu (<Control> + Middle-Mouse-Button) and choosing
the last item. It can be also popped by pressing a predefined key
(usually F3, or as specified in the user "Translation" table, see
Section 2).
The top part of the panel is for changing HZINPUTDIR. HZINPUTDIR is a
variable that contains a list of directories that cxterm searches for
the file that holds a given input method. It is properly predefined
to refer the input methods bundled in the cxterm package. This
variable can be changed during any time so that input methods stored
elsewhere can be loaded. When a change has been made, press "Confirm"
button to accept the new value.
The second part of the panel is for switching input method. Users can
type in the input method name in text window, or pick one from the
list of all input methods found in the current input directories (as
HZINPUTDIR).
Press "Confirm" button to trigger the switch.
The third part is for changing input parameters, including autoselection, auto-segmentation, and association described above. To
change a parameter, press the corresponding value button to see the
menu of all choices and choose one. Press "Confirm" button to do
the actual changes. If a value gets automatically changed back, the
selected value is not supported. (For example, if the association
list is not present during cxterm starts up, the "Association" value
will always be "No".)
Download