tlex.doc
tlex - lexical analyzer for ATK text
gentlex - create tables for tlex
ATK's 'parse' object requires as its stream of tokens an object which is a
subclass of the 'lexan' class and thus provides a NextToken method.
'tlex' is one such subclass of lexan; its input stream is an ATK text
object. Tables for creating instantiations of the tlex object are
generated by gentlex (pronounced gen-t-lex). Unlike the Unix 'lex'
package, gentlex/tlex does not implement arbitrary regular expressions.
However, because it is designed specifically for tokenizing streams for
parser input, tlex does considerably more than 'lex' and even supports
the most general recognition scheme: C code.

The essence of the tlex approach is to determine what sort of token is
coming by assembling characters until they constitute a prefix for a
recognizable token. Then a recognizer function is called to determine
the extent of the token. Builtin recognizers are provided for the sorts
of tokens required for popular modern programming languages.
______________________________
Using tlex
For each tlex application, say 'sample', the programmer writes a .tlx
file describing the tokens to be found. This 'sample.tlx' file is then
processed by gentlex to create a sample.tlc file. This file is then
included in the application:

    #include <sample.tlc>

Later in the application, a tlex object is declared and created as in

    struct tlex *tl;
    . . .
    tl = tlex_Create(&sample_tlex_tables, self,
            text, 0, text_GetLength(text));

where sample_tlex_tables is declared within sample.tlc, self is passed
as a rock that is later available to recognizers, and text is an ATK
text object containing the text to be parsed. The last two parameters
can delimit the parse to a subsequence of the text. After this
preparation, the tlex object, tl, can be passed as the lexan argument
to parse_Create. See the appendix for a complete program example.
Gentlex takes two inputs--a file containing token class descriptions and
another file containing declarations for YYNTOKENS and yytname as
produced by running bison with the -k switch. The output file is
typically named with the same prefix as the first input file and the
extension .tlc:

    gentlex sample.tlx sample.tab.c

produces file sample.tlc.
-p -prefix
The declarations output by gentlex are static variables whose names
begin with a prefix value followed by an underscore, as in
sample_tlex_tables. If the first input file has the extension .tlx, the
prefix will be the filename prior to the extension. A different prefix
value may be specified by giving the -p switch on the command line. The
output file is always named with the prefix value and the extension
.tlc. For example

    gentlex -p sample a b

will read the .tlx information from file a and the .tab.c information
from file b. The output will be generated to a file sample.tlc and the
variables declared in that file will begin with `sample_'. If the -p
switch is given, the file names need not be specified; they will be
generated from the prefix value and the extensions .tlx and .tab.c. If
the .tlx file is named, the .tab.c file need not be, as long as its name
begins with the same stem name as the .tlx file.
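For example, assuming files sample.tlx and sample.tab.c both exist, the
following invocations should all be equivalent under the rules above (a
sketch of those rules, not output from any particular gentlex version):

    gentlex sample.tlx sample.tab.c
    gentlex sample.tlx
    gentlex -p sample

Each reads sample.tlx and sample.tab.c and writes sample.tlc, whose
variables begin with `sample_'.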
-l
The .tlc output file will ordinarily contain #line lines to relate
compile error messages back to the .tlx file. The -l switch will
eliminate these #line lines.
______________________________
Overview of the structure of .tlx Files
The purpose of tlex is to examine the input stream and find tokens,
reporting each to the parser by its token class number. Gentlex
determines the token class numbers from the yytname list in the .tab.c
file, which will have been generated by bison with the -n switch (and
some other switch combinations).

A .tlx file is a sequence of tokenclass blocks, each responsible for
describing how tlex can identify the tokens of one class (or a group of
classes). The syntax is line oriented: the various allowed operations
each occupy a single line. Comments begin with two dashes (--) and
extend to the end of the line.

Here is a typical tokenclass block containing a description of
identifier tokens--ones that start with an alphabetic and continue with
an alphabetic, an underline, or a digit.
    tokenclass setID
        set         [a-zA-Z]
        recognizer  ScanID
        charset continueset  [a-zA-Z_0-9]
The tokenclass line says that this block describes tokens the parser is
to think of as satisfying the class setID (a name used in the grammar).
The set line says that this recognizer is triggered by any alphabetic
character. The recognizer line says to use the builtin recognizer called
ScanID. One of the parameters to ScanID is a charset value called
continueset. It is declared here to override the default value; the
declaration says that an identifier continues with an alphabetic, a
digit, or an underline.

Each tokenclass block begins with a header line containing 'tokenclass'
and a representation of the class--either a name, or a quoted string.
Following the header are four types of lines, each of which is described
in more detail later.
    'set' or 'seq' - these lines describe the prefix of tokens that
        satisfy the tokenclass. seq specifies an initial sequence of
        characters, while set lists a set of characters.
    'recognizer' - this line names a builtin recognizer which will be
        called to determine the remainder of the token.
    struct element declaration - declares the type, name, and initial
        value for an element of a struct. For some builtin recognizers,
        fields of this struct provide information that more precisely
        controls the recognizer. The struct is also passed to the
        function created from the following body.
    function body - If a recognizer is named, the function created from
        this body is called after the recognizer has found the token.
        Otherwise, the function is called as soon as the prefix is
        recognized and the function finds the remainder of the token
        itself.
Suppose that identifiers beginning with x_ are to be treated as tokens
of class setHEXCONSTANT. The above description could be augmented to
describe this as follows:
    tokenclass setID
        set         [a-zA-Z]
        recognizer  ScanID
        charset continueset  [a-zA-Z_0-9]
        tokennumber hextok   setHEXCONSTANT
    {
        char *tok = tlex_GetTokenText(tlex);
        if (*tok == 'x' && *(tok+1) == '_')
            tlex_SetTokenNumber(tlex, parm->hextok);
        return tlex_ACCEPT;
    }
As earlier, this tokenclass block describes by default tokens of the
class setID. They begin with a letter and continue as determined by the
ScanID recognizer. As its last step, ScanID will call the function whose
body is the bracketed lines. Two parameters will be passed to this
function:

    tlex--a pointer to the current tlex object, and
    parm--a pointer to the struct constructed from the struct element
        declarations earlier in the block.

The code in the function gets the token text from the tlex and checks
for an initial 'x_'. If it is found, the token number in the tlex is
changed to the token class for setHEXCONSTANT, utilizing the hextok
value installed in the struct by the earlier struct element declaration
for hextok.

Special recognition of 'x_' would have been easier, however, by writing
two tokenclass rules:
    tokenclass setID
        set         [a-zA-Z]
        recognizer  ScanID
        charset continueset  [a-zA-Z_0-9]

    tokenclass setHEXCONSTANT
        seq         "x_"
        recognizer  ScanID
        charset continueset  [a-zA-Z_0-9]
The setID class would recognize all tokens, even those beginning with x
but followed by a character other than underline. The setHEXCONSTANT
class would recognize tokens beginning with x_. In practice, of course,
the continueset for hex constants might be [0-9a-f] instead of the value
given and a function body might be provided to compute the actual
hexadecimal value.
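As a hedged illustration of that last remark (a sketch only; the use of
strtol and the conversion of the value are not part of any builtin
recognizer, and assume the generated code can call the C library):

    tokenclass setHEXCONSTANT
        seq         "x_"
        recognizer  ScanID
        charset continueset  [0-9a-f]
    {
        /* hypothetical: convert the digits after the "x_" prefix and
           hand the result to the parser as the token value */
        long v = strtol(tlex_GetTokenText(tlex) + 2, (char **)0, 16);
        tlex_SetTokenValue(tlex, (void *)v);
        return tlex_ACCEPT;
    }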
______________________________
Tokenclass lines: Details
The value following the 'tokenclass' keyword is one of the token
identifier values used in the bison description of the grammar. Typical
examples are

    ELSE
    setID
    tokNULL
    '+'
    "<="

Token names beginning with "set" are assumed to describe classes and are
not treated as reserved words. Token names beginning with "tok" are
assumed to be reserved words consisting of the characters following the
initial three. Multicharacter tokens delimited with double quotes may
not be acceptable in all versions of Bison.
It is possible, but not necessary, to write tokenclass blocks for quoted
characters and strings like '+' and "<=". Gentlex automatically
generates tokenclass blocks for these sorts of tokens.

A recognizer for whitespace is also generated automatically and suffices
if the desired whitespace set is the set of characters satisfying
isspace(). To override this automatic set, include a block for
tokenclass -none- and specify for it the recognizer ScanWhitespace, as
in

    tokenclass -none-
        set         [ \n\t\f\v\r]
        recognizer  ScanWhitespace
        charset continueset  [ \n\t\f\v\r]
For the `action' type declaration described below, a disambiguating
letter may be appended in parentheses after a tokenclass representation
or the special tokenclass -none-; for example:

    tokenclass setNUMBER (b)

There are several reserved token class names: -none-, -global-,
-reservedwords-, and -errorhandler-, as described in the following.

tokenclass -none-
The tokenclass -none- is used for whitespace and comments. It is assumed
that the function body, if any, in the tokenclass block returns
tlex_IGNORE, as is done by ScanWhitespace and ScanComment. If it instead
returns tlex_ACCEPT, it should have reset the token number, because the
default token number established by -none- terminates the input stream.
tokenclass -global-
The block for this tokenclass has no set, seq, or recognizer. Its sole
function is to generate and initialize a struct, called PREFIX_global,
where the PREFIX value is the stem of the .tlx file name or the value of
the -p switch. Fields of the PREFIX_global struct can be accessed from
the C code fragment associated with any of the tokenclasses. This can be
used to create a single charset, tokennumber, or action value that can
be referenced from multiple function bodies. If C code is specified for
the -global- block, it is executed when tlex_Create is called for the
PREFIX_tlex_tables value created by this file; thus it can further
initialize any variables in the global struct.
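A minimal sketch of such a block follows. The field names are
hypothetical, and the assumption that the body reaches the generated
global struct through parm is just that, an assumption:

    tokenclass -global-
        charset hexdigits  [0-9a-fA-F]
        int     linecount  0
    {
        /* assumed to run when tlex_Create is called; finish
           initializing the global struct */
        parm->linecount = 1;
    }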
tokenclass -reservedwords-
The default treatment of reserved words in gentlex is to ignore them.
The id recognizer is expected to identify them by looking up the
identifier in the symbol table. (Entries are put in the table with
parse_EnumerateReservedWords; see parse.doc.) However, the reserved
words can be recognized directly by specifying the tokenclass name
-reservedwords-. When a reserved word is recognized, the function body
in the tokenclass block is invoked.
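A minimal sketch, assuming this body receives the same tlex and parm
arguments and follows the same return conventions as other function
bodies:

    tokenclass -reservedwords-
    {
        /* hypothetical: note each reserved word as it is recognized */
        printf("reserved word: %s\n", tlex_GetTokenText(tlex));
        return tlex_ACCEPT;
    }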
tokenclass -errorhandler-
Tlex has a method tlex_Error which can be called by recognizers to
indicate various problems, such as an 8 in an octal value. The default
action of tlex_Error is to print the error message which is its
argument; however, a .tlx file may specify a different action by
providing an errorhandler block. This block must not include a
recognizer line, but must include a function body. The function
generated from the body is invoked for any error. The function body can
access the proposed error message as parm->msg.
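A minimal sketch of an errorhandler block; the choice of tlex_IGNORE as
the return value is an assumption:

    tokenclass -errorhandler-
    {
        /* report the proposed message and keep scanning */
        fprintf(stderr, "lexical error: %s\n", parm->msg);
        return tlex_IGNORE;
    }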
______________________________
Set or seq lines, details
The set or seq line must appear for most token classes. It determines
when this token class recognizer is initiated. The argument on a seq
line is a double-quote delimited string:

    seq "--"

The token class is activated whenever that sequence is found at the
beginning of a token. The argument on a set line is a sequence of
characters within square brackets; e.g. [ \t\n\r]. Any character in the
set will initiate the given tokenclass when it appears as the first
character in a token. Backslash, dash, and right square bracket may
appear in the sequence if preceded with backslash as an escape
character. A consecutive subset of the collating sequence can be
included by writing the two characters at the ends of the sequence
separated with a dash: [a-zA-Z_#$%0-9] would be the set of all
alphabetic characters, the digits, and underline, hash, percent, and
dollar sign.
______________________________
Recognizer lines, details
The operand following the keyword 'recognizer' must be the name of one
of the builtin recognizers. Each recognizer takes one or more operands
which further describe the tokens the recognizer will accept. Most
recognizers return TRUE to the central token recognizer to indicate that
they have found a token. ScanWhitespace, ScanComment, and ScanError
normally return FALSE to indicate that further scanning is required to
actually find a token. The individual builtin recognizers are described
in the following paragraphs.
ScanNumber
The first character of the token must be a dot, a digit, or a single
quote character. Subsequent characters are scanned as long as they
correspond to a C numeric constant:

    decimal integer - sequence of [0-9]
    octal integer - 0 followed by sequence of [0-7]
    hexadecimal integer - 0x followed by sequence of [0-9a-fA-F]
    quoted character - two single quotes surrounding a character
        or an escape sequence
    real value - an appropriate sequence of characters from [0-9\-+.eE]

    boolean IsInt
    int     intval
    double  realval
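Judging from the ness sample later in this document, these fields are
available to the function body through parm after ScanNumber runs. A
sketch along those lines (the cast used to store the integer as the
token value is an assumption about how an application keeps its values):

    tokenclass setINTCON
        set [0-9.]
        recognizer ScanNumber
    {
        if (parm->IsInt)
            tlex_SetTokenValue(tlex, (void *)(long)parm->intval);
        return tlex_ACCEPT;
    }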
ScanID
The continueset parameter may be specified to indicate what characters
are allowed in an identifier after the first.

    charset continueset

ScanString
Parameters indicate the terminating character of the string, an escape
character, and an illegal character. For C strings these would be ", \,
and newline, respectively, and these are the default values.

    char *endseq
    char *escapechar
    char *badchar
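For example, a sketch of a class for single-quote delimited strings,
changing only the terminating character and keeping the other defaults:

    tokenclass setSTRINGCON
        seq "'"
        recognizer ScanString
        char *endseq  "'"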
ScanToken
The initial character is treated as the entire token. ScanToken can be
used to specify the same tokenclass for two different characters. For
instance, to map left braces to left parentheses we could write

    tokenclass '('
        seq "{"
        recognizer ScanToken

Since ScanToken is the default, it can be omitted.
ScanComment
Parameters indicate the terminating sequence for the comment. The
recognizer returns FALSE so another token will be scanned for after
recognizing the comment.

    char *endseq

The first character of the endseq value must not appear anywhere else in
that value.
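For example, C-style comments might be described as follows (a sketch;
note that the first character of the "*/" terminator does not recur
within it, as required above):

    tokenclass -none-
        seq "/*"
        recognizer ScanComment
        char *endseq  "*/"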
ScanWhitespace
The set of whitespace must be specified both in the set line and by
writing a continueset parameter. The recognizer returns FALSE so another
token will be scanned for after skipping the whitespace.

    charset continueset
ScanError
The set character or seq that initiates this token class is treated as
an error. A msg parameter should be specified; if no body is specified,
the message is passed to the errorhandler function (or printed if there
is no error handler).

    char *msg
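A sketch of an error class for characters that should never begin a
token; both the particular character set and the use of the -none-
tokenclass here are assumptions:

    tokenclass -none-
        set [@`]
        recognizer ScanError
        char *msg  "illegal character"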
______________________________
Field lines, details
A field line has three elements--type, identifier, and value. The first
two are single words; the form of the value depends on the type. The
described field becomes one field of the struct passed to the function
for this token class. The value is used to initialize the field in the
instance of the struct created for this token class.

For example, if the line is

    int basis 3

the generated struct declaration will have the form:

    struct tlex_Sym000 {
        ...
        int basis;
        ...
    } tlex_Sym001 = {
        ...
        3,
        ...
    };
A limited set of types are allowed in a field line. This set includes
these standard C types:

    int
    long
    float
    double
    char*

and the semi-standard type

    boolean

for which the value constants are TRUE and FALSE. Other types are
charset, tokennumber, and action.
charset
The value portion is a character sequence in square brackets, just as for
'set' lines. Charset variables are used as the first argument to
tlex_BITISSET.
If v is a charset identifier and c is a character, the
expression
tlex_BITISSET(parm->v, c)
is TRUE if the value of c is one of the characters in the value of v.
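For instance, a function body might test the first character of the
token against a charset field declared in the same block (a sketch; the
field name hexdigits is hypothetical):

    tokenclass setID
        set [a-zA-Z]
        recognizer ScanID
        charset hexdigits  [0-9a-fA-F]
    {
        if (tlex_BITISSET(parm->hexdigits, *tlex_GetTokenText(tlex)))
            ;   /* first character is a hexadecimal digit */
        return tlex_ACCEPT;
    }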
tokennumber
The variable declared as a tokennumber is declared as int in C. The
initialization expression is a token name, exactly as may appear as the
operand to tokenclass. The C int is initialized to the appropriate token
number for the given token name. This value is appropriate as the second
argument to tlex_SetTokenNumber.
action
The return value from a C code portion must be one of two constants or
must be a value created by an action type field element. The
initialization for a variable of type action is the operand of a
tokenclass line; that is, a token representation, possibly followed by a
parenthesized letter.
______________________________
Function bodies, details
If the { C-code } section is present, the code is called as in a
function with two arguments: tlex and parm, where tlex is the current
tlex object and parm points to a struct containing at least the fields
described in the field description lines. The function is called as the
recognizer if no recognizer is specified; otherwise it is called as a
handler after the recognizer has assembled the token.

The function must return a value telling tlex what to do with the
assembled token. This value may be tlex_IGNORE, tlex_ACCEPT, or a
variable from a field element declared to have type action. tlex_IGNORE
causes the tlex to ignore the assembled token and begin looking for
another at the current position in the text. tlex_ACCEPT says to return
the current tokennumber and tokenvalue to the parser. Any action type
indicates that the token so far is to be treated as if it were the
operand of 'seq' for the tokenclass named in defining the action value.
For example, a Fortran lexer could treat "do" specially. If it were
followed by '5 i = 1, 10' the lexer would return the reserved word DO;
but otherwise the lexer would return some variable, say parm->idact,
where the variable is defined as in

    tokenclass DO
        seq "do"
        action idact setID
    {
        if ("do" is not the start of a DO stmt)
            return parm->idact;
    }
The tasks of a builtin recognizer
initial condition: tokpos is first char
first and prefix chars in tokenbuffer (with \0)
currchar and currpos are char after the initial char
set tokend
may reset tokpos
usually store chars in tokenbuffer
may reset default value for tokennumber
leave currchar at the character after the token
set tokenvalue to NULL
call appropriate handler (defined in .tlx file)
return handler value as scanning value
The tasks of a handler
initial condition: token is at tokpos...tokend
characters may be in tokenbuffer (with \0)
currchar and currpos are char after the final char
set tokenvalue
return new value for scanning (usually FALSE)
may also do all the tasks of a recognizer
The tasks of a user defined recognizer
initial condition: tokpos is first char
first and prefix chars are in tokenbuffer (with trailing \0)
currchar and currpos are char after the initial char or seq
set tokpos and tokend
may store chars in tokenbuffer
may reset default value for tokennumber
leave currchar at the character after the token
may set tokenvalue
return value for scanning (usually FALSE)
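As a hedged sketch of a body acting as its own recognizer (no
recognizer line), using only the calls listed under "Tools available in
tlex" below; the tokenclass name setLINE is hypothetical:

    tokenclass setLINE
        seq "#"
    {
        /* scan to the end of the line; CurrChar starts just after the "#" */
        int c = tlex_CurrChar(tlex);
        while (c != '\n' && c != EOF)
            c = tlex_NextChar(tlex);
        tlex_EndToken(tlex);
        return tlex_ACCEPT;
    }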
______________________________
Sample input: The ness.tlx file
Tokens in ness are much like those in C. Comments begin with -and extend to newline; --$ begins a pragmat, i.e., a special
comment processed by a pragmat parser; there is a long form of
string consants which can include newlines; and brackets and braces
are treated as parentheses. Note that the C code for numeric tokens
converts the tokennumber from setINTCON to setREALCON when
ScanNumber has detected a real value.
------------------------
-- comment:   -- ... \n
tokenclass -none-
    seq "--"
    recognizer ScanComment
    char *endseq  "\n"
-- pragmat:   --$ ... \n
tokenclass -none-
    seq "--$"
    recognizer ScanComment
    char *endseq  "\n"
{
    printf("pragmat: %s", tlex_GetTokenText(tlex)+3);
    return tlex_IGNORE;
}
-- identifier:   [a-zA-Z_] [a-zA-Z0-9_]*
tokenclass setID
    set [a-zA-Z_]
    recognizer ScanID
    charset continueset  [a-zA-Z0-9_]
{
    struct toksym *s;
    s = toksym_TFind(tlex_GetTokenText(tlex), grammarscope);
    if (s != NULL)
        tlex_SetTokenNumber(tlex, s->toknum);
    return tlex_ACCEPT;
}
-- string:   " ... "    escape is \
tokenclass setSTRINGCON
    seq "\""
    recognizer ScanString
-- string:   // ... \n\\\\    no escape
tokenclass setSTRINGCON
    seq "//"
{
    register int c;
    static char delim[4] = "\n\\\\";
    char *dx;
    c = tlex_CurrChar(tlex);   /* assumed initialization: the char after "//" */
    dx = delim;
    while (*dx && c != EOF) {
        if (*dx == c)
            dx++, c = tlex_NextChar(tlex);
        else if (dx == delim)
            c = tlex_NextChar(tlex);
        else dx = delim;
    }
    if (c != EOF)
        tlex_NextChar(tlex);
    tlex_EndToken(tlex);
    return tlex_ACCEPT;
}
-- integers and real values
tokenclass setINTCON
    set [0-9'.]
    recognizer ScanNumber
    tokennumber realtok  setREALCON
{
    if ( ! parm->IsInt)
        tlex_SetTokenNumber(tlex, parm->realtok);
    /* add value to symbol table */
    return tlex_ACCEPT;
}
-- [ and { map to (
-- ] and } map to )
tokenclass '('
    set [{\[]
tokenclass ')'
    set [}\]]
______________________________
Sample input: The ness.tab.c file
A full .tab.c file as generated by bison is quite long, but gentlex only
looks for certain features. First it must find somewhere a line defining
YYNTOKENS with the form

    #define YYNTOKENS 56

(where the # is immediately after a newline). Subsequently it must find
the token names as in the example:
    static const char * const yytname[] = {
    "$","error","$illegal.","OR","AND",
    "NOT","'='","\"/=\"","'<'","'>'","\">=\"","\"<=\"","'+'","'-'","'*'","'/'","'%'",
    "'~'","UNARYOP","setID","setSTRINGCON","setINTCON","setREALCON","MARKER","BOOLEAN",
    "INTEGER","REAL","OBJECT","VOID","FUNCTION","END","ON","EXTEND","FORWARD","MOUSE",
    "MENU","KEYS","EVENT","RETURN","WHILE","DO","IF","THEN","ELSE","ELIF","EXIT",
    "GOTOELSE","tokTRUE","tokFALSE","tokNULL","';'","\":=\"","'('","')'","','","\"~:=\"",
    "script","attributes","type","functype","eventstart","endtag","attrDecl","parmList",
The key identifying string is "yytname[] = {"; thereafter the tokens may
be separated by arbitrary white space and one comma. Note that scanning
terminates after reading YYNTOKENS token names, so the token list need
not continue to a correct C declaration.
______________________________
Sample compiler using both tlex and the parse object
This object module, nessparse.c, depends on ness.tlx
as given above and ness.y, a grammar for the ness language.
The ness.y file is processed with
bison -n ness.y
to produce the ness.act and ness.tab.c files.
Then gentlex is invoked
gentlex ness.tlx ness.tab.c
to generate the ness.tlc file #included in this module.
#include <text.ih>
#include <toksym.ih>
#include <parse.ih>
#include <lexan.ih>
#include <tlex.ih>
#include <ness.tab.c>      /* parse tables */
#include <parsedesc.h>     /* declare parse_description */
#include <parsepre.h>      /* begin function 'action' */
#include <ness.act>        /* body of function 'action' */
#include <parsepost.h>     /* end of function 'action' */
static toksym_scopeType grammarscope;
static struct toksym *proto;
#include <ness.tlc>
static void
EnterReservedWord(rock, w, i)
    void *rock;
    char *w;
    int i;
{
    struct toksym *s;
    boolean new;
    s = toksym_TLocate(w, proto, grammarscope, &new);
    s->toknum = i;
}
int
parsetext(input)
    struct text *input;
{
    struct parse *p;
    struct tlex *lexalyzer;
    proto = toksym_New();
    grammarscope = toksym_TNewScope(toksym_GLOBAL);
    lexalyzer = tlex_Create(&ness_tlex_tables, NULL,
            input, 0, text_GetLength(input));
    p = parse_Create(&parse_description, lexalyzer,
            reduceActions, NULL, NULL);
    parse_EnumerateReservedWords(p, EnterReservedWord, NULL);
    return parse_Run(p);        /* do all the work */
}
______________________________
Tools available in tlex
tlex_Create(struct tlex_tables *description, void *rock,
        struct text *text, long pos, long len)  returns struct tlex *;
    /* The rock is available to any function passed this tlex.
       The text, pos, and len specify a portion of a text to be
       processed. */

tlex_SetText(/* struct tlex *self, */ struct text *text,
        long pos, long len);
    /* sets the source text for the lexeme stream */

tlex_RecentPosition(/* struct tlex *self, */ int index, long *len)
        returns long;
    /* for token 'index', set len to length and return position.
       index = 0 is the most recent token; its predecessors are
       indexed with negative numbers:  -1 -2 ... -tlex_RECENTSIZE+1 */

tlex_RecentIndent(/* struct tlex *self, */ int index)  returns long;
    /* report the indentation of the 'index'th most recent token,
       where index is as for RecentPosition.
       A token preceded by anything other than white space
       is reported as having indentation 999. */

tlex_Repeat(/* struct tlex *self, */ int index);
    /* backup and repeat tokens starting with the index'th
       most recent token, where index is as for RecentPosition */

tlex_Error(/* struct tlex *self, */ char *msg);
    /* a lexical error is reported by calling the error handler
       after setting up a dummy token for the error text.
       The msg is expected to be in static storage. */
The "rock" is an argument to tlex_Create. It is an arbitrary value
that is accessible via the tlex object.
tlex_GetRock()
returns the current rock value
tlex_SetRock(void *r)
sets a new rock value
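For example, a function body might recover application state through the
rock (a sketch; struct compilerstate is hypothetical, and passing the
tlex as the first argument follows the calling convention used by the
other bodies in this document):

    {
        struct compilerstate *cs =
                (struct compilerstate *) tlex_GetRock(tlex);
        /* ... use cs to record the token, count lines, etc. ... */
        return tlex_ACCEPT;
    }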
C code in tokenclass blocks can modify the values that will be returned
to the parser by calling macro methods to adjust these attributes:
Token number
Token value (yylval)
Current character and scan position in the source text
Position and length of the source for the current token
Token text generated to represent the token
The tlex_ operations that perform these adjustments are described in
what follows.

/* TokenNumber is the number to be reported to the parser. This is
   usually set by default based on the argument to the tokenclass line
   in the xxx.tlx file. It may be reset to a value created by a
   tokennumber line within a tokenclass block. */
tlex_SetTokenNumber(int n)
tlex_GetTokenNumber()
/* the TokenValue is the value for yylval. These values serve
as the initial values in the value stack maintained
by the parser in parallel with the state stack */
tlex_SetTokenValue(void *v)
tlex_GetTokenValue()
/* the current position in the input is CurrPos where the
character is as given by CurrChar. By convention each
lexical analysis routine leaves CurrPos/CurrChar referring
to the first character to be considered for the next token.
NextChar moves CurrPos ahead one character, fetches the
next character, and returns it.
BackUp moves backward by n characters, resetting CurrPos/CurrChar
(a negative n value is acceptable and moves the position forward)
See also Advance, below, which combines NextChar with storing
the prior character in the tokentext.
*/
tlex_CurrPos()
tlex_CurrChar()
tlex_NextChar()
tlex_BackUp(int n)
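For example, a function body might look one character ahead and then
restore the position (a sketch; the self argument to NextChar and BackUp
follows the convention of the other bodies in this document):

    {
        int next = tlex_NextChar(tlex);   /* peek at the following character */
        if (next != '=')
            tlex_BackUp(tlex, 1);         /* not part of this token: put it back */
        return tlex_ACCEPT;
    }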
/* The position of the token text in the input source is
recorded and is available via
GetTokPos - the position of the first character
GetTokEnd - the position of the last character
StartToken records CurrPos as the position at which the token begins.
EndToken records the token as ending one character before Currpos.
There is no harm in calling StartToken or EndToken more than once,
although these functions also affect the token text buffer,
as noted below.
*/
tlex_GetTokPos()
tlex_GetTokEnd()
tlex_StartToken()
tlex_EndToken()
/* Some tokens are recorded by the lexer as
a character string which can be retrieved by GetTokenText.
In particular, when C code is called from a tokenclass block,
the text is the sequence of characters from the source that
caused this tokenclass to be activated.
Saving of the token text can be controlled by setting the
SaveText parameter. Its default value is TRUE for ScanID and
FALSE for ScanWhitespace, ScanComment, and ScanString.
The text is always stored for ScanToken.
A canonical form of the number is always stored for ScanNumber.
If the text is stored for a comment or string, only the contents are
stored--not the delimiters--and the TokPos/TokEnd are set to the
contents only. (Normally TokPos/End includes the delimiters.)
StartToken and EndToken (above) have the additional functionality,
respectively, of clearing the token buffer and finishing it with
a null character.
GetTokenText returns a pointer to the token text string.
PrevTokenText returns a pointer to the text of the previous token.
ClearTokenText clears the text to an empty string.
AppendToTokenText appends a character to the text.
TruncateTokenText removes n characters from its end.
Advance appends the current character to the token text and
then calls NextChar
*/
tlex_GetTokenText()
tlex_PrevTokenText()
tlex_ClearTokenText()
tlex_AppendToTokenText(int c)
tlex_TruncateTokenText(int n)
tlex_Advance()
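As an illustration of the token text calls (a sketch only; lowercasing
the saved identifier is hypothetical, and <ctype.h> is assumed to be
available to the generated code):

    {
        char buf[256];
        char *t = tlex_GetTokenText(tlex);
        int i;
        /* copy the saved text, lowercased, into a scratch buffer */
        for (i = 0; *t && i < 255; t++, i++)
            buf[i] = isupper((unsigned char)*t) ? tolower((unsigned char)*t) : *t;
        buf[i] = '\0';
        /* rebuild the stored token text from the buffer */
        tlex_ClearTokenText(tlex);
        for (t = buf; *t; t++)
            tlex_AppendToTokenText(tlex, *t);
        return tlex_ACCEPT;
    }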
Copyright 1992 Carnegie Mellon University.
All Rights Reserved.
$Disclaimer:
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose is hereby granted without fee,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice, this permission notice, and the following
# disclaimer appear in supporting documentation, and that the names of
# IBM, Carnegie Mellon University, and other copyright holders, not be
# used in advertising or publicity pertaining to distribution of the
# software without specific, written prior permission.
#
# IBM, CARNEGIE MELLON UNIVERSITY, AND THE OTHER COPYRIGHT HOLDERS
# DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT
# SHALL IBM, CARNEGIE MELLON UNIVERSITY, OR ANY OTHER COPYRIGHT HOLDER
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# $