\begindata{text,538290320}
\textdsversion{12}
\template{default}
\define{global
}
\define{footnote
attr:[Flags OverBar Int Set]
attr:[FontSize PreviousFontSize Point -2]}
\center{\bold{\bigger{PARSEC
}A library for parsing C programs
}by Bob Glickstein
}
\heading{Introduction
}Parsec is a link-time object library which exports a number of functions
useful in the parsing of C program source. The main routine of parsec,
PC_Parse, returns a tree (whose structure is described in the Appendix)
representing the parsed version of the input. This tree can then be used
to
perform analysis of or transformations upon the input.
\heading{Library Routines
}These are the functions exported by libparsec.a. To use them, you must
include the header file parsec.h.
\bold{PC_Child} \smaller{[macro]}
\indent{\typewriter{PC_ParseNode_t *\bold{PC_Child}(node, num)
PC_ParseNode_t *node;
int num;
}\indent{Returns the \italic{num}\superscript{th} child of node
\italic{node}.
\italic{Num} can be any number between 0 and
\bold{PC_NumChildren}(\italic{node}) - 1, inclusive. The operation is
only
meaningful when performed on nodes of which both
\bold{PC_IsProductionNode}(\italic{node}) and
(\bold{PC_NumChildren}(\italic{node}) > 0) are true.
}}
\bold{PC_CountTokens}
\indent{\typewriter{int
\bold{PC_CountTokens}(node)
PC_ParseNode_t *node;
}\indent{Returns the number of tokens required to recreate the original
source
code associated with \italic{node}. Most useful when used in conjunction
with
malloc to create a properly-sized buffer for \bold{PC_DumpTokens}.
}}
\bold{PC_DumpTokens}
\indent{\typewriter{int
\bold{PC_DumpTokens}(node, tokvec)
PC_ParseNode_t *node;
PC_Token_t *tokvec;
}\indent{\italic{Tokvec} is an array of tokens. \bold{PC_DumpTokens}
fills
this array with the sequence of tokens representing the input
corresponding to
\italic{node}. The result of this function should always equal
\bold{PC_CountTokens}(\italic{node}).
}}
\bold{PC_IsCharToken} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_IsCharToken}(tok)
PC_Token_t *tok;
}\indent{Returns non-zero if \italic{tok} is a token corresponding to a
single
character, otherwise returns zero.
}}
\bold{PC_IsEnumerator}
\indent{\typewriter{int
\bold{PC_IsEnumerator}(str)
char *str;
}\indent{Returns non-zero if the null-terminated string \italic{str} is
the
name of an enumeration constant defined in the input, otherwise returns
zero.
}}
\bold{PC_IsProductionNode} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_IsProductionNode}(node)
PC_ParseNode_t *node;
}\indent{Returns non-zero if \italic{node} is a production (non-token)
node,
otherwise returns zero.
}}
\bold{PC_IsTextToken} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_IsTextToken}(tok)
PC_Token_t *tok;
}\indent{Returns non-zero if \italic{tok} is a token containing a string
of
text, otherwise returns zero.
}}
\bold{PC_IsTokenNode} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_IsTokenNode}(node)
PC_ParseNode_t *node;
}\indent{Returns non-zero if \italic{node} is a token (non-production)
node,
otherwise returns zero.
}}
\bold{PC_IsTypedef}
\indent{\typewriter{int
\bold{PC_IsTypedef}(str)
char *str;
}\indent{Returns non-zero if the null-terminated string \italic{str} is
the
name of a typedef defined in the input, otherwise returns zero.
}}
\bold{PC_NodeToken} \smaller{[macro]}
\indent{\typewriter{PC_Token_t
*\bold{PC_NodeToken}(node)
PC_ParseNode_t *node;
}\indent{Returns the token associated with \italic{node}. This operation
is only meaningful when performed on nodes of which
\bold{PC_IsTokenNode}(\italic{node}) is true.
}}
\bold{PC_NodeType} \smaller{[macro]}
\indent{\typewriter{PC_ParseNodeType_t
\bold{PC_NodeType}(node)
PC_ParseNode_t *node;
}\indent{Returns the production type of \italic{node} (meaningful both
for
production nodes and token nodes). The possible values for ``production
types'' are outlined in the Appendix.
}}
\bold{PC_NumChildren} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_NumChildren}(node)
PC_ParseNode_t *node;
}\indent{Returns the number of children of \italic{node}. This operation
is only meaningful when performed on nodes of which
\bold{PC_IsProductionNode}(\italic{node}) is true.
}}
\bold{PC_Parse}
\indent{\typewriter{PC_ParseNode_t *\bold{PC_Parse}(fold)
int fold;
}\indent{Parses a C program on the standard input and returns the parse
tree.
Will return NULL if a parsing error occurred, in which case the variable
\bold{PC_ParseError} (type: char *) \italic{might} point to a null-terminated
string describing the error (or it might be NULL). If \italic{fold} is
non-zero, then the parse tree will be abbreviated as described in the
Appendix. Such abbreviation makes some parse information slightly more
obscure, but saves a great deal of memory.
}}
\bold{PC_PostWalkTree}
\indent{\typewriter{int
\bold{PC_PostWalkTree}(node, func, combine, leafVal)
PC_ParseNode_t *node;
int (*func)(), combine, leafVal;
}\indent{Performs a post-order traversal of the tree rooted at
\italic{node}.
After a node's children are recursively visited, \italic{func} is
applied to
the node. Returns the value of \italic{func} applied to the tree root
\italic{node}. \italic{Func} is a pointer to an integer function of four
arguments. These arguments are: PC_ParseNode_t *\italic{Node}, int
\italic{Depth}, int \italic{ChildrenVal} and int \italic{WhichChild}.
\italic{Node} is the current node to which \italic{func} is being
applied;
\italic{Depth} is \italic{Node}'s depth in the tree relative to
\italic{node}
(\italic{node} is at depth 0); \italic{ChildrenVal} is the combination of
the
results of \italic{func} applied to \italic{Node}'s children; and
\italic{WhichChild} is a non-negative integer specifying which child of
\italic{Node}'s parent \italic{Node} is (except that \italic{WhichChild}
is -1
for the tree root \italic{node}). The \italic{combine} argument to
\bold{PC_PostWalkTree} specifies how the values of \italic{func} applied
to
the children of a node should be combined and passed back to the parent
as the
\italic{ChildrenVal} argument. The possible values for \italic{combine}
are:
\indent{\description{PC_COMBINE_ADD, which adds
\italic{func}(child\subscript{0}), ..., \italic{func}(child\subscript{n})
together;
PC_COMBINE_LOR, which does a logical or of
\italic{func}(child\subscript{0}),
..., \italic{func}(child\subscript{n});
PC_COMBINE_BITOR, which does a bitwise or of
\italic{func}(child\subscript{0}), ...,
\italic{func}(child\subscript{n});
PC_COMBINE_LXOR, which does a logical exclusive-or of
\italic{func}(child\subscript{0}), ...,
\italic{func}(child\subscript{n});
PC_COMBINE_BITXOR, which does a bitwise exclusive-or of
\italic{func}(child\subscript{0}), ...,
\italic{func}(child\subscript{n});
PC_COMBINE_LAND, which does a logical and of
\italic{func}(child\subscript{0}), ...,
\italic{func}(child\subscript{n});
PC_COMBINE_BITAND, which does a bitwise and of
\italic{func}(child\subscript{0}), ...,
\italic{func}(child\subscript{n});
PC_COMBINE_MULTIPLY, which multiplies \italic{func}(child\subscript{0}),
...,
\italic{func}(child\subscript{n}) together.
}}The \italic{leafVal} argument specifies what to pass as
\italic{ChildrenVal}
when \italic{func} is applied to nodes with no children.
}}
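The traversal contract described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the library's implementation: SketchNode, SKETCH_COMBINE_ADD, SKETCH_COMBINE_LOR and every sketch_ function are hypothetical stand-ins invented here so that the fragment compiles without parsec.h, and only two of the combine modes are modelled. It is written with ANSI prototypes rather than the K&R declarations used elsewhere in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parse node; NOT the real PC_ParseNode_t. */
typedef struct SketchNode {
    int nchildren;
    struct SketchNode **children;
} SketchNode;

/* Stand-ins for two of the combine modes; the values are arbitrary. */
enum { SKETCH_COMBINE_ADD, SKETCH_COMBINE_LOR };

typedef int (*SketchPostFunc)(SketchNode *Node, int Depth,
                              int ChildrenVal, int WhichChild);

/* Visit the children first, combine their results, then apply func. */
static int sketch_post(SketchNode *n, SketchPostFunc func, int combine,
                       int leafVal, int depth, int which)
{
    int i, acc = leafVal;       /* childless nodes receive leafVal */

    for (i = 0; i < n->nchildren; ++i) {
        int v = sketch_post(n->children[i], func, combine, leafVal,
                            depth + 1, i);
        if (combine == SKETCH_COMBINE_ADD)
            acc = (i == 0) ? v : acc + v;
        else
            acc = (i == 0) ? (v != 0) : (acc || v);
    }
    return func(n, depth, acc, which);
}

/* Entry point: the root is at depth 0 and has WhichChild == -1. */
int sketch_post_walk(SketchNode *node, SketchPostFunc func,
                     int combine, int leafVal)
{
    assert(node != NULL && func != NULL);
    return sketch_post(node, func, combine, leafVal, 0, -1);
}

/* Callback: the size of a subtree is 1 plus the sum over its children. */
static int count_nodes(SketchNode *Node, int Depth,
                       int ChildrenVal, int WhichChild)
{
    (void)Node; (void)Depth; (void)WhichChild;
    return ChildrenVal + 1;
}

/* Builds a root with two children, one of which has a child of its own. */
int sketch_demo_count(void)
{
    SketchNode leafA = { 0, NULL }, leafB = { 0, NULL };
    SketchNode mid, root;
    SketchNode *midKids[1], *rootKids[2];

    midKids[0] = &leafB;
    mid.nchildren = 1;
    mid.children = midKids;
    rootKids[0] = &leafA;
    rootKids[1] = &mid;
    root.nchildren = 2;
    root.children = rootKids;
    return sketch_post_walk(&root, count_nodes, SKETCH_COMBINE_ADD, 0);
}
```

With an additive combine and a leafVal of 0, a callback that returns ChildrenVal + 1 yields the number of nodes in the tree, so sketch_demo_count returns 4 for the four-node tree it builds.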
\bold{PC_PreWalkTree}
\indent{\typewriter{int
\bold{PC_PreWalkTree}(node, func, rootVal)
PC_ParseNode_t *node;
int (*func)(), rootVal;
}\indent{Performs a pre-order traversal of the tree rooted at
\italic{node}.
Applies \italic{func} to a node and then recursively descends its
children.
Returns the value of \italic{func} applied to the tree root
\italic{node}.
\italic{Func} is a pointer to an integer function of five arguments.
These
arguments are: PC_ParseNode_t *\italic{Node}, int \italic{Depth}, int
*\italic{Descend}, int \italic{ParentVal} and int \italic{WhichChild}.
\italic{Node} is the current node to which \italic{func} is being
applied;
\italic{Depth} is \italic{Node}'s depth in the tree relative to
\italic{node}
(\italic{node} is at depth 0); \italic{Descend} is a pointer to an
integer
(explained below); \italic{ParentVal} is the result of \italic{func}
applied
to \italic{Node}'s parent; and \italic{WhichChild} is a non-negative
integer
specifying which child of \italic{Node}'s parent \italic{Node} is (except
that
\italic{WhichChild} is -1 for the tree root \italic{node}). The integer
pointed to by \italic{Descend} is initially 1, but if it is set to zero,
it
indicates to \bold{PC_PreWalkTree} that \italic{Node}'s children are not
to be
recursively descended (this is useful in tree-pruning). The
\italic{rootVal}
argument specifies what to pass as \italic{ParentVal} when \italic{func}
is
applied to the tree root \italic{node}.
}}
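The role of the Descend flag may be easier to see in code. This is a minimal sketch of the behaviour described above, not the library's source: SketchNode, its label field and the sketch_ functions are hypothetical names introduced only so the fragment is self-contained, and it uses ANSI prototypes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parse node; NOT the real PC_ParseNode_t. */
typedef struct SketchNode {
    int label;                  /* illustrative payload only */
    int nchildren;
    struct SketchNode **children;
} SketchNode;

typedef int (*SketchPreFunc)(SketchNode *Node, int Depth, int *Descend,
                             int ParentVal, int WhichChild);

/* Apply func to the node, then visit its children unless func cleared
 * the descend flag.  Each child receives this node's result. */
static int sketch_pre(SketchNode *n, SketchPreFunc func, int parentVal,
                      int depth, int which)
{
    int i, descend = 1;         /* the flag starts out as 1 */
    int val = func(n, depth, &descend, parentVal, which);

    if (descend)
        for (i = 0; i < n->nchildren; ++i)
            (void)sketch_pre(n->children[i], func, val, depth + 1, i);
    return val;
}

/* Entry point: the root is at depth 0 and has WhichChild == -1. */
int sketch_pre_walk(SketchNode *node, SketchPreFunc func, int rootVal)
{
    assert(node != NULL && func != NULL);
    return sketch_pre(node, func, rootVal, 0, -1);
}

static int visited;             /* number of nodes func was applied to */

/* Callback: count the visit and prune below any negative label. */
static int visit(SketchNode *Node, int Depth, int *Descend,
                 int ParentVal, int WhichChild)
{
    (void)ParentVal; (void)WhichChild;
    ++visited;
    if (Node->label < 0)
        *Descend = 0;
    return Depth;
}

/* Tree: root -> { pruned -> { hidden }, leaf }.  hidden is never seen. */
int sketch_demo_prune(void)
{
    SketchNode hidden = { 2, 0, NULL }, leaf = { 3, 0, NULL };
    SketchNode pruned, root;
    SketchNode *prunedKids[1], *rootKids[2];

    prunedKids[0] = &hidden;
    pruned.label = -1;
    pruned.nchildren = 1;
    pruned.children = prunedKids;
    rootKids[0] = &pruned;
    rootKids[1] = &leaf;
    root.label = 1;
    root.nchildren = 2;
    root.children = rootKids;

    visited = 0;
    (void)sketch_pre_walk(&root, visit, 0);
    return visited;
}
```

Here the callback clears the flag on the node labelled -1, so the walk applies it to three of the four nodes and sketch_demo_prune returns 3.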
\bold{PC_SetCharToken}
\indent{\typewriter{void
\bold{PC_SetCharToken}(tok, c)
PC_Token_t *tok;
char c;
}\indent{Makes \italic{tok} a character-token containing the character
\italic{c}.
}}
\bold{PC_SetTextToken}
\indent{\typewriter{void
\bold{PC_SetTextToken}(tok, val, str)
PC_Token_t *tok;
int val;
char *str;
}\indent{Makes \italic{tok} a text-token with value \italic{val} and
containing the string \italic{str}. Possible values for \italic{val} are
outlined in the Appendix.
}}
\bold{PC_SetToken}
\indent{\typewriter{void
\bold{PC_SetToken}(tok, val)
PC_Token_t *tok;
int val;
}\indent{Makes \italic{tok} a non-character, non-text token (that is to
say,
an encoded token) whose value is \italic{val}. Possible values for
\italic{val} are outlined in the Appendix.
}}
\bold{PC_SubType} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_SubType}(node)
PC_ParseNode_t *node;
}\indent{Returns the production subtype of \italic{node}; consult the
Appendix
for information about interpreting production types and subtypes. This
operation is only meaningful when performed on nodes of which
\bold{PC_IsProductionNode}(\italic{node}) is true.
}}
\bold{PC_TokenChar} \smaller{[macro]}
\indent{\typewriter{char
\bold{PC_TokenChar}(tok)
PC_Token_t *tok;
}\indent{Returns the character associated with the token \italic{tok}.
This
operation is only meaningful when performed on tokens of which
\bold{PC_IsCharToken}(\italic{tok}) is true.
}}
\bold{PC_TokenChars}
\indent{\typewriter{char
*\bold{PC_TokenChars}(tok)
PC_Token_t *tok;
}\indent{Returns a null-terminated string containing a representation of
the
token \italic{tok}; this representation corresponds to the input form of
the
token. The result is returned in a static buffer which is overwritten
with
each call.
}}
\bold{PC_TokenText} \smaller{[macro]}
\indent{\typewriter{char
*\bold{PC_TokenText}(tok)
PC_Token_t *tok;
}\indent{Returns the text string associated with \italic{tok}. This
operation is only meaningful when performed on tokens of which
\bold{PC_IsTextToken}(\italic{tok}) is true.
}}
\bold{PC_TokenVal} \smaller{[macro]}
\indent{\typewriter{int
\bold{PC_TokenVal}(tok)
PC_Token_t *tok;
}\indent{Returns the token code associated with \italic{tok}. Possible
values
are outlined in the Appendix. This operation is only meaningful when
performed on tokens of which neither \bold{PC_IsCharToken}(\italic{tok})
nor
\bold{PC_IsTextToken}(\italic{tok}) is true.
}}
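The token accessors above each apply to one kind of token, so code that handles arbitrary tokens has to test the kind first. The sketch below illustrates that dispatch with a tagged structure. Everything in it is an assumption made for illustration: SketchToken, SketchKind, the SKETCH_ constants and the sketch_ functions are hypothetical and say nothing about how PC_Token_t is actually laid out or how PC_TokenChars is implemented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a token; NOT the real PC_Token_t. */
typedef enum { SKETCH_CHAR, SKETCH_TEXT, SKETCH_CODE } SketchKind;

typedef struct {
    SketchKind kind;
    char c;             /* character tokens */
    int val;            /* token code */
    const char *text;   /* text tokens */
} SketchToken;

/* Hypothetical token codes; the real values come from parsec.h. */
enum { SKETCH_INCR = 1, SKETCH_TYPEDEF = 2 };

/* Map a token code back to its input spelling. */
static const char *sketch_code_spelling(int val)
{
    switch (val) {
    case SKETCH_INCR:    return "++";
    case SKETCH_TYPEDEF: return "typedef";
    default:             return "?";
    }
}

/* Write the input form of tok into buf (truncating to len - 1
 * characters if necessary) and return buf. */
const char *sketch_spell(const SketchToken *tok, char *buf, size_t len)
{
    const char *src;
    char one[2];

    assert(tok != NULL && buf != NULL && len > 0);
    switch (tok->kind) {
    case SKETCH_CHAR:           /* a single input character */
        one[0] = tok->c;
        one[1] = 0;
        src = one;
        break;
    case SKETCH_TEXT:           /* the text is the input itself */
        src = tok->text;
        break;
    default:                    /* code token: look up its spelling */
        src = sketch_code_spelling(tok->val);
        break;
    }
    strncpy(buf, src, len - 1);
    buf[len - 1] = 0;
    return buf;
}
```

Unlike PC_TokenChars, which returns a static buffer that is overwritten on each call, this sketch writes into a caller-supplied buffer, so two spellings can be held at the same time.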
\begindata{bp,538268488}
\enddata{bp,538268488}
\view{bpv,538268488,95,0,0}
\heading{Example
}Following is a trivial example of a parsec application. This program
tries
to parse its input. If the input is a syntactically valid C module, then
the
module is reproduced on the standard output, one token at a time;
otherwise,
an error is reported.
\smaller{\indent{\typewriter{#include <stdio.h>
#include <parsec.h>
\bold{main}()
\{
    PC_ParseNode_t *Tree;
    PC_Token_t *tokenVector;
    int numTokens, i;
    extern char *malloc();
    Tree = PC_Parse(1);    /}\italic{* Folding is "on" *}\typewriter{/
    if (!Tree) \{
        fprintf(stderr,
                "Input is not a syntactically valid C program\\n");
        exit(1);
    \}
    numTokens = PC_CountTokens(Tree);
    tokenVector = (PC_Token_t *)
        malloc(numTokens * (sizeof(PC_Token_t)));
    if (!tokenVector) \{
        fprintf(stderr, "Out of memory\\n");
        exit(1);
    \}
    PC_DumpTokens(Tree, tokenVector);    /}\italic{* Fill the token vector with tokens *}\typewriter{/
    for (i = 0; i < numTokens; ++i) \{
        if (!(i % 6))
            putchar('\\n');    /}\italic{* Every six tokens, print a newline *}\typewriter{/
        printf("%s ",
               PC_TokenChars(&(tokenVector[i])));
    \}
    putchar('\\n');    /}\italic{* Finish up with another newline *}\typewriter{/
    exit(0);    /}\italic{* Normal termination *}\typewriter{/
\}
}}}
\heading{Bugs
}PC_Parse currently can only read the standard input.
There is no facility for identifying syntax errors in the input, except
that
one did or did not occur.
The parser is pretty slow.
Pre-processor directives are not handled.
\heading{Author
}Bob Glickstein, Information Technology Center, Carnegie Mellon
University
July 1989
\begindata{bp,538268296}
\enddata{bp,538268296}
\view{bpv,538268296,96,0,0}
\center{\bold{\bigger{Appendix
}}}
The grammar recognized by the PC_Parse function
follows\footnote{\
\begindata{fnote,538598664}
\textdsversion{12}
\define{italic
menu:[Font~1,Italic~11]
attr:[FontFace Italic Int Set]}
This grammar corresponds almost exactly to the one given in Appendix B of
\italic{The C Programming Language, Second Edition}, by B.W. Kernighan
and
D.M. Ritchie [Prentice-Hall, NJ].\
\enddata{fnote,538598664}
\view{fnotev,538598664,97,0,0}}. In this grammar, lower_case strings are
production names, UPPER_CASE strings are input tokens, and single
typographical characters (like this exclamation point\bold{!}) are in
boldface
and are also input tokens. Italicized numbers in parentheses enumerate
the
different rules for a given production. When such a number is followed
by an
asterisk, it indicates that the node emitted is a token node rather than
a
production node.
\indent{module \italic{(1)} ::= external_declaration
module \italic{(2)} ::= module external_declaration
external_declaration \italic{(1)} ::= function_definition
external_declaration \italic{(2)} ::= declaration
function_definition \italic{(1)} ::= declarator function_body
function_definition \italic{(2)} ::= declaration_specifiers declarator
function_body
function_body \italic{(1)} ::= compound_statement
function_body \italic{(2)} ::= declaration_list compound_statement
declaration \italic{(1)} ::= declaration_specifiers \bold{;}
declaration \italic{(2)} ::= declaration_specifiers init_declarator_list
\bold{;}
declaration_list \italic{(1)} ::= declaration
declaration_list \italic{(2)} ::= declaration_list declaration
declaration_specifiers \italic{(1)} ::= storage_class_specifier
declaration_specifiers \italic{(2)} ::= storage_class_specifier
declaration_specifiers
declaration_specifiers \italic{(3)} ::= type_specifier
declaration_specifiers \italic{(4)} ::= type_specifier
declaration_specifiers
declaration_specifiers \italic{(5)} ::= type_qualifier
declaration_specifiers \italic{(6)} ::= type_qualifier
declaration_specifiers
storage_class_specifier \italic{(1*)} ::= PC_AUTO
storage_class_specifier \italic{(2*)} ::= PC_REGISTER
storage_class_specifier \italic{(3*)} ::= PC_STATIC
storage_class_specifier \italic{(4*)} ::= PC_EXTERN
storage_class_specifier \italic{(5*)} ::= PC_TYPEDEF
type_specifier \italic{(1*)} ::= PC_VOID
type_specifier \italic{(2*)} ::= PC_CHAR
type_specifier \italic{(3*)} ::= PC_SHORT
type_specifier \italic{(4*)} ::= PC_INT
type_specifier \italic{(5*)} ::= PC_LONG
type_specifier \italic{(6*)} ::= PC_FLOAT
type_specifier \italic{(7*)} ::= PC_DOUBLE
type_specifier \italic{(8*)} ::= PC_SIGNED
type_specifier \italic{(9*)} ::= PC_UNSIGNED
type_specifier \italic{(10)} ::= struct_or_union_specifier
type_specifier \italic{(11)} ::= enum_specifier
type_specifier \italic{(12*)} ::= PC_TYPEDEF_NAME
type_qualifier \italic{(1*)} ::= PC_CONST
type_qualifier \italic{(2*)} ::= PC_VOLATILE
struct_or_union_specifier \italic{(1)} ::= struct_or_union \bold{\{}
struct_declaration_list \bold{\}}
struct_or_union_specifier \italic{(2)} ::= struct_or_union identifier
\bold{\{}
struct_declaration_list \bold{\}}
struct_or_union_specifier \italic{(3)} ::= struct_or_union identifier
struct_or_union \italic{(1*)} ::= PC_STRUCT
struct_or_union \italic{(2*)} ::= PC_UNION
struct_declaration_list \italic{(1)} ::= struct_declaration
struct_declaration_list \italic{(2)} ::= struct_declaration_list
struct_declaration
init_declarator_list \italic{(1)} ::= init_declarator
init_declarator_list \italic{(2)} ::= init_declarator_list \bold{,}
init_declarator
init_declarator \italic{(1)} ::= declarator
init_declarator \italic{(2)} ::= declarator \bold{=} initializer
struct_declaration \italic{(1)} ::= specifier_qualifier_list
struct_declarator_list \bold{;}
specifier_qualifier_list \italic{(1)} ::= type_specifier
specifier_qualifier_list \italic{(2)} ::= type_specifier
specifier_qualifier_list
specifier_qualifier_list \italic{(3)} ::= type_qualifier
specifier_qualifier_list \italic{(4)} ::= type_qualifier
specifier_qualifier_list
struct_declarator_list \italic{(1)} ::= struct_declarator
struct_declarator_list \italic{(2)} ::= struct_declarator_list \bold{,}
struct_declarator
struct_declarator \italic{(1)} ::= declarator
struct_declarator \italic{(2)} ::= \bold{:} constant_expression
struct_declarator \italic{(3)} ::= declarator \bold{:}
constant_expression
enum_specifier \italic{(1)} ::= PC_ENUM \bold{\{} enumerator_list
\bold{\}}
enum_specifier \italic{(2)} ::= PC_ENUM identifier \bold{\{}
enumerator_list
\bold{\}}
enum_specifier \italic{(3)} ::= PC_ENUM identifier
enumerator_list \italic{(1)} ::= enumerator
enumerator_list \italic{(2)} ::= enumerator_list \bold{,} enumerator
enumerator \italic{(1)} ::= identifier
enumerator \italic{(2)} ::= identifier \bold{=} constant_expression
declarator \italic{(1)} ::= direct_declarator
declarator \italic{(2)} ::= pointer direct_declarator
direct_declarator \italic{(1)} ::= identifier
direct_declarator \italic{(2)} ::= \bold{(} declarator \bold{)}
direct_declarator \italic{(3)} ::= direct_declarator \bold{[} \bold{]}
direct_declarator \italic{(4)} ::= direct_declarator \bold{[}
constant_expression \bold{]}
direct_declarator \italic{(5)} ::= direct_declarator \bold{(}
parameter_type_list \bold{)}
direct_declarator \italic{(6)} ::= direct_declarator \bold{(} \bold{)}
direct_declarator \italic{(7)} ::= direct_declarator \bold{(}
identifier_list
\bold{)}
pointer \italic{(1)} ::= \bold{*}
pointer \italic{(2)} ::= \bold{*} type_qualifier_list
pointer \italic{(3)} ::= \bold{*} pointer
pointer \italic{(4)} ::= \bold{*} type_qualifier_list pointer
type_qualifier_list \italic{(1)} ::= type_qualifier
type_qualifier_list \italic{(2)} ::= type_qualifier_list type_qualifier
parameter_type_list \italic{(1)} ::= parameter_list
parameter_type_list \italic{(2)} ::= parameter_list \bold{,} PC_ELLIPSIS
parameter_list \italic{(1)} ::= parameter_declaration
parameter_list \italic{(2)} ::= parameter_list \bold{,}
parameter_declaration
parameter_declaration \italic{(1)} ::= declaration_specifiers declarator
parameter_declaration \italic{(2)} ::= declaration_specifiers
parameter_declaration \italic{(3)} ::= declaration_specifiers
abstract_declarator
identifier_list \italic{(1)} ::= identifier
identifier_list \italic{(2)} ::= identifier_list \bold{,} identifier
initializer \italic{(1)} ::= assignment_expression
initializer \italic{(2)} ::= \bold{\{} initializer_list \bold{\}}
initializer \italic{(3)} ::= \bold{\{} initializer_list \bold{,}
\bold{\}}
initializer_list \italic{(1)} ::= initializer
initializer_list \italic{(2)} ::= initializer_list \bold{,} initializer
type_name \italic{(1)} ::= specifier_qualifier_list
type_name \italic{(2)} ::= specifier_qualifier_list abstract_declarator
abstract_declarator \italic{(1)} ::= pointer
abstract_declarator \italic{(2)} ::= direct_abstract_declarator
abstract_declarator \italic{(3)} ::= pointer direct_abstract_declarator
direct_abstract_declarator \italic{(1)} ::= \bold{(} abstract_declarator
\bold{)}
direct_abstract_declarator \italic{(2)} ::= \bold{[} \bold{]}
direct_abstract_declarator \italic{(3)} ::= \bold{[} constant_expression
\bold{]}
direct_abstract_declarator \italic{(4)} ::= direct_abstract_declarator
\bold{[}
\bold{]}
direct_abstract_declarator \italic{(5)} ::= direct_abstract_declarator
\bold{[}
constant_expression \bold{]}
direct_abstract_declarator \italic{(6)} ::= \bold{(} \bold{)}
direct_abstract_declarator \italic{(7)} ::= \bold{(} parameter_type_list
\bold{)}
direct_abstract_declarator \italic{(8)} ::= direct_abstract_declarator
\bold{(}
\bold{)}
direct_abstract_declarator \italic{(9)} ::= direct_abstract_declarator
\bold{(}
parameter_type_list \bold{)}
statement \italic{(1)} ::= labeled_statement
statement \italic{(2)} ::= expression_statement
statement \italic{(3)} ::= compound_statement
statement \italic{(4)} ::= selection_statement
statement \italic{(5)} ::= iteration_statement
statement \italic{(6)} ::= jump_statement
labeled_statement \italic{(1)} ::= identifier \bold{:} statement
labeled_statement \italic{(2)} ::= PC_CASE constant_expression \bold{:}
statement
labeled_statement \italic{(3)} ::= PC_DEFAULT \bold{:} statement
expression_statement \italic{(1)} ::= \bold{;}
expression_statement \italic{(2)} ::= expression \bold{;}
compound_statement \italic{(1)} ::= \bold{\{} \bold{\}}
compound_statement \italic{(2)} ::= \bold{\{} declaration_list \bold{\}}
compound_statement \italic{(3)} ::= \bold{\{} statement_list \bold{\}}
compound_statement \italic{(4)} ::= \bold{\{} declaration_list
statement_list
\bold{\}}
statement_list \italic{(1)} ::= statement
statement_list \italic{(2)} ::= statement_list statement
selection_statement \italic{(1)} ::= PC_IF \bold{(} expression \bold{)}
statement
selection_statement \italic{(2)} ::= PC_IF \bold{(} expression \bold{)}
statement PC_ELSE statement
selection_statement \italic{(3)} ::= PC_SWITCH \bold{(} expression
\bold{)}
statement
iteration_statement \italic{(1)} ::= PC_WHILE \bold{(} expression
\bold{)}
statement
iteration_statement \italic{(2)} ::= PC_DO statement PC_WHILE \bold{(}
expression \bold{)} \bold{;}
iteration_statement \italic{(3)} ::= PC_FOR \bold{(} \bold{;} \bold{;}
\bold{)}
statement
iteration_statement \italic{(4)} ::= PC_FOR \bold{(} expression \bold{;}
\bold{;} \bold{)} statement
iteration_statement \italic{(5)} ::= PC_FOR \bold{(} \bold{;} expression
\bold{;} \bold{)} statement
iteration_statement \italic{(6)} ::= PC_FOR \bold{(} \bold{;} \bold{;}
expression \bold{)} statement
iteration_statement \italic{(7)} ::= PC_FOR \bold{(} \bold{;} expression
\bold{;} expression \bold{)} statement
iteration_statement \italic{(8)} ::= PC_FOR \bold{(} expression \bold{;}
\bold{;} expression \bold{)} statement
iteration_statement \italic{(9)} ::= PC_FOR \bold{(} expression \bold{;}
expression \bold{;} \bold{)} statement
iteration_statement \italic{(10)} ::= PC_FOR \bold{(} expression \bold{;}
expression \bold{;} expression \bold{)} statement
jump_statement \italic{(1)} ::= PC_GOTO identifier \bold{;}
jump_statement \italic{(2)} ::= PC_CONTINUE \bold{;}
jump_statement \italic{(3)} ::= PC_BREAK \bold{;}
jump_statement \italic{(4)} ::= PC_RETURN \bold{;}
jump_statement \italic{(5)} ::= PC_RETURN expression \bold{;}
expression \italic{(1)} ::= assignment_expression
expression \italic{(2)} ::= expression \bold{,} assignment_expression
assignment_expression \italic{(1)} ::= conditional_expression
assignment_expression \italic{(2)} ::= unary_expression
assignment_operator
assignment_expression
assignment_operator \italic{(1*)} ::= \bold{=}
assignment_operator \italic{(2*)} ::= PC_MULASSIGN
assignment_operator \italic{(3*)} ::= PC_DIVASSIGN
assignment_operator \italic{(4*)} ::= PC_MODASSIGN
assignment_operator \italic{(5*)} ::= PC_ADDASSIGN
assignment_operator \italic{(6*)} ::= PC_SUBASSIGN
assignment_operator \italic{(7*)} ::= PC_LEFTASSIGN
assignment_operator \italic{(8*)} ::= PC_RIGHTASSIGN
assignment_operator \italic{(9*)} ::= PC_ANDASSIGN
assignment_operator \italic{(10*)} ::= PC_XORASSIGN
assignment_operator \italic{(11*)} ::= PC_ORASSIGN
conditional_expression \italic{(1)} ::= logical_or_expression
conditional_expression \italic{(2)} ::= logical_or_expression \bold{?}
expression \bold{:} conditional_expression
constant_expression \italic{(1)} ::= conditional_expression
logical_or_expression \italic{(1)} ::= logical_and_expression
logical_or_expression \italic{(2)} ::= logical_or_expression
PC_LOGICAL_OR
logical_and_expression
logical_and_expression \italic{(1)} ::= inclusive_or_expression
logical_and_expression \italic{(2)} ::= logical_and_expression
PC_LOGICAL_AND
inclusive_or_expression
inclusive_or_expression \italic{(1)} ::= exclusive_or_expression
inclusive_or_expression \italic{(2)} ::= inclusive_or_expression \bold{|}
exclusive_or_expression
exclusive_or_expression \italic{(1)} ::= and_expression
exclusive_or_expression \italic{(2)} ::= exclusive_or_expression \bold{^}
and_expression
and_expression \italic{(1)} ::= equality_expression
and_expression \italic{(2)} ::= and_expression \bold{&}
equality_expression
equality_expression \italic{(1)} ::= relational_expression
equality_expression \italic{(2)} ::= equality_expression PC_EQUAL
relational_expression
equality_expression \italic{(3)} ::= equality_expression PC_NOT_EQUAL
relational_expression
relational_expression \italic{(1)} ::= shift_expression
relational_expression \italic{(2)} ::= relational_expression \bold{<}
shift_expression
relational_expression \italic{(3)} ::= relational_expression \bold{>}
shift_expression
relational_expression \italic{(4)} ::= relational_expression PC_LE
shift_expression
relational_expression \italic{(5)} ::= relational_expression PC_GE
shift_expression
shift_expression \italic{(1)} ::= additive_expression
shift_expression \italic{(2)} ::= shift_expression PC_LEFT
additive_expression
shift_expression \italic{(3)} ::= shift_expression PC_RIGHT
additive_expression
additive_expression \italic{(1)} ::= multiplicative_expression
additive_expression \italic{(2)} ::= additive_expression \bold{+}
multiplicative_expression
additive_expression \italic{(3)} ::= additive_expression \bold{-}
multiplicative_expression
multiplicative_expression \italic{(1)} ::= cast_expression
multiplicative_expression \italic{(2)} ::= multiplicative_expression
\bold{*}
cast_expression
multiplicative_expression \italic{(3)} ::= multiplicative_expression
\bold{/}
cast_expression
multiplicative_expression \italic{(4)} ::= multiplicative_expression
\bold{%}
cast_expression
cast_expression \italic{(1)} ::= unary_expression
cast_expression \italic{(2)} ::= \bold{(} type_name \bold{)}
cast_expression
unary_expression \italic{(1)} ::= postfix_expression
unary_expression \italic{(2)} ::= PC_INCR unary_expression
unary_expression \italic{(3)} ::= PC_DECR unary_expression
unary_expression \italic{(4)} ::= unary_operator cast_expression
unary_expression \italic{(5)} ::= PC_SIZEOF unary_expression
unary_expression \italic{(6)} ::= PC_SIZEOF \bold{(} type_name \bold{)}
unary_operator \italic{(1*)} ::= \bold{&}
unary_operator \italic{(2*)} ::= \bold{*}
unary_operator \italic{(3*)} ::= \bold{+}
unary_operator \italic{(4*)} ::= \bold{-}
unary_operator \italic{(5*)} ::= \bold{~}
unary_operator \italic{(6*)} ::= \bold{!}
postfix_expression \italic{(1)} ::= primary_expression
postfix_expression \italic{(2)} ::= postfix_expression \bold{[}
expression
\bold{]}
postfix_expression \italic{(3)} ::= postfix_expression \bold{(} \bold{)}
postfix_expression \italic{(4)} ::= postfix_expression \bold{(}
argument_expression_list \bold{)}
postfix_expression \italic{(5)} ::= postfix_expression \bold{.}
identifier
postfix_expression \italic{(6)} ::= postfix_expression PC_DEREF
identifier
postfix_expression \italic{(7)} ::= postfix_expression PC_INCR
postfix_expression \italic{(8)} ::= postfix_expression PC_DECR
primary_expression \italic{(1)} ::= identifier
primary_expression \italic{(2)} ::= constant
primary_expression \italic{(3*)} ::= PC_STRING_CONSTANT
primary_expression \italic{(4)} ::= \bold{(} expression \bold{)}
argument_expression_list \italic{(1)} ::= assignment_expression
argument_expression_list \italic{(2)} ::= argument_expression_list
\bold{,}
assignment_expression
constant \italic{(1*)} ::= PC_INTEGER_CONSTANT
constant \italic{(2*)} ::= PC_CHARACTER_CONSTANT
constant \italic{(3*)} ::= PC_FLOATING_CONSTANT
constant \italic{(4*)} ::= PC_ENUMERATION_CONSTANT
identifier \italic{(1*)} ::= PC_IDENTIFIER
}
A node is emitted for every production in the above list. The production
type
of the node can be retrieved with PC_NodeType. The value of this
function is
a constant whose name is the string "pnt_" ("Parsec Node Type") followed
by
the name of the production, such as pnt_module or
pnt_storage_class_specifier.
When a production node is emitted, it has a subtype in addition to a
production type. The subtype is the parenthesized number in the above
grammar, and can be retrieved with PC_SubType. When the subtype in the
above
grammar is followed by an asterisk, then a token node is emitted rather
than a
production node. A token node is a PC_ParseNode_t object, but it
contains a token (PC_Token_t) rather than some number of children. Token
nodes do not have subtypes, but the contents of the token are enough to
determine the exact rule which was recognized for that node. Tokens come
in
three flavors: character tokens, text tokens and code tokens. A
character
token is one of which PC_IsCharToken is true; it contains a single
character
which corresponds exactly to a character that was recognized in the input
(accessible via PC_TokenChar). A text token is one of which
PC_IsTextToken is
true and contains a value (accessible via PC_TokenVal) and a string
(accessible via PC_TokenText). The value indicates what kind of input
was
recognized and the string contains the actual input. The following kinds
of
tokens have text associated with them: Identifiers (PC_IDENTIFIER),
Constants
(PC_STRING_CONSTANT, PC_INTEGER_CONSTANT, PC_CHARACTER_CONSTANT,
PC_FLOATING_CONSTANT, PC_ENUMERATION_CONSTANT), and typedef names
(PC_TYPEDEF_NAME). A code token is one containing one of the constant
values
given above in the grammar, such as PC_INCR (which corresponds to the
input
string "++"), or PC_TYPEDEF (which corresponds to the input string
"typedef").
When a production node has children, those children only correspond to
the
sub-productions recognized in a given rule; no intervening input tokens
are
saved. So, for example, if you have a node whose type is
pnt_enum_specifier
and it has one child, the only way to tell whether it corresponds to
\indent{\bold{enum} \bold{\{} \italic{child-tokens} \bold{\}}
}
or to
\indent{\bold{enum} \italic{child-tokens}
}
is to examine the subtype (in this case, the subtype will be 1 or 3).
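As a small illustration of the point above, the three enum_specifier rules can be tabulated by subtype. The two functions below are hypothetical helpers written for this document, not part of libparsec.a. In a real application the argument would presumably be PC_SubType(node) for a node whose PC_NodeType is pnt_enum_specifier.

```c
#include <assert.h>
#include <stddef.h>

/* Surface form of each enum_specifier rule, indexed by subtype. */
const char *sketch_enum_form(int subtype)
{
    switch (subtype) {
    case 1:  return "enum { enumerator_list }";
    case 2:  return "enum identifier { enumerator_list }";
    case 3:  return "enum identifier";
    default: return NULL;   /* not a subtype listed in the grammar */
    }
}

/* Number of children the node carries.  The PC_ENUM keyword and the
 * braces are input tokens, so they are not saved as children. */
int sketch_enum_children(int subtype)
{
    assert(subtype >= 1 && subtype <= 3);
    return (subtype == 2) ? 2 : 1;
}
```

Subtypes 1 and 3 both report a single child, which is why the child count alone cannot tell the two forms apart.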
There are many rules in the grammar of the form
\indent{foo ::= bar
}
that is, where a production contains exactly one subrule and no
intervening
tokens. If the "fold" argument to PC_Parse is non-zero, then the
resulting
tree will be abbreviated to eliminate nodes corresponding to productions
like
"foo" above. For example, consider the following (very short!) C source
file:
\indent{\typewriter{int num;
}}
Without folding, the parse tree for this file looks like this:
\begindata{zip,538598408}
%ViewWidth 438
%ViewHeight 268
*D;-1000,1400
N8.5X11
>-1000,1400
*A;34,1224
Fandysans8b
Tpnt_module (1)
MCM
*A;51,916
Fandysans8b
Tpnt_external_declaration (2)
MCM
*A;-942,179
Fandysans8b
Tpnt_declaration_specifiers (3)
MCM
*A;34,599
Fandysans8b
Tpnt_declaration (2)
MCM
*A;873,188
Fandysans8b
Tpnt_init_declarator_list (1)
MCM
*A;-950,-222
Fandysans8b
Tpnt_type_specifier: PC_INT
MCM
*A;839,-214
Fandysans8b
Tpnt_init_declarator (1)
MCM
*A;805,-573
Fandysans8b
Tpnt_declarator (1)
MCM
*A;805,-925
Fandysans8b
Tpnt_direct_declarator (1)
MCM
*A;779,-1233
Fandysans8b
Tpnt_identifier: num
MCM
*C;0,1147
>8,1002
*C;8,813
>8,693
*C;-17,505
>-916,274
*C;-916,111
>-916,-128
*C;8,505
>779,265
*C;796,137
>796,-111
*C;796,-265
>796,-496
*C;796,-625
>796,-830
*C;788,-985
>788,-1147
\enddata{zip,538598408}
\view{zipview,538598408,98,0,270}
\begindata{bp,538270984}
\enddata{bp,538270984}
\view{bpv,538270984,99,0,0}
With folding, the parse tree is abbreviated to this:
\begindata{zip,538599176}
%ViewWidth 526
%ViewHeight 180
%ObjectWidth 571
%ObjectHeight 306
*D;-1000,1400
N8.5X11
>-1000,1400
*A;34,253
Fandysans8b
Tpnt_declaration (2)
MCM
*A;-884,-386
Fandysans8b
Tpnt_type_specifier: PC_INT
MCM
*A;820,-386
Fandysans8b
Tpnt_identifier: num
MCM
*C;-25,176
>-995,-304
*C;-8,176
>801,-304
\enddata{zip,538599176}
\view{zipview,538599176,100,573,182}
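The effect of folding on the two trees above can be reproduced with a short sketch. This is an illustration of the idea only, under several assumptions: SketchNode and the sketch_ functions are hypothetical, the unit_rule flag (marking rules of the form foo ::= bar) is set by hand when the tree is built, and nothing here is taken from the way PC_Parse actually abbreviates its tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parse node; NOT the real PC_ParseNode_t. */
typedef struct SketchNode {
    const char *name;
    int is_token;
    int unit_rule;      /* rule has one subrule and no input tokens */
    int nchildren;
    struct SketchNode *children[2];
} SketchNode;

static SketchNode *mk(SketchNode *n, const char *name, int is_token,
                      int unit_rule, SketchNode *c0, SketchNode *c1)
{
    n->name = name;
    n->is_token = is_token;
    n->unit_rule = unit_rule;
    n->children[0] = c0;
    n->children[1] = c1;
    n->nchildren = (c0 != NULL) + (c1 != NULL);
    return n;
}

/* Builds the unfolded tree shown above for the source "int num;". */
SketchNode *sketch_build(void)
{
    static SketchNode s[10];
    SketchNode *type  = mk(&s[0], "type_specifier: PC_INT", 1, 0, NULL, NULL);
    SketchNode *ident = mk(&s[1], "identifier: num", 1, 0, NULL, NULL);
    SketchNode *dd    = mk(&s[2], "direct_declarator (1)", 0, 1, ident, NULL);
    SketchNode *d     = mk(&s[3], "declarator (1)", 0, 1, dd, NULL);
    SketchNode *id    = mk(&s[4], "init_declarator (1)", 0, 1, d, NULL);
    SketchNode *idl   = mk(&s[5], "init_declarator_list (1)", 0, 1, id, NULL);
    SketchNode *ds    = mk(&s[6], "declaration_specifiers (3)", 0, 1, type, NULL);
    SketchNode *decl  = mk(&s[7], "declaration (2)", 0, 0, ds, idl);
    SketchNode *ext   = mk(&s[8], "external_declaration (2)", 0, 1, decl, NULL);
    return mk(&s[9], "module (1)", 0, 1, ext, NULL);
}

/* Skip over chains of unit-rule productions, then fold the children. */
SketchNode *sketch_fold(SketchNode *n)
{
    int i;

    while (!n->is_token && n->unit_rule) {
        assert(n->nchildren == 1);
        n = n->children[0];
    }
    for (i = 0; i < n->nchildren; ++i)
        n->children[i] = sketch_fold(n->children[i]);
    return n;
}

/* Number of nodes in the tree rooted at n. */
int sketch_count(const SketchNode *n)
{
    int i, total = 1;

    for (i = 0; i < n->nchildren; ++i)
        total += sketch_count(n->children[i]);
    return total;
}
```

Applied to the tree from sketch_build, sketch_fold reduces ten nodes to three: the declaration node with its type specifier token and identifier token, matching the folded figure.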
The uses of Parsec are numerous; however, its usefulness is limited by the
fact
that it can only recognize C code proper; all pre-processor macros must
be
resolved before the parser can recognize the input. Therefore, it is
customary to do the following to a C file before letting parsec process
it:
\indent{cc -E -I\italic{include-directories} ... -D\italic{defines} ...
source.c | your-parsec-application
}
Parsec has been used to implement yyhide, a Yacc/Lex postprocessor which
makes
selected identifiers static. It can also be used to write source-code
indenters, call-graph generators, etc.
\begindata{bp,537558784}
\enddata{bp,537558784}
\view{bpv,537558784,102,0,0}
Copyright 1992 Carnegie Mellon University and IBM.
All rights reserved.
\smaller{\smaller{$Disclaimer:
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose is hereby granted without fee,
provided that the above copyright notice appear in all copies and that
both that copyright notice, this permission notice, and the following
disclaimer appear in supporting documentation, and that the names of
IBM, Carnegie Mellon University, and other copyright holders, not be
used in advertising or publicity pertaining to distribution of the
software
without specific, written prior permission.
IBM, CARNEGIE MELLON UNIVERSITY, AND THE OTHER COPYRIGHT HOLDERS
DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
IN NO EVENT
SHALL IBM, CARNEGIE MELLON UNIVERSITY, OR ANY OTHER COPYRIGHT HOLDER
BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
OF THIS SOFTWARE.
$
}}\enddata{text,538290320}