Introduction
To build the CsBML generator we used a parser generator: a tool that reads a grammar specification and converts it into a program that recognizes input matching that grammar. Java Compiler Compiler (JavaCC) is a parser generator that produces Java code for use in Java applications. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation, such as tree building (via JJTree, a tool included with JavaCC), actions, and debugging.
JavaCC works with any Java VM version 1.2 or greater.
JavaCC itself is not a parser or a lexical analyzer but a generator: it reads a specification from a file and outputs a lexical analyzer and a parser, both written in Java. In a specification, related rules (for example, those for integer constants and floating-point constants) are written separately, and the commonality between them is extracted during the generation process. This increased modularity means that specification files are easier to write, read, and modify than a hand-written Java program.
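To make the idea of separately written rules concrete, the following is a minimal sketch in plain Java, not JavaCC output. The class name ConstantRules and the two patterns are assumptions chosen for illustration; a generator would merge such rules into one recognizer, whereas this sketch simply tries them in turn.

```java
import java.util.regex.Pattern;

class ConstantRules {
    // Each rule is stated on its own, much as a specification file lists them.
    // Both patterns are illustrative assumptions, not taken from Translator.txt.
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    static final Pattern FLOATING = Pattern.compile("[0-9]+\\.[0-9]+");

    // Report which rule, if any, matches the whole text.
    static String classify(String text) {
        if (INTEGER.matcher(text).matches()) {
            return "INTEGER";
        }
        if (FLOATING.matcher(text).matches()) {
            return "FLOATING";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "3.14", "abc"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Adding a third kind of constant here means adding one pattern and one test, which is the kind of modularity a specification file offers without the hand-written dispatch code.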
Building the Translator in JavaCC
The language is defined in a file named Translator.txt. The JavaCC tool "compiles" this file into a set of Java source files (.java). Defining the language involves the following steps:
1. Options and class declarations
2. Specifying the lexical analyzer
3. Specifying the parser
4. Generating the parser and lexical analyzer
Consider the file Translator.txt. This file contains the JavaCC specification for the parser and the lexical analyser and is used as input to the JavaCC program.
The options section at the top specifies a set of options for the grammar; here we specify a lookahead of 2. Additional options control JavaCC's debugging features and more. These options can alternatively be specified on the JavaCC command line. JavaCC takes the portion of the file between PARSER_BEGIN and PARSER_END and copies it directly into the resulting Java file. This is where we place all pre- and post-parsing activity related to the parser, and where you can link your Java code to the parser actions.
The first part of the file is:
____________________________________________________________________________
options
{
LOOKAHEAD=2;
}
PARSER_BEGIN(Translator)
public class Translator
{
    public static void main( String[] args ) throws ParseException, TokenMgrError {
        Translator parser = new Translator( System.in ) ;
        while (true)
        {
            parser.parseFile();
        }
    }
}
PARSER_END(Translator)
_____________________________________________________________________________
The SKIP section identifies the characters we want to skip; in our case, these are the white-space characters: space, tab, carriage return and new line. Next, we define the tokens of our language in the TOKEN section.
We define the keywords of the language, the relational operators, identifiers and the statement terminator as tokens. Note that JavaCC differentiates between definitions for tokens and definitions for other production rules. The SKIP and TOKEN sections specify this grammar's lexical analysis. Each of these lines is called a regular expression production.
There is one more kind of token that the generated lexical analyser can produce: it has the symbolic name EOF and represents the end of the input sequence. There is no need to have a regular expression production for EOF; JavaCC deals with the end of the file automatically.
______________________________________________________________________
SKIP : { " " }
SKIP : { "\t"| "\r"| "\n"}
TOKEN : {
<assign: "Assign">
| <BOTransaction: "BOTransaction">
| <BOInput: "BOInput">
| <abort: "Abort">
| <call: "Call">
| <put: "Put">
| <get: "Get">
| <IfThen: "IfThen">
| <IfThenElse: "IfThenElse">
| <getList: "GetList">
| <getRow: "GetRow">
| <BOReturn: "BOReturn">
| <BOReturnElement: "BOReturnElement">
| <BOReturnElementTag: "BOReturnElementTag">
| <HigherTagStart: "HigherTagStart">
| <HigherTagEnd: "HigherTagEnd">
| <relation: "EQ"|"NE"|"LT"|"GT"|"TE"|"GE"|"EQI"|">"|"=">
| <Identifier: <lowercaseAndOtherCharacters>>
| <variable: (<literalValue>|<Identifier>)+>
| <literalValue: <Identifier>>
| <lowercaseAndOtherCharacters: (["A"-"Z","a"-"z","0"-"9",".","_"])+>
| <br: ";">
}
______________________________________________________________________
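The effect of the SKIP and TOKEN sections can be approximated by hand, which may help when reading the productions above. The sketch below is an assumption-laden illustration in plain Java, not the generated TranslatorTokenManager: it covers only a handful of the tokens, and the class name TokenSketch is hypothetical. It follows the matching rule JavaCC uses, where the longest match wins and, on a tie, the production listed first wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {
    // Insertion order is kept: on equal-length matches the earlier rule wins.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BOInput", Pattern.compile("BOInput"));
        RULES.put("put", Pattern.compile("Put"));
        // "EQI" is listed before "EQ" because Java regex alternation is ordered.
        RULES.put("relation", Pattern.compile("EQI|EQ|NE|LT|GT|TE|GE|>|="));
        RULES.put("Identifier", Pattern.compile("[A-Za-z0-9._]+"));
        RULES.put("br", Pattern.compile(";"));
    }

    // Split the input into "kind:image" strings, skipping white space.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char ch = input.charAt(pos);
            if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n') {
                pos++; // the SKIP productions
                continue;
            }
            String bestKind = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best,
                // so ties stay with the rule that was listed first.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("Lexical error at position " + pos);
            }
            out.add(bestKind + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("BOInput Put abc EQ def;"));
    }
}
```

With these rules the word Put is reported as the keyword token, while a longer word such as Putter is reported as an Identifier, because the longer match takes precedence over the order of the productions.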
The specification of the parser consists of what is called a BNF production.
As an example, let’s define the production rule for BOInput, the top-level grammar element. It looks a little like a Java method definition.
String BOInput():
{
// c holds the XML for the most recent element; f accumulates all of them
String c="";
String f="";
}
{
<BOInput>
"{"
(c=parseFile3() {f+=c;})+
"}"
// return the accumulated text, not only the last element
{ return "<BOInput>"+f+"</BOInput>"; }
}
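The resemblance to a Java method is not accidental: a production of this shape corresponds closely to a recursive-descent method. The following hand-written sketch shows that correspondence under simplifying assumptions. It works on a ready-made list of token strings, the class name ProductionSketch is hypothetical, and item() is a stand-in for parseFile3(), whose definition is not shown here. It is not the code JavaCC generates.

```java
import java.util.Arrays;
import java.util.List;

class ProductionSketch {
    private final List<String> tokens;
    private int pos = 0;

    ProductionSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consume the next token, failing if it is not the expected one.
    private void expect(String expected) {
        if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) {
            throw new IllegalStateException("Expected " + expected + " at token " + pos);
        }
        pos++;
    }

    // BOInput ::= "BOInput" "{" ( item )+ "}"
    String boInput() {
        StringBuilder body = new StringBuilder();
        expect("BOInput");
        expect("{");
        do { // the ( ... )+ loop: at least one item
            body.append(item());
        } while (pos < tokens.size() && !tokens.get(pos).equals("}"));
        expect("}");
        return "<BOInput>" + body + "</BOInput>";
    }

    // Hypothetical stand-in for parseFile3(): accepts any single word.
    private String item() {
        if (pos >= tokens.size() || tokens.get(pos).equals("{") || tokens.get(pos).equals("}")) {
            throw new IllegalStateException("Expected an item at token " + pos);
        }
        return "<item>" + tokens.get(pos++) + "</item>";
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("BOInput", "{", "a", "b", "}");
        System.out.println(new ProductionSketch(input).boInput());
    }
}
```

The local StringBuilder plays the role of the accumulator declared in the first block of the production, and the do/while loop plays the role of the ( ... )+ repetition.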
Having constructed the Translator.txt file, we invoke JavaCC on it. Exactly how to do this depends a bit on the operating system. Below is how to do it on Windows NT, 2000, and XP.
First using the “command prompt” program (CMD.EXE) we run JavaCC:
C:\javacc-5.0\bin>javacc Translator.txt
Java Compiler Compiler Version 5.0 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file Translator.txt . . .
Warning: Lookahead adequacy checking not being performed since option LOOKAHEAD is more than 1. Set option FORCE_LA_CHECK to true to force checking.
File "TokenMgrError.java" does not exist. Will create one.
File "ParseException.java" does not exist. Will create one.
File "Token.java" does not exist. Will create one.
File "SimpleCharStream.java" does not exist. Will create one.
Parser generated with 0 errors and 1 warnings.
This generates seven Java classes, each in its own file:
• TokenMgrError is a simple error class; it is used for errors detected by the lexical analyser and is a subclass of Error and hence of Throwable.
• ParseException is another error class; it is used for errors detected by the parser and is a subclass of Exception and hence of Throwable.
• Token is a class representing tokens. Each Token object has an integer field kind that represents the kind of the token (BOTransaction, BOInput, etc) and a String field image, which represents the sequence of characters from the input file that the token represents.
• SimpleCharStream is an adapter class that delivers characters to the lexical analyser.
• TranslatorConstants is an interface that defines a number of constants used in both the lexical analyser and the parser.
• TranslatorTokenManager is the lexical analyser.
• Translator is the parser.
We can now compile these classes with a Java compiler:
C:\javacc-5.0\bin>javac *.java
Then we run the Translator class on the input file, redirecting the result to op.xml:
C:\javacc-5.0\bin>java Translator <input.txt >op.xml
Sample input file:
BOInput
{
BOTransaction WhatHenPush(task_id act_id)
{
// Fragment of the code;
Put Org (abc > def);
Put Act (abc = def);
GetRow act cat_code (act_id EQ task_act_id),(name_txt EQ abc);
Get act cat_code (act_id EQ task_id),(name_txt EQ abc);
IfThenElse(bbb EQ 56)
{
// its a comment;
Abort sorry for the inconvinience;
}
{
Assign abc_2 def_3;
}
Call Output .;
}
BOTransaction Output (act_id)
{
BOReturn
{
BOReturnElement
{
BOReturnElementTag Result ok;
}
}
}
}
Output file:
<?xml version="1.0" encoding="UTF-8"?>
<BOInput>
<BusinessObjectTransaction>
<transactionName> WhatHenPush </transactionName>
<parameter> task_id </parameter>
<parameter> act_id </parameter>
<!--Fragment of the code -->
<tableQuery>
<databaseTable> Org </databaseTable>
<queryAction> PUT </queryAction>
<columnReference>
<columnName> abc </columnName>
<workingVariable increment = "yes" > def </workingVariable>
</columnReference>
</tableQuery>
<tableQuery>
<databaseTable> Act </databaseTable>
<queryAction> PUT </queryAction>
<columnReference>
<columnName> abc </columnName>
<workingVariable> def </workingVariable>
</columnReference>
</tableQuery>
<tableQuery>
<databaseTable> act </databaseTable>
<queryAction> GET </queryAction>
<resultName type = "Row" > cat_code </resultName>
<columnReference>
<columnName> act_id </columnName>
<workingVariable> task_act_id </workingVariable>
</columnReference>
<columnReference>
<columnName> name_txt </columnName>
<workingVariable> abc </workingVariable>
</columnReference>
</tableQuery>
<tableQuery>
<databaseTable> act </databaseTable>
<queryAction> GET </queryAction>
<resultName> cat_code </resultName>
<columnReference>
<columnName> act_id </columnName>
<workingVariable> task_id </workingVariable>
</columnReference>
<columnReference>
<columnName> name_txt </columnName>
<workingVariable> abc </workingVariable>
</columnReference>
</tableQuery>
<ifThenElse>
<compare> bbb </compare>
<relation> EQ </relation>
<literalValue> 56 </literalValue>
</ifThenElse>
<codeBlock>
<!--its a comment -->
<abort> sorry for the inconvinience </abort>
</codeBlock>
<codeBlock>
<Assign>
<workingVariable> abc_2 </workingVariable>
<to> def_3 </to>
</Assign>
</codeBlock>
<Call>
<boName> Output </boName>
<anchorTag> .
</anchorTag>
</Call>
</BusinessObjectTransaction>
<BusinessObjectTransaction>
<transactionName> Output </transactionName>
<parameter> act_id </parameter>
<BusinessObjectReturn>
<BOReturnElement>
<tag> Result </tag>
<literalValue> ok </literalValue>
</BOReturnElement>
</BusinessObjectReturn>
</BusinessObjectTransaction>
</BOInput>
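Since the translator assembles its output by string concatenation, it can be useful to confirm that the result is well-formed XML. The sketch below is one possible check using the standard javax.xml.parsers API; the class name OutputCheck and the inline sample string are assumptions for illustration, and the check is not part of the translator itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class OutputCheck {
    // Parse the text; any failure is reported as an unchecked exception.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }

    // True if the text parses as XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Count the elements with the given tag name anywhere in the document.
    static int countElements(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        // A shortened stand-in for the contents of op.xml.
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<BOInput><BusinessObjectTransaction>"
            + "<transactionName> Output </transactionName>"
            + "</BusinessObjectTransaction></BOInput>";
        System.out.println("well-formed: " + isWellFormed(sample));
        System.out.println("transactions: " + countElements(sample, "BusinessObjectTransaction"));
    }
}
```

To check the real output, the same DocumentBuilder can parse a java.io.File pointing at op.xml instead of an in-memory string.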
Link to download JavaCC: https://javacc.dev.java.net/