DSL - A .NET Implementation Rajeswar Majumdar Architect- R&D- Global Product & Technology ADP India |2 Introduction Domain specific languages (DSL) are around for quite some time and widely in use for software development. In this article we will discuss what DSLs are and their characteristics,some basic concepts of computer science language theory relevant to DSL and finally an example of DSL implementation using .NET. What is a Domain Specific Language (DSL) A domain specific language (DSL) is a computer programming language of limited expressiveness that is focused on a particular domain. A DSL is worthwhile and conveys meaning only in the context of the particular domain. The characteristic which makes DSL distinct from general purpose languages is ‘limited expressiveness’. With a general purpose language like Java or C#, we can build almost any kind of system- small or big, PC or mobile or web enabled, with rich user interface or no user interface at all and in any functional domain like banking, HR etc. This requires supporting different data structures, control flow, abstraction and many more features. This also makes these languages too vast and difficult to understand or master for nonprogrammers. On the other hand, a DSL being focused on a particular problem domain can be defined using bare minimum of syntactical rules which are sufficient for the domain in question. This makes it possible to define a DSL which closely resembles a natural language like English (with a very small subset of words and strict syntactical rules). So it becomes easier for non-programmers like domain experts and business analysts to understand if not write programs in DSL. Kinds of DSL- Internal and External Some examples of DSLs which we use in everyday programming are HTML (domain- Web user interface), Sql (domain- RDBMS) or CSS (domain- HTML styling). These are examples of ‘external’ DSLs which are completely different from general purpose languages and do not use any feature of a general purpose language. However it is possible to create a so-called ‘internal’ DSL within a general purpose language itself that uses only limited features of that language. One such example is ‘Fluent Interface’ pattern in C# which uses a special chaining syntax for method call and is suitable where C# code needs to be mixed with other languages like HTML (e.g. ASP.NET MVC View pages). Rule engines are closely related to DSL but they are different from DSL because of the fact that rule engines are also general purpose rather than focused on a specific domain. However rule engines often uses some form of DSL for defining the rules. For example, IBM JRULE rule engine uses an Xml based DSL for defining rules and Microsoft .NET Workflow Foundation rule engine uses a C# internal DSL for rule definition. DSL Basic Concepts Optional DSL Script Parse Domain Model Architecture Code Generator Target Code DSL |3 The implementation of an external DSL requires a DSL script, a parser and optionally a code generator. The DSL script adheres to the limited grammatical or syntactical rules defined for the DSL. The parser parses the DSL to generate an instance of the domain model specific to the domain in consideration. This domain model is the crux of the design and should be conceptualized prior to designingthe DSL. The domain model drives the design of the DSL and parser. The generated instance of the domain model is manipulated by the application code to produce the DSL program output. Optionally the domain model can be further processed to generate code in some general purpose language like Java or C# which is then compiled and executed just like hand written code. Before we go into the details of DSL implementation, we need to understand a few important concepts related to general computer science language theory. Any so called ‘program’ written in any computer language, is nothing but a series of characters. The process of converting the stream of characters to a sequence of tokens is called lexical analysisor tokenization. A token is a string of one or more characters that is significant or meaningful as a group according to the grammar of that particular language.The component which does this tokenization is called a lexer or tokenizer. These tokens are then parsed or syntactically analyzed by a parserto generate a hierarchical tree structure of tokens which is called a parse tree. Often the parser itself will combine the functions of lexer and parser. The parse tree is also called a syntax tree or Abstract Syntax Tree (AST). The AST represents the syntactical structure of the program i.e. which code blocks or statements are contained within what. The syntax is "abstract" in not representing every detail appearing in the real syntax. This parse tree is then used by an interpreter to perform semantic analysis (and generate code in some lower level language in case of compiler). Then the interpreter takes appropriateactions using the validated expressions in the parse tree and contextual data supplied by the hosting run time system. Sequence of Characters Lexer (Lexical Analysis) Grammar Tokens Parser (Syntactical Analysis) Grammar Parse Tree Contextual Data Interpreter (Semantic Analysis) Output Language process flow Grammar |4 The grammar or rules of the language can be expressed or defined in various ways. One of the standard ways of specifying this is Backus-Naur Form (BNF). In this notation, the grammatical rules are specified as follows: <symbol>::= __<expression>__ For example <postal-address>::=<name-part><street-address><zip-part> In this article, we will use the BNF notation for specifying the DSL rules. External DSL Implementation - EmployeeLeave EligibilityDetermination As discussed previously, DSL are especially useful for implementing application code which is subjected to change often. This makes DSL suitable for Human Capital Management (HCM) domain where the employee benefits, entitlements, taxation etc. often change based on company policies or regulatory requirements. In this example, we will consider a simple use case from HCM domain which is validating the leave eligibility of an employee. External DSLs require a hosting environment which would create the run time and execute the DSL script in the context of a domain model. We would be using Microsoft .NET to provide this hosting environment. Irony (http://irony.codeplex.com/) is an open source language implementation kit developed in and for .NET framework. This provides in- built components for defining language grammar and comes with an inbuilt parser and interpreter to parse and execute the source code script based on a particular language grammar. In this example, we would be using Irony to define grammar, creating the AST from source code script and execute the source code script in the context of our domain model. Below is the domain model we would be working with: Domain Model The Employee entity represents an employee in the organization who can apply for a leave. The leave applied by an employee is represented by the LeaveApplication entity. One Employee instance can be associated with zero or more LeaveApplication instances. Each leave application can of any of the following leave types: Casual, Earned or Optional Holiday. This is represented by the LeaveType enumeration. For our purpose the following are the significant attributes of our domain entities: Employee: LeaveBalances (Dictionary<LeaveType,int>): A dictionary object which contains the leave balance days (integer) keyed by the corresponding leave type enum value. Obviously whether an employee is eligible for leave depends on his leave balance for that type of leave. |5 DaysWorked (int): The number of days worked by the employee in the current organization since his joining. She can avail certain types of leaves only if she has spent a minimum tenure with the organization. LeaveApplication: Duration (int): The number of days for which the leave is applied. For simplicity we assume that only full day leaves are allowed i.e. duration is always integer. LeaveType (LeaveType enum): The type of the leave being applied. Obviously eligibility varies depending on the leave type. With the domain entities defined, let’s now come to the business logic to be implementedthat will decide whether a leave application is valid or invalid based on employee’s eligibility. Our business rules require that: 1. If the employee’s leave balance is not sufficient for the duration of the leave applied, then the leave application is invalid 2. Employee cannot apply for Casual leave if the leave duration is more than a couple of days 3. Employee has to spend at least 30 days in the organization before she becomes eligible for Earned or Option Holiday leaves Since the above rules are often subjected to change, so we decided to implement the above logic as external DSL so that it can be understood and modified by the Human Resource department without the help of developers of HCM software. Also any change in the business logicwould not require recompilation of the HCM system’s source code. First step of implementing any external DSL is to define the language grammar. As mentioned before, we would use BNF to represent the grammar. <program>:=<statementList> <statementList>:= <statement>+ <statement>:= (<expression> ";") | <incaseStatement> <incaseStatement>:= "in case" <booleanExpression> "then" "{" <statementList> "}" [otherwiseStatement] <otherwiseStatement>:= "otherwise" "{" <statementList> "}" <expression>:= <Number>| <String> | <boolean> | <balance> | <daysWorked> | <duration>| <booleanExpression> | <arithmeticExpression> | <parenExpression> | <resultAssignmentExpression> <parenExpression>:= "(" <expression> ")" <booleanExpression>:= <expression><booleanOperator><expression> | <leaveTypeOptions> <arithmeticExpression>:= <expression><arithmeticOperator><expression> <resultAssignmentExpression>:= "application" <validInvalid> <booleanOperator>:= "="|"<"|">"|">="|"<="|"and"|"or" <arithmeticOperator>:= "+"|"-"|"*"|"/" <validInvalid>:= "valid"|"invalid" <balance>:= "balance" <daysWorked>:"daysWorked" <leaveTypeOptions>:= "OH"| "casual"| "earned" <duration>:= "duration" Leave Validation Grammar- BNF The grammar defines the rules for various elements (i.e. tokens) in a hierarchical structure. Note that at the lowest level, we have defined a number of keywords like “duration”, “balance” etc. which has specific meanings for only our DSL and in a particular context which is our domain model. The higher level elements consist of combinations of lower level elements. For example, an “arithmeticExpression” will consist of an “expression” followed by an “arithmeticOperator” followed by another “expression” (e.g. |6 balance – duration). Similarly an expression can be either a “number” or a “string” or “booleanExpression” or “arithmeticExpression” etc. The “resultAssignmentExpression” rule might be of particular interest. This would be used to set whether the leave application is valid or invalid and can be written in either of the following way: application valid; or application invalid; The conditional statements can be written using in case <condition> then { //statements } otherwise { //statements } This is equivalent to “if” statement in C#. Similar to C# where “else” statement is optional, “otherwise” if also optional and hence appear enclosed in square brackets in the “incaseStatement” rule definition. Note that element rules can be recursive for example the rule for “expression” element contains “expression” on the right hand side of the rule definition indirectly via “booleanExpression”, “arithmeticExpression” etc. The topmost element of the grammar is the “program” element which consists of “statementList” element which in turn consists of one or more “statement” elements. Hence the “+” sign appears following the “statement” element in the “statementList” rule definition. Similarly “*” denotes that zero or more elements and “|” denotes that either of the adjacent elements are allowed. With the grammar defined in BNF notation, we will proceed with defining the grammar using Irony and subsequently writing “code” in our DSL which would be parsed and interpreted in the context of our domain model. The sample code accompanying this article consists of the following solution structure: |7 It is written on .NET framework 4.5 and would require Visual Studio 2012 or later to open. It consists of the following projects: DSL.Sample.DomainEntities: Contains the domain entity classes as described before. DSL.Sample.Language: Contains the implementation of the language grammar. Also contains implementation of custom grammar elements to evaluate each element in the context of our domain. DSL.Sample.Tests: This is a visual studio unit test project to unit test the grammar and custom grammar elements against a given “source code” program of our DSL. Apart from these, the “Scripts” solution folder contains the grammar defined in BNF (LeaveValidationGrammar_BNF.txt) and the sample script written in this DSL (LeaveValidationScript.txt) which implements the business logic described before. The DSL.Sample.Language project contains reference to the following two Irony dlls: Irony.dll: It contains the support for defining the grammar. Also contains the parser among other framework elements. Irony.Interpretor.dll: Contains the support for execution of source code script written in a DSL based on a pre-defined grammar. The LeaveApplicationValidationGrammarclass in the DSL.Sample.Language defines the grammar (described before in BNF) as C# statements. This class has to derive from the Irony supplied Irony.Parsing.Grammar class which provides the base methods for defining the grammar. The grammar rules must be defined inside the constructor inside this class. We want our language to be case insensitive, so while calling the base constructor we pass the “caseSensitive” parameter as “false”. Also the whitespaces, new lines etc. are ignored automatically during lexical analysis. By default, Irony does not create Abstract Syntax Tree (AST) node while generating parse tree from source code script. Since we want to generate the AST node for each of our grammar elements (unless otherwise specified), we have to set the LanguageFlag accordingly.It is the custom implementation of AST Nodes where we would define our own node evaluation logic. The parse tree consists of “NonTerminals” (non-leaf nodes) and “Terminals” (leaf nodes). The “number” and “text” nodes use Irony provided “NumberLiteral” and “StringLiteral” terminal nodes for rule definition. It is important to set the TermFlags asNoAstNodefor these terms as we are not providing any AstNode implementation for these terms. All the other nodes are defined as “NonTerminal” so that we can implement our custom execution logic via the “nodeType” parameter of its constructor.More on this is explained later. Each non terminal definition corresponds to each rule in the BNF grammar definition. For example: The C# statement booleanOperator.Rule = ToTerm("<") | "=" | ">" | ">=" | "<=" |"and" | "or" | leaveTypeKeyword; is equivalent to corresponding BNF form <booleanOperator>:= "="|"<"|">"|">="|"<="|"and"|"or" | <leaveTypeKeyword> Note the ToTerm() call at the start of the C# rule which converts the string “<” to an object of Irony defined KeyTerm class (a subclass of Terminal class) and subsequently the overloaded “|” operator is used to build the rule. Irony has also overloaded “+” operator to indicate concatenation of BNF terms. |8 Similarly it provided methods like MakeStarRule(), MakePlusRule() etc. to define rules like <statement>*, <statement>+ etc. Each grammar requires a root to be defined for the parse tree and in our case “program” node is the root. Based on this grammar, the following is the source code script in our own DSL that implements the business logic we have described before: in case (duration > balance) then { application invalid; } otherwise { in case(casual and (duration > 2)) then { application invalid; } otherwise { in case((OH or earned) and (daysWorked < 30)) then { application invalid; } otherwise { application valid; } } } With the grammar defined and script written, we can generate the parse tree based on this grammar and script. The LeaveApplicationValidationProcessor class in the same project defines the method ProcessLeaveApplicationValidation() to generate the parse tree (and later to evaluate the parse tree based on domain model data). The parse tree generation code is very simple: LeaveApplicationValidationGrammarleaveValidationGrammar = newLeaveApplicationValidationGrammar(); var parser = newParser(leaveValidationGrammar); varparseTree = parser.Parse(script); In case there is any error ((according to the grammar) in the source script, the code does not throw any exception but it populates the ParserMessages property of the ParseTree with the error message. In case there is no error during parsing, the ParseTree.Root will contain the root element which will in turn contain subsequent tree elements in hierarchical structure. The ParseTree.Tokens property will contain all the parsed tokens in a flat list. Irony also provides a utility application to visualize the parse tree as a tree structure. |9 Once the parse tree is generated from the source code script, we need to evaluate or execute the parse tree based on a supplied domain model instance. This will in turn require evaluation of each node or term in the parse tree. Since how each node would be evaluated depends on our language and domain, we need to implement the evaluation logic for each non terminal node. This custom evaluation logic is defined in the classes under the “Nodes“ folder of DSL.Sample.Language project. Each of these custom nodes need to derive from Irony provided Irony.Interpreter.Ast.AstNode class. In our case we have provided an intermediate class named BaseLeaveAstNode which derives from AstNode and all the custom node classes are the subclass of this. This defines some application specific common methods to be used by all custom node classes. The AstNode class provides two virtual methods which are of importance to us: Init: This gets invoked during the creation of parse tree as opposed to the evaluation of parse tree. The parse tree node is one of the parameters to this which or whose attribute values can be stored internally for usage during evaluation. DoEvaluate: This gets invoked during the evaluation of the parse tree and accepts a parameter of type ScriptThread through which any runtime data can be supplied. We can use the tree node attributes cached in Init call to recursively call Evaluate of child nodes from this method. Let’s now look an example of such custom node implementation. We can recall that we have defined a non-terminal named “balance” which is intended to return the leave balance of an employee for a particular leave type depending on the leave application being validated. Following is the grammar definition of this element: var balance = newNonTerminal("balance", typeof(BalanceNode)); //leave balance balance.Rule = ToTerm("balance"); And following is the implementation of the “BalanceNode” class: protectedoverrideobject DoEvaluate(ScriptThread thread) { base.DoEvaluate(thread); varleaveBalances = GetDataContext(thread).Employee.LeaveBalances; varleaveType = GetDataContext(thread).LeaveApplication.LeaveType; returnleaveBalances != null&&leaveBalances.ContainsKey(leaveType) ? leaveBalances[leaveType] : 0; } The GetDataContext() method is defined in BaseLeaveAstNode class which returns an object of LeaveValidationDataContext class that contains the Employee and LeaveApplication objects in question and also exposes IsValidApplication property to be set by the parse tree evaluation. As we can see, the BalanceNode’s DoEvaluate method returns the number of leave balances (or zero in case not available) for the particular leave type from the Employee object’s LeaveBalances dictionary property. This balance value would in turn be used by higher level nodes in their own implementation of DoEvaluate method. | 10 Another example is of the BooelanExpressionNode which caches the child nodes in Init methods and then uses those in DoEvaluate. publicclassBoolExpressionNode : BaseLeaveAstNode { privateExpressionNodeleftNode; privateExpressionNoderightNode; privateBooleanOperatorNodeoperatorNode; privateLeaveTypeKeywordNodeleaveTypeKeywordNode; protectedoverrideobject DoEvaluate(ScriptThread thread) { base.DoEvaluate(thread); if (leaveTypeKeywordNode != null) { returnleaveTypeKeywordNode.Evaluate(thread); } objectleftValue = leftNode.Evaluate(thread); objectrightValue = rightNode.Evaluate(thread); returnoperatorNode.Function(leftValue, rightValue); } publicoverridevoid Init(AstContext context, ParseTreeNodetreeNode) { base.Init(context, treeNode); if (treeNode.ChildNodes.Exists(n =>n.Term.Name == "leavetypekeyword")) { leaveTypeKeywordNode = treeNode.ChildNodes.First(n =>n.Term.Name == "leavetypekeyword").AstNode asLeaveTypeKeywordNode; } else { leftNode = treeNode.ChildNodes[0].AstNode asExpressionNode; rightNode = treeNode.ChildNodes[2].AstNode asExpressionNode; operatorNode = treeNode.ChildNodes[1].AstNode asBooleanOperatorNode; } } } For evaluating a parse tree, we have to pass the domain objects. This can be passed via the ScriptThread parameterin the AstNode.Evaluate() method. Following is the code for invoking parse tree evaluation: LanguageDatalanguageData = newLanguageData(leaveValidationGrammar); LanguageRuntime runtime = newLanguageRuntime(languageData); ScriptApp app = newScriptApp(runtime); ScriptThread thread = newScriptThread(app); thread.App.Globals.Add("CONTEXT", context); varastNode = parseTree.Root.AstNodeasAstNode; | 11 if (astNode != null) astNode.Evaluate(thread); The ScriptThread.App.Globals is a dictionary data structure where any arbitrary data can be stored and this can be retrieved in the individual node’s DoEvaluate() method to access context sensitive data. With all the code and infrastructure set, we can now proceed to test our DSL implementation using Visual Studio unit tests. The LeaveValidationTests class of the unit tests project defines the unit test method to test the DSL implementation. Consider the below unit tests method: [TestMethod] publicvoidTestLeaveApplicationInValidCasual() { Employeeemployee = CreateEmployeeData(); LeaveApplication application = newLeaveApplication(); application.Duration = 4; application.LeaveType = LeaveType.Casual; var context = newLeaveValidationDataContext(employee, application); LeaveApplicationValidationProcessor.ProcessLeaveApplicationValidation(context, GetValidationScript()); Assert.IsFalse(context.IsValidApplication); } As per our business rule, casual leave duration cannot exceed 2 days. So in this case, the leave application is invalid. That is why we are checking whether context.IsValidApplication is “false” in the assert statement. If we run this unit test, it should pass. Conclusion In this article, we have discussed what are DSL and its basic concepts. Then we touched upon some important computer science language theory concepts relevant to DSL. And finally we implemented an external DSL for leave eligibility determination using .NET and Irony language implementation kit. Although we have not implemented in our example, some of the essential features of any external DSL should include: Support for code comments Support exception handling Support for versioning the DSL Logging and tracing in the parser We hope this gives the reader a head start for considering external DSL implementation in their specific domain using any .NET framework language. Bibliography: 1. Domain Specific Language- Martin Fowler & Rebecca Parson 2. Software Architecture in Practice- Bass, Clements & Kazman About the Author Rajeswar Majumdar is an Architect in Time & Labor Management (TLM) product line at ADP India. He is responsible for overall architecture and design of TLM products. He has over 11 years of experience in architecture and design. He is a Certified Project Management Professional (PMP) and member of Project Management Institute (PMI). He also worked with Cognizant Technology Solutions, India as ArchitectTechnology in Insurance and Vertical PriceWaterhouseCoopers as Principal Consultant in Technology Consulting. | 12 About ADP India ADP India is a fully-owned subsidiary of Automatic Data Processing, Inc. (ADP) and is engaged in providing Information Technology and Information Technology Enabled Services to ADP’s business divisions worldwide. ADP Private Limited currently operates in world-class facilities located in Hyderabad and Pune, providing over 8500 associates the opportunity to work on multiple processes, domains and technologies. ADP India provides a complete range of product development services, technical support services and solution center services that involve back office operations covering both voice and non-voice processes.