Abstracting code-specific concepts to a graphical representation by pattern matching and refactoring

David Flenstrup

Kongens Lyngby 2009
IMM-M.Sc.-2009-43

Technical University of Denmark
Department of Informatics and Mathematical Modeling
Richard Petersens Plads, DTU – Building 321
DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45 45882673
reception@imm.dtu.dk
www.imm.dtu.dk

Abstract

This thesis deals with the automatic analysis of Microsoft Dynamics NAV (NAV) application code. NAV (formerly Navision) is an advanced Enterprise Resource Planning (ERP) system, like SAP and Axapta, which can handle anything from accounting to stock control for a company. NAV is customized to the individual enterprise using the application language C/AL.

The NAV configuration application has grown incrementally, version by version, for more than 20 years, becoming a complex piece of software. With the improvements that have emerged in languages and technologies, the NAV organization now stands before a number of choices. It is clear that the application code has to be reorganized, but it is unclear whether the code should be kept in C/AL or moved to C#, because there are obvious benefits to both choices. The aim of this thesis is to contribute knowledge to this process by uncovering what exists today and by modeling some selected suggestions for refactoring the present C/AL code.

Our approach is to identify, analyze, and implement recognition of software patterns in the NAV application code. The software patterns we focus on define relations between objects. The application code's data model consists of table objects which, among other things, store all data for the enterprises. We use Unified Modeling Language (UML) diagrams to describe the relations we identify between tables. The concepts we identify are relationships which in UML terms are described as containment, aggregation, and generalization.
This gives the NAV organization insight into the implications that a change in one table object will have for other objects. An example of one of the relations we identify is the relation between the Customer table, which keeps information on all the company's customers, and the Customer Bank Account table, which contains information about the customers' bank accounts. From our analysis we can see that the existence of objects of the type Customer Bank Account is conditional on the existence of a Customer object. This is described with the UML concept of containment.

We show that it is possible to analyze the application code and extract concepts that provide an overview of selected relations. Furthermore, we show that a dynamic graphical display of the relations is preferable. We have developed an intuitive way to present our results that makes it possible to make sense of even very large diagrams using the filtering methods provided by the Concept Viewer tool.

We extended the project scope along the way and examined the possibility of identifying specific concepts from the accounting ontology Resource, Events, and Agents (REA) in the C/AL code. We found that Events can easily be identified, but that the classification of Resources and Agents has some complications which require a refinement of the chosen approach.

Résumé

This thesis deals with the automated analysis of the Microsoft Dynamics NAV (NAV) application code. NAV (formerly Navision) is an advanced Enterprise Resource Planning (ERP) system, similar to SAP and Axapta, that can handle everything from accounting to stock control in a company. NAV is adapted to the individual company in the application language C/AL. The NAV configuration application has grown incrementally from version to version for more than 20 years and is today a highly complex piece of software. As general improvements have arrived in languages and technologies, the NAV organization now faces a number of choices.
It is clear that the application code must be reorganized. It is, however, unclear whether the code should be kept in C/AL or moved to C#, as there are obvious advantages to both options. In this thesis we seek, through extensive code analysis, to contribute knowledge to this process by uncovering what exists today and by modeling some selected suggestions for rewrites of the present C/AL code.

Our approach is to identify, analyze, and implement recognition of software patterns in the NAV application code. The software patterns we focus on express relations between objects. The application code's data model consists of table objects which, among other things, are used to store all data for the company. We use Unified Modeling Language (UML) diagrams to describe the relations we identify between tables. The concepts we identify are the relationship notions described in UML terms as containment, aggregation, and generalization. This gives the NAV organization insight into the implications that a change in one table object will have for other objects. An example of one of the relations we identify is the relation between the Customer table, which contains information about all the company's customers, and the Customer Bank Account table, which contains information about the customers' bank accounts. From our analysis we can see that the existence of objects of the type Customer Bank Account is conditional on the existence of a Customer object. This is described with the UML concept of containment.

We show that it is possible to analyze the application code and extract concepts that provide an overview of selected relations. Furthermore, we show that a dynamic graphical display of the relations is preferable. We have developed an intuitive way to present our results in which meaning can be found even in very large diagrams using the filtering methods offered by the Concept Viewer tool.
We extended the project's goals along the way and examined the possibility of identifying specific concepts from the accounting ontology Resource, Events, and Agents (REA) in the C/AL code. We find that Events can easily be identified, but that the classification of Resources and Agents has some complications which require a further development of the chosen approach.

Preface

This thesis was prepared in collaboration with the Department of Informatics and Mathematical Modeling (IMM) at the Technical University of Denmark (DTU) and the Microsoft Dynamics NAV Application Team (APP Team) at Microsoft Development Center Copenhagen (MDCC).

This thesis was prepared in fulfillment of the final requirement for earning the degree of Master of Engineering in Computer Science. It is the result of work carried out from December 2008 to July 2009 with a workload of 35 ECTS credits.

Kongens Lyngby, July 2009
David Flenstrup

Acknowledgements

I would like to thank Microsoft and especially Jesper Kiehn for his commitment to my project. I am really happy with the subject we chose to investigate, and if it had not been for Jesper's passion and humongous insight into NAV and REA (and his power of persuasion), this topic could not have been covered.

From DTU I would like to thank Peter Falster and Jeppe Revall Frisvad. I am very grateful for the great interest and support my project has been given. It has truly been a great experience and an invaluable resource for me.

Table of Contents

Abstract
Résumé
Preface
Acknowledgements
List of Figures
List of Tables
List of Formulas
List of Code Samples

1 Introduction
  1.1 Background for the project
  1.2 Project Aim
2 Enterprise Resource Planning Domain
  2.1 Introduction to Enterprise Resource Planning (ERP) (3)
  2.2 ERP Business Opportunities (10)
  2.3 Microsoft Dynamics NAV (NAV) (11)
    2.3.1 NAV architecture
    2.3.2 Application Code (14) VS. Product Code
    2.3.3 Application Language Transition
  2.4 Fundamentals of the C/AL language
    2.4.1 C/AL Design Criteria
    2.4.2 C/AL Syntax
3 Related Work
  3.1 What has been accomplished inside Microsoft
    3.1.1 Partial C/AL parser
    3.1.2 Codedub
    3.1.3 Navision Developer Toolkit (25)
    3.1.4 Object Map
  3.2 What has not been accomplished inside Microsoft
  3.3 What has been accomplished outside Microsoft
    3.3.1 Refactoring from Code to UML
    3.3.2 UML designers
    3.3.3 Refactoring to Resource, Events and Agents (REA) (30)
  3.4 What has not been accomplished outside Microsoft
  3.5 Related work summary
4 Foundational Relations (33)
  4.1 Definition of the Is_a relation
  4.2 Definition of the Part_of relation
5 Unified Modeling Language (UML) (34)
  5.1 UML Terminology Overview
    5.1.1 Dependency
    5.1.2 Generalization
    5.1.3 Association
    5.1.4 Aggregation
    5.1.5 Containment
6 Analysis
  6.1 Identifying Containment Pattern
    6.1.1 The role of the Sales Header and Sales Line tables in NAV
    6.1.2 Manual analysis of Containment pattern
    6.1.3 Variations in containment pattern
    6.1.4 Manual identification of additional containments
  6.2 Identifying Generalization Pattern
    6.2.1 Code smell Large Class and solution Extracting Class
    6.2.2 Manual analysis of the Generalization pattern
    6.2.3 Manual analysis of the generalization relationship from Sales Line
  6.3 Identifying REA Concepts
    6.3.1 The Resource, Event and Agent (REA) model (40)
    6.3.2 Introduction to Accounting Theory
    6.3.3 Naming of tables in NAV
    6.3.4 REA concepts in NAV
    6.3.5 Manual analysis of REA Events in NAV
7 Tools
  7.1 .NET Framework
  7.2 Interoperability
  7.3 F#
    7.3.1 Language syntax
  7.4 LEX, YACC and Abstract Syntax Trees (54)
  7.5 C#
  7.6 Regular expressions (58), (59)
  7.7 Lambda expressions (60), (61)
  7.8 LINQ (63) (64)
    7.8.1 Example 1: Without LINQ
    7.8.2 Example 2: With LINQ
8 Implementation
  8.1 Problem with Parser
    8.1.1 CALParser AST to XML AST
  8.2 New data representation
  8.3 CAL parser extension for table relations
    8.3.1 Generic parser vs. specific parsing rules
    8.3.2 Parser Implementation
  8.4 Parsing rules (68)
  8.5 LINQ for Querying
  8.6 Lambda Expressions in action
  8.7 Graph generation
    8.7.1 Microsoft Automatic Graph Layout (69)
  8.8 Performance boosts
  8.9 Algorithm design
    8.9.1 Containment
    8.9.2 Inheritance
    8.9.3 Inheritance – Reusing generalization objects
    8.9.4 Implementation of REA identification
    8.9.5 Limitations by approach and solution suggestion
    8.9.6 Analysis of initial REA results
9 Results
  9.1 TableRelation Analysis and TableRelation Parser Quality Assurance
    9.1.1 Matches on Key fields
    9.1.2 Matches on all fields
  9.2 Results for Containment Pattern
  9.3 Results for Generalization Pattern
  9.4 Results from refactoring the generalization objects
  9.5 The Concept Viewer and its output
    9.5.1 Sales Header and Sales Line – Aggregations and Containments
    9.5.2 Sales Header and Sales Line – Associations
    9.5.3 Sales Header and Sales Line – Associations refactored via Generalization objects
    9.5.4 Sales Header and Sales Line – Associations refactored via refactored Generalization objects
    9.5.5 Sales Line, Standard Sales Line, and Sales Line Archive – Reused Generalization objects explored
    9.5.6 Sales Line, Standard Sales Line, and Sales Line Archive – Generalization objects explored
    9.5.7 Sales Line, Standard Sales Line and Sales Line Archive – Associations explored
    9.5.8 Item – Containments and Aggregations mapped with reused Generalization objects
    9.5.9 Item – Reused Generalization objects containing Item
  9.6 Feedback from the Application team
10 Conclusion
  10.1 Parsing the C/AL application
  10.2 UML Relationship pattern matching
    10.2.1 Generalization
    10.2.2 Containment
  10.3 The Concept Viewer
  10.4 Resource, Events and Agents (REA) relationship pattern matching
11 Future work and Perspective
12 Abbreviations
13 Works Cited
14 Appendix
  14.1 Content on the enclosed DVD

List of Figures

Figure 2-1 Integration Data Flow (5)
Figure 2-2 ERP process flow (5)
Figure 2-3 Home page of an Order Processor in Microsoft Dynamics NAV 2009 (Role Tailored Client)
Figure 2-4 Current C/AL Compilation for new and old product stack
Figure 3-1 UML diagram manually created with Visio
Figure 3-2 UML diagram generated from code with MagicDraw
Figure 5-1 UML mapping of a dependency
Figure 5-2 UML mapping of a generalization
Figure 5-3 UML mapping of an association
Figure 5-4 UML mapping of an aggregation
Figure 5-5 UML mapping of a containment
Figure 6-1 Overview of all Sales Orders in the Role Tailored Client
Figure 6-2 New Sales Order in the Role Tailored Client
Figure 6-3 Viewed as an OO UML diagram
Figure 6-4 Viewed as a database diagram with primary keys (PK) and foreign keys (FK)
Figure 6-5 Table field association
Figure 6-6 Multiple associations refactored to a single association
Figure 6-7 Selection box with elements
Figure 6-8 Property window displaying the OptionString of the field Type
Figure 6-9 Candidates for Generalization Refactoring
Figure 6-10 Refactoring multiple associations to single associations
Figure 6-11 The cookie company (42)
Figure 7-1 The structure of an abstract syntax tree
Figure 8-1 Arrow heads for Containment, Aggregation and Generalization
Figure 8-2 Illustrating procedure references between CU12 and CU13
Figure 9-1 Template, Batch, Line pattern found in the initial Containment work
Figure 9-2 Template, Batch, Line pattern in the Concept Viewer
Figure 9-3 The Concept Viewer
Figure 9-4 Sales Header and Sales Line – Aggregations and Containments
Figure 9-5 Sales Header and Sales Line Associations
Figure 9-6 Sales Header and Sales Line Generalizations
Figure 9-7 Sales Header and Sales Line Generalizations refactored
Figure 9-8 Sales Line, Standard Sales Line, and Sales Line Archive reused Generalization objects explored
Figure 9-9 Sales Line, Standard Sales Line, and Sales Line Archive Generalization objects explored
Figure 9-10 Sales Line, Standard Sales Line, and Sales Line Archive – Associations explored
Figure 9-11 Sales Line, Standard Sales Line and Sales Line Archive – Associations explored
Figure 9-12 Item – Reused Generalization objects containing Item

List of Tables

Table 2-1 NAV Customer segment
Table 6-1 Sales Header and Sales Line Containment
Table 6-2 Sales Header Containment in detail
Table 6-3 Purchase Header and Purchase Line Containment
Table 6-4 Profile Questionnaire Header and Profile Questionnaire Line Aggregation
Table 6-5 Generalizations in Sales Line
Table 6-6 Color definition for Figure 6-9
Table 6-7 The double-entry accounting system
Table 6-8 Table naming in NAV
Table 6-9 Code for posting Journals to Entries
Table 7-1 Sync vs. Async execution
Table 8-1 REA results from step 1
Table 8-2 REA candidate sets
Table 9-1 Key fields
Table 9-2 Unique key fields
Table 9-3 All fields
Table 9-4 All unique fields

List of Formulas

Formula 4-1 is_a
Formula 4-2 part_for
Formula 4-3 has_part
Formula 4-4 part_of

List of Code Samples

Code 2-1 Table 3 Payment Terms
Code 6-1 Requirements for Generalizations
Code 6-2 Requirements for Generalizations
Code 7-1 Example with complex numbers – Definition of active patterns
Code 7-2 Example with complex numbers – Add function
Code 7-3 Example with complex numbers – Multiply functions
Code 7-4 Example with Sync and Async execution – Synchronous function
Code 7-5 Example with Sync and Async execution – Asynchronous function
Code 7-6 C/AL variable assignment
Code 7-7 API description of the First method
Code 7-8 Example with lambda expression
Code 7-9 Example without lambda expression
Code 7-10 Example without LINQ
Code 7-11 Example with LINQ
Code 8-1 C/AL IF statement
Code 8-2 Segment from our abstract syntax tree XML representation
Code 8-3 TableRelation matching Expression2
Code 8-4 Appendix DVD, file \Code\RegularExpressionParser\RegularExpressionParser\CALRegularExpressions.fs
Code 8-5 TableRelation matching Expression3
Code 8-6 Appendix DVD, file \Code\RegularExpressionParser\RegularExpressionParser\CALRegularExpressions.fs
Code 8-7 LINQ Example1 – Select record variables with number equals var
Code 8-8 Querying the abstract syntax tree
Code 8-9 Querying the abstract syntax tree
Code 8-10 LINQ Example2 – Select all statements
Code 8-11 LINQ Example3 – Select all Exp1 variables with ID equals var
Code 8-12 Appendix DVD, file \Code\MatchingInLinq\IdentifyInheritance.cs, method RefactorInheritance
Code 8-13 Appendix DVD, file \Code\MatchingInLinq\IdentifyREAConcepts.cs, method FindProcedureReferences
Code 8-14 Appendix DVD, file \Code\MatchingInLinq\IdentifyREAConcepts.cs, method Call
Code 8-15 Appendix DVD, file \Code\MatchingInLinq\ParseRemaningElements.cs, method ParseElementsToXML
Code 8-16 Pseudo code for the Containment algorithm
Code 8-17 Pseudo code for Generalization algorithm
Code 8-18 Pseudo code for refactoring Generalization objects
Code 9-1 Special case of TableRelation ignored in Containment analysis
Code 9-2 Special case of TableRelation ignored in Generalization analysis
Code 9-3 Special case of TableRelation dependent on YES/NO values ignored in Generalization analysis

Chapter 1

1 Introduction

1.1 Background for the project

The overall scope for projects sponsored by the NAV Application Team is to find the primitives for the best path towards the next generation NAV application. This is a very interesting task for a number of reasons we will cover later. The application team is highly motivated to find the best steps towards a new application design, and student projects are used as one type of contribution to uncovering and analyzing this challenge. The application has grown incrementally from version to version, and the long-term goal is to refactor the application code, but the first steps are to analyze the available code and provide knowledge on how the application is tied together. Due to the way the application has been developed, see section 2.3.2, no clear overview of the application exists in terms of Unified Modeling Language (UML) (see section 12 for a list of all abbreviations) or similar models. These are necessary because the code is too complex for developers to comprehend, and the learning curve for new developers is very steep. One of the key drivers for a redesign is to reduce the code base. J. Kiehn stated that the code base has grown by a factor of 10 since Navision v. 1.
This is of course an exaggeration, but the application code base now counts more than 2.4 million lines of code, which indicates the complexity of the application. Earlier work (1) has shown that the number of dependencies in the code base is high, which is confirmed by J. Kiehn. Dependencies increase the complexity of code, and complex code is more prone to contain bugs and costs more man-hours to maintain. These factors need to be addressed, and the NAV team is aware of the potential problems.

1.2 Project Aim

This thesis aims at contributing to a solution for the above problem, by uncovering what exists today and by modeling some chosen suggestions for refactorings of the present C/AL code. Our approach is to identify, analyze, and implement recognition of software patterns in the NAV application code. The software patterns we will focus on define foundational relations, see section 4, between objects. We will use the Unified Modeling Language (UML) (2) terminology to describe the identified relations. The relations we focus on identifying are the concepts containment and generalization, introduced in section 5. The project aim was extended during the project. The goal was to examine the possibility of identifying specific concepts from the accounting ontology Resource, Events, and Agents (REA), see section 3.3.3, in the C/AL code. The motivation for such an approach is that REA offers domain-specific knowledge as opposed to the general domain knowledge provided by UML.

Chapter 2

2 Enterprise Resource Planning Domain

The following section gives an introduction to Enterprise Resource Planning (ERP) solutions in general. Following this is a more specific introduction to the fundamentals of the Microsoft Dynamics NAV system and the C/AL language.

2.1 Introduction to Enterprise Resource Planning (ERP) (3)

Today, ERP systems are an invaluable tool for most companies with more than a few employees.
An ERP system is a tool for running an enterprise, and the ERP system is the backbone of the organization for providing data that can be used in decision making (4). In the past, every department of a company made decisions independently of each other. ERP provides a platform for collaboration and a common ground for decision making.

Figure 2-1 Integration Data Flow (5) (departments shown: Purchasing, Marketing and Sales, Accounting and Finance, Manufacturing, Human Resources, Inventory, all connected to a central Information store)

Figure 2-1 illustrates how ERP systems are based on storing all information in one central database, enabling all departments in the ERP solution to work with the same data (6). Common modules in an ERP system are:

Financial Management (FM)
Customer Relationship Management (CRM)
Supply Chain Management (SCM)
Business Intelligence (BI)

The most common application of an ERP system is the automation of business processes. One of the most important chains of business processes for a manufacturing company is the support for selling the manufactured goods (7). Figure 2-2 illustrates how an arbitrary ERP system would support this scenario: the Sales, Warehouse, Accounting and Receiving departments work together to handle the entire flow for the goods, from sale and shipping to receiving payment, by sharing a common ground in the central information database and forwarding an order fulfillment to the department responsible for the next step in the process.

Figure 2-2 ERP process flow (5) (Sales: sales quote, sales order; Warehouse: pack and ship; Receiving: returns; Accounting: billing, payment; all around a central Information store)

One of the key drivers for implementing an ERP system is that it can often substitute the mess of different applications that emerges in a company along the way (8), and thereby streamline and simplify the use of software within the company and, as a result, allow the organization to do better with the same, or even a smaller, amount of money.
Another aspect is the possibilities emerging from having all the organization's information stored centrally. This makes it possible to derive important byproducts which, for instance, could enable the organization to do forecasting of production schedules during a holiday period, based on expected order income. This would allow management to adjust the available workforce to make sure that production can cope with the demand from incoming orders. Extensibility is also an important attribute of an ERP system. To fully qualify as an ERP system, it is expected to offer more than "just" a comprehensive integration of various organizational processes (9). ERP systems should thus strive to be:

Flexible – Systems should be able to grow with the organization
Modular and open – Systems should offer open interfaces allowing easy interoperability and extensibility to third-party add-on components
Beyond the company – Systems should support integration to customers, partners and vendors, because many business processes require interaction with actors outside the organization

2.2 ERP Business Opportunities (10)

There is a large market for ERP software, and many ERP suppliers. SAP has published a report from AMR Research listing the estimated revenues in 2006 for the 17 most dominant ERP vendors. The report estimates the total revenue on ERP software to be $28.8 billion, with a revenue growth from 2005 of 14%. In comparison, the Danish gross domestic product (GDP) in 2006 was $202.9 billion, which means that the revenue of ERP software amounts to roughly 1/7 of the entire GDP of Denmark, underlining that the ERP market is of great importance. The AMR report lists SAP as the number one ERP supplier, with revenues nearly double those of number two on the list. During the last 10 years there has been a consolidation on the market for ERP software.
Oracle acquired PeopleSoft (2004), Navision acquired Axapta (2000), and Microsoft acquired Navision (2002) and Great Plains (2000), amongst others. The five most dominant ERP vendors in 2006 were SAP, Oracle, Infor, Sage Group, and Microsoft.

2.3 Microsoft Dynamics NAV (NAV) (11)

The ERP product we focus on in this thesis is, as previously stated, the Microsoft Dynamics NAV product. The following section introduces the NAV user segment, product, and architecture. Furthermore, we describe the difference between product code and application code in NAV, which is essential to understanding this project. Microsoft has focused on getting a position on the global ERP market as described in the previous section. The Microsoft Dynamics product group was kick-started by acquisitions of successful ERP vendors in the beginning of the 2000s, and this product group has been developed continuously to gain market share. Microsoft Dynamics NAV is one of these products. NAV focuses on the Mid-Market+ (defined as companies with 1-5000 employees) segment of the ERP market, leaving the enterprise market to other ERP products. The company size definitions used are the following:

Enterprise: 5000+ employees
Corporate Account Segment (CAS): 1000-5000 employees
Midmarket: 50-1000 employees
Small Business: 1-49 employees

Table 2-1 NAV Customer segment

The following numbers are provided to give an idea about the forces driving NAV. These numbers are from the beginning of 2008 (12):

> 65,000 customers (companies that have bought NAV)
> 3,300 certified partners (IT professionals working with sale and customization of NAV)
> 1,800 add-on solutions (products offered for specific needs by partners)
> 40 localized versions (supporting local languages and date formats)
> 1,000,000 licensed users (employees in customer companies using NAV)

Microsoft Dynamics NAV was originally created by the three college friends J. Balser, T. Wind and P.
Bang, from the Technical University of Denmark (DTU), under the product name Navigator and later Navision. NAV is currently in version 6 (project Corsica). It has been developed in Vedbæk since 1984. It has grown incrementally from version to version, becoming a very elaborate ERP system. The latest addition to the product is the Role Tailored Client (RTC), offering customized user profiles. This enables every user role in the system to have a personalized User Interface (UI) with focus on the tasks they perform, making it easier for employees to understand and interact with the system. Figure 2-3 shows the home page for an Order Processor. The Order Processor is the role in a company responsible for shipping incoming orders to customers and putting customer returns back in stock. The Order Processor is one of 21 role centers that ship with NAV out of the box, and partners can add more to suit individual customer needs.

Figure 2-3 Home page of an Order Processor in Microsoft Dynamics NAV 2009 (Role Tailored Client)

2.3.1 NAV architecture

NAV supports three architectures: one- and two-tier for legacy purposes, and the new three-tier architecture that offers new features and better scalability. In time, the one- and two-tier architectures are going to be discontinued.
One-tier setups consist of the C/SIDE client simply using a file as database.
Two-tier setups consist of the C/SIDE client and a database (either Microsoft SQL Server or the NAV Database Server (legacy product)).
Three-tier setups follow the Model View Control pattern (13):
o Role Tailored Client (RTC) for users (View)
o Application Service Tier (AST) (Control)
o Microsoft SQL Server or a NAV Database Server (Model)

(Figures: Two-tier architecture and Three-tier architecture, showing the C/SIDE client, the Service Tier, and the Microsoft SQL Server or NAV Database Server)

Furthermore, the C/SIDE client is also part of the three-tier setups and is mostly used for development. The Role Tailored Client only runs on the three-tier setup and does, for now, not support development. The C/SIDE client is the old client that is still used for development, but with time, the plan is to provide a new development environment and discontinue the old C/SIDE client.

2.3.2 Application Code (14) vs. Product Code

When we describe NAV, it is important to note the difference between product code and application code. New product code is written in C#, and legacy code is written primarily in C++ (15). Application code is written in C/AL (often written AL) and has always been developed in C/AL. The NAV product is the platform that hosts the NAV ERP application. The product itself also runs on a platform, namely Windows. Application code is, as stated above, written in the NAV specific language C/AL, and the application implementation is the actual code forming the ERP solution. The application is hosted inside the product and can be custom tailored to support specific customer needs. According to Jesper Kiehn, the application codebase has grown considerably from version to version. One reason for this is the sales channel for NAV. In general, Microsoft does not sell NAV directly to customers.
The sale is done via a partner that gets the customer up and running with NAV. As mentioned above, more than 1,800 add-on products exist, and a number of add-on products have propagated back into the product over time to fulfill general customer requests. This has been done by buying the add-on from the partner and merging their code into the shipping application. As described before, the NAV system has been developed continuously over the last 25 years. The result is an application of more than 2.4 million lines of C/AL code. The NAV ERP application is divided into 192 granules. A granule is a pack of Pages, Forms, Codeunits, and Tables (introduced in section 2.4.2) that together add a feature to the ERP application. One example of a granule is the Commerce Gateway granule, allowing the NAV ERP system to interact via Commerce Gateway. The Commerce Gateway granule adds support for business-to-business transactions; one example of such a transaction could be electronic exchange of sales documents. The NAV system is designed to grow with the company, allowing the company to extend its use of the system incrementally. This is in compliance with theory for implementing ERP software in companies. In general, it is advised to implement ERP systems incrementally in a company, increasing the scope along the way. Furthermore, it is advised to design the system so it is able to handle future company growth (16). As described above, this can easily be achieved with the introduction of new granules in the implemented NAV ERP. In fact, the granules are already installed in the customer's NAV. If the customer system runs on the standard application, they just have to pay for an extension of their license to get the appropriate access rights.

2.3.3 Application Language Transition

As described in section 2.3.2, the configuration of NAV is done with C/AL.
Until NAV version 6.0, C/AL was compiled solely to a NAV specific binary format, but the new three-tier architecture described in section 2.3.1, supported by the Role Tailored Client and the Application Service Tier, runs directly on C# code. The transformation from C/AL to C# is done using a token parser. The generated C# code is fully running C# code, but not nicely readable code. The C# code is then compiled to Microsoft Intermediate Language (17) (MSIL), allowing the binary (CLR) application code to run on any platform supported by the .NET framework.

Figure 2-4 Current C/AL compilation for the new and old product stack (C/AL is compiled to the NAV specific binary format for the old product platform (C/SIDE, C++ code) and transformed to C# for the new product platform (RTC, AST, C# code))

Large parts of the product were designed to run with the NAV specific binary format, and still do. Therefore, the product is in a transition phase where it needs both the C/AL and the C# representation. A NAV specific object model has been created in C#, which enables application code to be written in C#, but the product will need both the C/AL representation and the C# representation. At this point, only parsing from C/AL to C# is possible, not the other way around. This causes some inconvenience when debugging application code on the new product stack (three-tier). When debugging the application code on either the Role Tailored Client or the Application Service Tier, debugging can only be done in the generated C# code. This is far from ideal because there is no direct link between the generated C# code and the C/AL code. An issue identified in the generated C# application code has to be fixed in C/AL from the C/SIDE client.

2.4 Fundamentals of the C/AL language

The code we analyze throughout this project is written in the language C/AL. This section gives an introduction to the design criteria and syntax of C/AL.
C/AL is not an object oriented language, as only eight predefined, non-extendable object types can be used. This implicitly means that C/AL does not support concepts such as inheritance. The language is strongly typed, meaning that variable types cannot be inferred and have to be defined explicitly. C/AL is a procedural language in the family of imperative programming languages. Imperative languages specify the individual steps of a computation (how we want it), in contrast to declarative languages that specify what the program should do (what we want) (18).

2.4.1 C/AL Design Criteria

Michael Nielsen (Director of Development in NAV) has a long history with NAV and was one of the original designers of the language. M. Nielsen stated that the design criteria (19) for C/AL were to provide an environment that could be used without:

Dealing with memory and other resource handling
Thinking about exception handling and state
Thinking about database transactions and rollbacks
Knowing about set operations (SQL)
Knowing about OLAP (20) and SIFT (21)

2.4.2 C/AL Syntax

The C/AL syntax is heavily inspired by Pascal, but the language is simpler. The full language reference can be found on MSDN (22). The C/AL language provides a limited set of predefined object types, which also helps to reduce complexity. Overall, the general design goal has been to provide a flexible language that enables developers to quickly familiarize themselves with developing for NAV. C/AL has eight kinds of objects: Tables, Forms, Reports, Dataports, XMLports, Codeunits, MenuSuites and Pages. Pages are Forms (Graphical User Interface) for the Role Tailored Client. Forms and MenuSuites objects will probably be discontinued as the old C/SIDE client is phased out. Dataports and XMLports are objects for setting up import and export of data via text files or web services. Report is an object type for defining Business Intelligence (BI) reports. We will not spend more time on the above object types.
They contain no information we need in our further analysis. All relevant information in regard to our analysis is placed in Tables and Codeunits. The following section gives an introduction to these two object types.

2.4.2.1 Tables

Tables are data containers and represent the foundation of the application. Tables express data and logic in the form of triggers. Tables contain two kinds of triggers:

Trigger Events are activated on specific actions. One Trigger Event is the OnDelete trigger that is activated every time a row is deleted in the table.
Trigger Functions are defined by developers and can be activated on any instance of the table.

An example of the structure of a table can be seen in Code 2-1. This table is the first table in the application, and is also one of the shortest. We will briefly describe each component of the table:

OBJECT-PROPERTIES contains meta data for the table, such as the creation date.
PROPERTIES contains general properties stored on the table and Trigger Events. This table only contains a definition of the OnDelete trigger.
FIELDS contains the definition of the fields in the table.
KEYS contains a list of the fields forming the table's primary key. In this case, the field Code is the primary key.
FIELDGROUPS contains a list of prioritized fields used by the Role Tailored Client.
CODE contains the defined Trigger Functions.
Every Trigger Function is defined as a procedure.

OBJECT Table 3 Payment Terms
{
  OBJECT-PROPERTIES
  {
    Date=05-11-08;
    Time=12:00:00;
    Version List=NAVW16.00;
  }
  PROPERTIES
  {
    DataCaptionFields=Code,Description;
    OnDelete=VAR
               PaymentTermsTranslation@1000 : Record 462;
             BEGIN
               WITH PaymentTermsTranslation DO BEGIN
                 SETRANGE("Payment Term",Code);
                 DELETEALL
               END;
             END;
    CaptionML=ENU=Payment Terms;
    LookupFormID=Form4;
  }
  FIELDS
  {
    { 1;;Code;Code10;CaptionML=ENU=Code;
                     NotBlank=Yes }
    { 2;;Due Date Calculation;DateFormula;CaptionML=ENU=Due Date Calculation }
    { 3;;Discount Date Calculation;DateFormula;
                     CaptionML=ENU=Discount Date Calculation }
    { 4;;Discount %;Decimal;CaptionML=ENU=Discount %;
                     DecimalPlaces=0:5;
                     MinValue=0;
                     MaxValue=100 }
  }
  KEYS
  {
    { ;Code;Clustered=Yes }
  }
  FIELDGROUPS
  {
    { 1 ;DropDown;Code,Description,Due Date Calculation }
  }
  CODE
  {
    PROCEDURE TranslateDescription@1(VAR PaymentTerms@1000 : Record 3; Language@1001 : Code[10]);
    VAR
      PaymentTermsTranslation@1002 : Record 462;
    BEGIN
      IF PaymentTermsTranslation.GET(PaymentTerms.Code,Language) THEN
        PaymentTerms.Description := PaymentTermsTranslation.Description;
    END;

    BEGIN
    END.
  }
}

Code 2-1 Table 3 Payment Terms

2.4.2.2 Codeunits

Codeunits act only as containers for functions. The standard function libraries are stored in Codeunits and consist of utility routines that serve a general purpose in NAV. User defined Codeunits can contain user defined functions. Codeunits can advantageously be used to reduce the code in a table: code can be extracted and stored in a Codeunit procedure. If code is of a more general nature, and influences more than one object, it is more correct to store it in a Codeunit instead of storing it in a table. The structure of Codeunits is identical to the structure of Tables presented above, with the only exception being that Codeunits only contain the components OBJECT-PROPERTIES, PROPERTIES and CODE.
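The Trigger Event mechanism described above can be illustrated outside C/AL. The following Python sketch is our own illustration, not NAV code: it models a table whose OnDelete Trigger Event cascades the deletion to related records, mirroring what the OnDelete trigger in Code 2-1 does for Payment Terms Translation records (the class and field names are hypothetical).

```python
# Illustrative sketch of C/AL-style Trigger Events; all names are our own.
class Table:
    def __init__(self, name, on_delete=None):
        self.name = name
        self.rows = []
        self.on_delete = on_delete  # Trigger Event fired for each deleted row

    def delete(self, predicate):
        doomed = [r for r in self.rows if predicate(r)]
        for row in doomed:
            if self.on_delete:
                self.on_delete(row)  # like the OnDelete trigger in Code 2-1
        self.rows = [r for r in self.rows if r not in doomed]

translations = Table("Payment Terms Translation")

# The OnDelete trigger for Payment Terms removes the matching translation
# rows, just as Code 2-1 does with SETRANGE + DELETEALL.
payment_terms = Table(
    "Payment Terms",
    on_delete=lambda row: translations.delete(
        lambda t: t["Payment Term"] == row["Code"]))

payment_terms.rows = [{"Code": "14D"}, {"Code": "CM"}]
translations.rows = [{"Payment Term": "14D", "Language": "DAN"},
                     {"Payment Term": "CM", "Language": "DAN"}]

# Deleting a Payment Terms row fires the trigger and cascades the deletion.
payment_terms.delete(lambda r: r["Code"] == "14D")
```

Deleting the "14D" payment term leaves only the "CM" rows in both tables, which is exactly the existence dependency that the Containment analysis later exploits.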
Chapter 3

3 Related Work

The following section describes the prior work and accomplishments with relevance for our project. The section is divided into two main parts. The first part regards related work done within Microsoft, and the Microsoft Dynamics NAV organization in particular. In the second part, we widen our scope and see what has been accomplished elsewhere.

3.1 What has been accomplished inside Microsoft

The following section describes the achievements, that we know of, from former student projects and projects carried out internally by Microsoft. Elements of the previous work described later in this section have worked as a great starting point for this project.

3.1.1 Partial C/AL parser

The core theme of this project relies on the ability to work with the application code in an abstract queryable representation, see section 1. A parser (CALParser) was developed in a former student project carried out by T. Hvitved in 2008 (1). The parser was developed to analyze the application in order to examine whether the code could be modularized to reduce the number of dependencies in the code. The complete code for this project can be seen on the appendix DVD in the folder \Code\Work From T. Hvitved\oo parser\. CALParser is implemented in F# and uses the F# variants of the tools LEX (23) and YACC (24) to define the parsing rules. The mentioned technologies are described in section 7.3. The parser is of high quality, but unfortunately it is a complex piece of software that requires more work to be complete. From working with the parser, we have found the following issues we need to find solutions for:

The parser is not able to parse the actual application code. Preprocessing that removes multiline comments is necessary for the parser to work with the actual application code.
The parser does not parse to the level of detail we need.
Table keys, table fields, and the field properties OptionString, TableRelationString and other options are handled as strings and require subparsing.
The parser is implemented with F#'s immutable data types. This is a problem because we cannot modify the data that needs subparsing.

We had a meeting with Tom Hvitved (PhD student at DIKU), who wrote the parser, and from his introduction we learned that the intention with the parser was to implement enough detail to support his thesis, not to develop a complete C/AL parser. Taking this information into consideration, and from our initial work with the parser, we know that we will be able to use the CALParser, but not as-is.

3.1.2 Codedub

Codedub is a project carried out in collaboration between J. Kiehn (MDCC) and a Master's thesis student, Till Blume (IT University of Copenhagen). The focus for this thesis has been to find candidates for refactoring. The short story is that, due to the lack of support for inheritance in C/AL, a lot of code blocks are copied here and there. Code duplicates can be identified by comparing hash values of code blocks. The project deadline is June 2009.

3.1.3 Navision Developer Toolkit (25)

The Navision Developer Toolkit (NDT) is a legacy tool for partners and internal users. The tool is able to assist in upgrade operations and can display a number of details of the code. The NDT is not meant as a development environment in the same sense as Visual Studio or Eclipse, but more as an analysis tool. According to J. Kiehn, the tool has had limited success and is probably going to be discontinued. Partners and developers rather use text comparison and search tools, such as Beyond Compare (26), in their analysis of AL code.

3.1.4 Object Map

Object Map is an internal tool for displaying identified relations. The tool is a generic viewer of relations, which are identified from an XML file.
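To give an impression of such a relation file, the sketch below shows how a viewer like Object Map could read untyped relation pairs from XML. The actual NDT schema is not documented in this thesis, so the <Relations>/<Relation> element and attribute names here are invented for illustration.

```python
# Hedged sketch: the real NDT XML schema is not shown in this thesis,
# so the <Relation> element and its attributes are invented.
import xml.etree.ElementTree as ET

sample = """
<Relations>
  <Relation from="Table 18 Customer" to="Table 287 Customer Bank Account"/>
  <Relation from="Table 36 Sales Header" to="Table 18 Customer"/>
</Relations>
"""

def read_relations(xml_text):
    """Return (from, to) pairs. Note that the nature of each relation
    (association, containment, ...) is not in the file, which is exactly
    the limitation of Object Map discussed in this section."""
    root = ET.fromstring(xml_text)
    return [(rel.get("from"), rel.get("to")) for rel in root.iter("Relation")]

pairs = read_relations(sample)
```

The pairs carry no classification, so a viewer can only draw an edge; it cannot say what kind of edge it is.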
The XML file is produced with the Navision Developer Toolkit (NDT), and it contains relations between Tables, Forms, Codeunits and Pages. The Navision Developer Toolkit does, however, not support methods to determine the nature of a relation.

3.2 What has not been accomplished inside Microsoft

In relation to our project, the parser is still not complete and needs further work to fulfill our requirements. Tools exist to display relations, but the view is static and based on a dump from the NDT. The available tools do not support any kind of relation classification, i.e. it is possible to map a relation, but there is no way to know the deeper meaning of the relation (association, dependency, etc.). The overall goal for Object Map is to display an interactive graph which allows us to obtain knowledge about the structure of the application. The lack of relation classification reduces the overall information we are able to read from a diagram displaying relations. Furthermore, if we were able to apply general terms to the relations in the code, people with general computer science knowledge would be able to get a deep insight into the application code organization more quickly. Suggestions for refactoring have in many ways been unsuccessful. A good way to move from C/AL to C# has not been found.

3.3 What has been accomplished outside Microsoft

3.3.1 Refactoring from Code to UML

There are many projects offering code generation from an abstract representation. A search on Google reveals that there seems to be a preponderance of products offering generation of C++ and Java code. A few products offer both refactoring from code to UML and vice versa. We have looked at two of the larger products: IBM's Rational Rose (27) and No Magic's MagicDraw (28). We have previously worked with Rational Rose without being completely satisfied with the UML generation. We therefore chose to look at MagicDraw to see what the product offers when it comes to generating UML from code.
The Enterprise edition of MagicDraw comes with refactoring capabilities for Java, C++ and C#, among other languages. We installed a demo of MagicDraw 16.5 Enterprise and generated a diagram based on two files with a generalization relation between them.

Figure 3-1 UML diagram manually created with Visio (class Table with members -number : string, -id : string, +Number : string, +ID : string, +Table(in number : string, in id : string), +Table(in info, in ctxt), +GetObjectData(in info, in txt) : Table, and subclass REATable with members +REATable(in table : REATable, in tablewith : Table), +REATable(in info, in ctxt), +GetObjectData(in info, in ctxt) : REATable)

Figure 3-2 UML diagram generated from code with MagicDraw

Figure 3-1 shows the relation we would like to map between Table and REATable. The analyzed classes can be seen in the folder "Code\MatchingInLinq\MatchingInLinq\" on the attached DVD. The generalization relation is described in depth in section 4.1. Figure 3-2 shows the model generated by MagicDraw. The diagram does show a directed relationship from REATable to Table, but it is not clear that REATable is a child/subclass of Table. We believe there are two legitimate reasons for this:

We have not spent hours on fully understanding how all the options of the tool work. MagicDraw is a complex tool and we just generated a standard UML class diagram; expert users are probably able to produce more useful diagrams.
The tool is directed towards creating code from a model, and thus the model has to be very detailed.

We do, however, believe that the model created manually in Visio provides a cleaner view of the relationship between the two classes. We believe MagicDraw is a fair representative for products offering UML diagrams generated from code. Furthermore, we find that the generated diagram has a low readability, and as such does not offer the overview we wish to provide.
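The generalization relation analyzed above is easy to state in code. The analyzed classes are C#; the following is merely a hedged Python rendering of the Figure 3-1 relationship, with the member set simplified and the extra member name invented.

```python
# Python rendering of the generalization in Figure 3-1. The originals are
# C# classes; members are simplified and 'classification' is invented.
class Table:
    def __init__(self, number: str, id: str):
        self.number = number
        self.id = id

class REATable(Table):
    """REATable is_a Table: the generalization relation of Figure 3-1."""
    def __init__(self, number: str, id: str, classification: str):
        super().__init__(number, id)
        self.classification = classification  # hypothetical REA role label

rea = REATable("18", "Customer", "Agent")
```

A reverse-engineering tool should render exactly this subclass link as a UML generalization arrow from REATable to Table, which is what the MagicDraw output failed to make clear.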
Many of the tools offer some kind of refactoring in a simple drag-and-drop fashion, but we have chosen not to look further into how they perform. None of the refactoring tools we found offer support for C/AL. 3.3.2 UML designers Many good tools exist for drawing UML. Most tools offer good design options where the mouse can be used to arrange the diagrams in any preferred order. We have primarily used Visio 2007 for this report, but many other good tools exist. A list of UML tools can be found at the homepage of Associate Professor M. W. Godfrey from the University of Waterloo (29). We have not found a UML designer that can generate large diagrams as a batch process. 3.3.3 Refactoring to Resource, Events and Agents (REA) (30) Since the beginning of accounting software, all systems have relied on the double-entry accounting system, introduced in section 6.3.2. Other systems have, however, emerged, trying to solve the drawbacks of the double-entry system. One of the most successful is the Resource, Events and Agents (REA) ontology. The system was developed by William E. McCarthy, who published his first paper presenting REA in 1982 (31). Since then, he and others have contributed to developing REA into an elaborate bookkeeping ontology. The scope of this project was, as described in the introduction (see section 1), extended to cover initial REA identification (see section 6.3.1). We therefore looked into prior work in this field to learn from former achievements. We found that not much work has been done on refactoring accounting systems into REA. As previously stated, most accounting systems are based on the double-entry system. We have looked into a paper analyzing the relationship between REA and the Enterprise Resource Planning system SAP (32). The paper focuses on the similarities in the data models of SAP and REA.
Evidence is found that proves the existence of duality within SAP, but it is also found that SAP has implementation compromises that will not fit into a REA model, and that a REA model cannot fully describe the SAP system. The paper concludes that REA terminology is able to present SAP data models and that the REA presentation is able to add valuable information to the data model. We have not found evidence of any work identifying REA concepts from automated code analysis. 3.4 What has not been accomplished outside Microsoft Many products for refactoring between code and UML exist. They are complicated, and it requires a larger study to benefit from their capabilities. No refactoring tools for C/AL exist. Prior work mapping REA to an ERP product has not focused on analyzing the actual code. 3.5 Related work summary The most important lessons learned from the related work study were that we will need to extend the parser to support the relation identification this project aims at solving, and that we will need to build a custom viewer tool for presenting the identified relations in terms of UML. Chapter 4 4 Foundational Relations (33) The following section explains the conceptual meaning of the UML relations containment and generalization, described in section 5. As previously described, the primary task of this project is to identify the two above concepts in the NAV application code. We therefore introduce the formal theory for these concepts. The formal theory uses the name part_of for containment and is_a for generalization. In this section, we primarily use the names part_of and is_a, but in the later sections we will solely use the names containment and generalization to follow the UML standards described in section 5. We present the definitions of part_of (containment) and is_a (generalization) in terms of standard first-order logic. The following two primitive relations are used in our formulas.
inst(x, A) (short for instance) maps the relation between an instance and its class. A statement using inst is: Jane is an instance of a human being. part(x, y) defines parthood between two instances. A statement using part is: Jane’s heart is part of Jane’s body. 4.1 Definition of the is_a relation The is_a (generalization) relation is the simpler of the two relations we describe. The formula below expresses that A is_a B. The right-hand side of the definition expresses that if x is an instance of class A then x is an instance of class B, and that this holds for all x. A statement fulfilling this formula is: A human male is a human. A is_a B =def ∀x: inst(x, A) → inst(x, B) Formula 4-1 is_a 4.2 Definition of the part_of relation The part_of (containment) relation is a bit more complex. To provide a better understanding, the formula is divided into two parts, which are joined in a third formula. The part_for formula below expresses that if x is an instance of A then there exists an instance y, where y is an instance of B and x is part of y, and that this holds for all x. With the part_for formula we can state that: Human testis part_for human being. But we cannot state that human being has_part human testis, because only males have testes. A part_for B =def ∀x: inst(x, A) → ∃y: inst(y, B) & part(x, y) Formula 4-2 part_for The has_part formula below expresses that if y is an instance of B then there exists an instance x, where x is an instance of A and x is part of y. With has_part we can state that human being has_part heart, but not that heart part_for human beings, because many animals have a heart. B has_part A =def ∀y: inst(y, B) → ∃x: inst(x, A) & part(x, y) Formula 4-3 has_part The part_of formula is a combination of the part_for and has_part formulas. It expresses that A is part_of B if A is part_for B and B has_part A.
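To make the formulas concrete, here is a minimal sketch with hypothetical toy data (not NAV objects): the primitive relations inst and part are modeled as finite sets of pairs, and the formulas are checked by direct quantification over them.

```python
# Hypothetical toy model of the primitive relations inst and part as sets of
# pairs; the checks below mirror the first-order definitions directly.

inst = {                      # (instance, class) pairs
    ("john", "Male"), ("john", "Human"),
    ("jane", "Human"),
    ("johns_heart", "Heart"), ("janes_heart", "Heart"),
    ("johns_testis", "Testis"),
}
part = {                      # (part-instance, whole-instance) pairs
    ("johns_heart", "john"), ("janes_heart", "jane"),
    ("johns_testis", "john"),
}

def instances(cls):
    return {x for (x, c) in inst if c == cls}

def is_a(a, b):               # every instance of A is an instance of B
    return instances(a) <= instances(b)

def part_for(a, b):           # every A-instance is part of some B-instance
    return all(any((x, y) in part for y in instances(b)) for x in instances(a))

def has_part(b, a):           # every B-instance has some A-instance as a part
    return all(any((x, y) in part for x in instances(a)) for y in instances(b))

def part_of(a, b):            # part_for and has_part combined
    return part_for(a, b) and has_part(b, a)

print(is_a("Male", "Human"))        # True
print(part_of("Heart", "Human"))    # True
print(part_for("Testis", "Human"))  # True: every testis is part of a human
print(has_part("Human", "Testis"))  # False: jane has no testis part
```

The asymmetry between part_for and has_part in the testis/heart examples falls out of the data, just as in the prose above.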
With the part_of formula we are able to state: Human heart part_for human being and human being has_part human heart, which together constitute the definition of the part_of relation. A part_of B =def A part_for B & B has_part A Formula 4-4 part_of The first-order logic definitions for is_a and part_of (respectively generalization and containment) form the formal requirements for the patterns this thesis aims at identifying. The formal definitions have not been in the foreground when developing the identification algorithms; the formulas are presented to provide a link between the UML concepts generalization and containment, introduced in section 5, and their formal mathematical definitions. Chapter 5 5 Unified Modeling Language (UML) (34) As mentioned earlier (see section 1), we have selected UML as the output standard for our findings. The reason is that we aim at identifying common computer science concepts, described in section 4, and presenting them in general abstract terms to produce a generic result. In the following section, we give an introduction to the part of UML we use to facilitate a view of our findings. 5.1 UML Terminology Overview The Unified Modeling Language (hereafter UML) has become the de facto standard for describing object-oriented software by means of models and diagrams. As the object-oriented languages started gaining acceptance in the late 1970s, the available modeling techniques proved inadequate for describing objects and their relationships. This was the driving force behind the development of a new modeling technique, and Grady Booch, James Rumbaugh and Ivar Jacobson started developing and documenting UML in 1994. They founded a UML consortium with several organizations willing to allocate resources to developing UML; IBM, Microsoft, Oracle, Rational and Texas Instruments were some of the contributing organizations. The outcome of this work was the UML 1.0 standard presented in 1997 (35).
The popularity of the object-oriented programming languages has helped UML reach broad acceptance in both business and university environments (36). UML covers many different methods and models, such as use cases, sequence diagrams, and deployment diagrams. The model and syntax we have chosen to work with is the Object Model. We have in fact limited our use of the Object Model further, because the analysis of C/AL is limited to identifying relationships between the code objects. It would be easy to map all attributes and procedures into our model, but the model contains more than 900 table objects and most of these have more than 100 attributes. Attributes and procedures are therefore left out to keep the model simple and facilitate a clear overview of the findings we wish to promote. In the following sections, we explain the UML terminology for relationships, i.e. how the objects connect. We give an example of each relationship and an explanation of its usage. 5.1.1 Dependency The dependency relationship is also known as the “using” relationship. Figure 5-1 shows a dependency: Window is dependent on Event, and a change in Event might cause a change in Window. Figure 5-1 UML mapping of a dependency 5.1.2 Generalization The generalization relationship is the relationship between a general class (also called a super class or parent class) and a more specific class (also called a sub class or child class). In Figure 5-2 we have the super class Car (for all cars) and a more specific class Ford (only for Fords). 5.1.3 Association The association relationship is used to map a relationship between objects. In the example in Figure 5-3, a Company can have * (zero to many) Employees, and an Employee can work at 1 (one) Company. 5.1.4 Aggregation An aggregation is a model of two objects at the same level, i.e. there exists no generalization between them. The relationship can be described as the class Company “has a” Department.
The Company represents the whole and the Department is part of the Company. 5.1.5 Containment Containment is closely related to the aggregation relation described above. A containment is a special aggregation, also known as a “strong aggregation”, and is mapped with a filled diamond on the aggregation line. The example in Figure 5-5 expresses that Department has a containment to Company and therefore cannot exist without the Company. Figure 5-2 UML mapping of a generalization Figure 5-3 UML mapping of an association Figure 5-4 UML mapping of an aggregation Figure 5-5 UML mapping of a containment The notation for aggregation, association, containment, and generalization will be used throughout this thesis to represent the conceptual meaning of these concepts. Chapter 6 6 Analysis The following section contains the analysis of the patterns we aim at identifying in the C/AL application code. We elaborate on the patterns for identifying containment, generalization, and REA candidates, in the given order. The section provides additional information on NAV and a broader introduction to accounting with the double-entry system and with the REA ontology. The aim of this section is to uncover the code expressing the containment, generalization and REA patterns. 6.1 Identifying Containment Pattern The first concept we are identifying is containment. A containment is, as the definition in section 4.2 states, a relationship where the existence of one object is a requirement for another object to exist. For example, a car cannot have tires if it does not have wheels. The following bullet points are indicators of containment in NAV (37):
- The primary key of a table ends with an integer (indicating that the multiplicity of the relation is 1..n (one to many))
- The dependent table has a foreign key, to a referenced table, as part of its primary key
- The foreign key field has the name Document Type
- The table being referenced has code for cascading deletion of all dependent rows in the referencing table
6.1.1 The role of the Sales Header and Sales Line tables in NAV The Sales Header and Sales Line tables are identified by J. Kiehn as an example of a containment in NAV. They are used for storing information on a shipment and on what to ship, respectively. We have confirmed this containment in NAV by deleting a Sales Header from the NAV Demo database through the C/SIDE client:
- When a row in Sales Header is deleted, the corresponding Sales Line rows are deleted
- When creating a row in the Sales Line table, only existing Sales Header IDs are accepted
To get a better idea of how the Sales Header and Sales Line tables are used in NAV, we provide an introduction to Sales Orders. Sales Orders in NAV are created when a customer requests a delivery of some product. Figure 6-1 shows the overview where all active (not yet shipped) Sales Orders are displayed. When an order is shipped, an employee will post the Sales Order. The posting of a Sales Order will copy fields to a journal table. When the customer has paid for the delivered products, the journal will be posted to an entry, thereby finalizing the monetary transaction of goods. This part is covered in section 6.3.2.2. Figure 6-1 Overview of all Sales Orders in the Role Tailored Client When an order is received, a new Sales Order is created. Figure 6-2 shows the user interface (UI) for creating such a document. If one of the Sales Orders from Figure 6-1 is selected, it will also show in this window. In Figure 6-2 we see that the UI has two main parts, General and Lines. The General part is stored as a row in the Sales Header table, and each line in the Lines part is stored as a row in the Sales Line table.
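The cascading-deletion indicator listed above can be sketched as a simple textual check over a table's OnDelete() trigger. The sketch below is a hypothetical, regex-based simplification (our actual analysis works on the parser output, not raw text): a record variable that is both range-filtered and bulk-deleted in the trigger points at the dependent table of a candidate containment.

```python
import re

# Hypothetical simplification: classify a C/AL OnDelete() trigger body as a
# cascading delete by finding a record variable that is both SETRANGE-filtered
# and DELETEALL-deleted. Real triggers (see section 6.1.3) are more involved.

ONDELETE_SALES_HEADER = '''
SalesLine.SETRANGE("Document Type","Document Type");
SalesLine.SETRANGE("Document No.","No.");
SalesLine.DELETEALL(TRUE);
'''

def cascading_delete_target(ondelete_source):
    """Return the record variable that is range-filtered and bulk-deleted,
    or None if no cascading delete is found."""
    filtered = set(re.findall(r'(\w+)\.SETRANGE\(', ondelete_source))
    deleted = set(re.findall(r'(\w+)\.DELETEALL\(', ondelete_source))
    targets = filtered & deleted
    return targets.pop() if targets else None

print(cascading_delete_target(ONDELETE_SALES_HEADER))  # SalesLine
```

A trigger that filters without deleting, or deletes an unrelated record, yields None and is not flagged as a containment candidate.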
The Sales Header table contains information that is common for the entire order, and the Sales Line table contains information that is individual for each product in the order. Figure 6-2 New Sales Order in the Role Tailored Client 6.1.2 Manual analysis of Containment pattern In the following section we analyze the Sales Header (hereafter SH) and Sales Line (hereafter SL) tables to identify the containment pattern. Many NAV objects are very elaborate, and SH and SL are no exception. SH has an arity (number of fields) of 180 and counts 3334 lines of code, and SL has an arity of 194 and counts 4117 lines of code. There is 1 (one) SH row to 1..n (one to many) SL rows. If a SH row is deleted, all SL rows for the corresponding SH row are deleted. Figure 6-3 and Figure 6-4 depict the relationship between SH and SL in UML and database terms. 6.1.2.1 Containment relation between Sales Header and Sales Line Figure 6-3 Viewed as an OO UML diagram Figure 6-4 Viewed as a database diagram with primary keys (PK) and foreign keys (FK) Both SH and SL have a primary key ending with a variable containing "No.", which indicates they are of type integer or code, and a lookup in the code shows they are of type Code. Table primary keys are commonly a combination of a number of fields, as seen in tables SH and SL. The Document No. variable, which is part of both tables, is defined in SL as a foreign key to SH. None of the variables in the SH joined primary key have foreign keys to SL; this establishes a 1..* relationship between SH and SL. Reading the OnDelete() procedure trigger, it is clear that SH truly has a cascading delete of SL rows. A table's OnDelete() trigger is activated when a row in the table is deleted. Looking at the C/AL code associated with the tables, we find the following. Table 36 Sales Header Primary Key: Document Type, No.
OnDelete() (SalesLine is defined as a variable referencing the table that has a foreign key to Sales Header):
SalesLine.SETRANGE("Document Type","Document Type");
SalesLine.SETRANGE("Document No.","No.");
SalesLine.DELETEALL(TRUE);
(* Not how it is actually implemented *)
Table 37 Sales Line Primary Key: Document Type, Document No., Line No. Foreign Key: Document No. has the property TableRelation = "Sales Header".No. WHERE (Document Type=FIELD(Document Type)) Table 6-1 Sales Header and Sales Line Containment From the results in Table 6-1 we see that the relationship between tables SH and SL is in fact a containment. 6.1.3 Variations in containment pattern The above code is an example of the simplest occurrence. There are many variations of containment, and the code for Sales Header is actually a little more complex than listed, because Sales Lines of the type "Charge (Item)" have to be deleted first. This is a consequence of storing multiple types in a single table. The actual code can be seen in Table 6-2. Globals (global variables for table SH): Name: SalesLine, DataType: Record, Subtype: Sales Line
OnDelete():
SalesLine.SETRANGE("Document Type","Document Type");
SalesLine.SETRANGE("Document No.","No.");
SalesLine.SETRANGE(Type,SalesLine.Type::"Charge (Item)");
DeleteSalesLines;
SalesLine.SETRANGE(Type);
DeleteSalesLines;
Function in SH, DeleteSalesLines():
IF SalesLine.FINDSET THEN BEGIN
  HandleItemTrackingDeletion;
  REPEAT
    SalesLine.SuspendStatusCheck(TRUE);
    SalesLine.DELETE(TRUE);
  UNTIL SalesLine.NEXT = 0;
END;
Function in SH, HandleItemTrackingDeletion():
WITH ReservEntry DO BEGIN
  RESET;
  SETCURRENTKEY(
    "Source ID","Source Ref. No.","Source Type","Source Subtype",
    "Source Batch Name","Source Prod. Order Line","Reservation Status");
  SETRANGE("Source Type",DATABASE::"Sales Line");
  SETRANGE("Source Subtype","Document Type");
  SETRANGE("Source ID","No.");
  SETRANGE("Source Batch Name",'');
  SETRANGE("Source Prod. Order Line",0);
  SETFILTER("Item Tracking",'> %1',"Item Tracking"::None);
  IF ISEMPTY THEN
    EXIT;
  IF HideValidationDialog OR NOT GUIALLOWED THEN
    Confirmed := TRUE
  ELSE
    Confirmed := CONFIRM(Text052,FALSE,LOWERCASE(FORMAT("Document Type")),"No.");
  IF NOT Confirmed THEN
    ERROR('');
  IF FINDSET THEN
    REPEAT
      ReservEntry2 := ReservEntry;
      ReservEntry2.ClearItemTrackingFields;
      ReservEntry2.MODIFY;
    UNTIL NEXT = 0;
END;
Table 6-2 Sales Header Containment in detail
The code presented in Table 6-2 reveals how the OnDelete() trigger actually calls the procedure DeleteSalesLines(), which in turn references the procedure HandleItemTrackingDeletion(). In this case, the procedures are located in the same table; this is not always the case. 6.1.4 Manual identification of additional containments We know that all tables with "Line" as part of their name are candidates for being part of a containment pair. We are able to find all these tables and display their first primary key. This can be done by creating a new Form in the C/SIDE client displaying the system table Keys and applying a filter (TableName = *Line* and No. = 1). This filter only displays table names containing the string "Line" and their standard primary key; in NAV it is possible to define alternate sets of keys on a table. The above filter identifies 149 obvious candidates that might satisfy the requirements for a containment. Manual analysis of two random candidates indicated that more containments are waiting to be identified. Table 38 Purchase Header Primary Key: Document Type, No.
OnDelete():
PurchLine.SETRANGE("Document Type","Document Type");
PurchLine.SETRANGE("Document No.","No.");
PurchLine.SETRANGE(Type,PurchLine.Type::"Charge (Item)");
DeletePurchaseLines;
PurchLine.SETRANGE(Type);
DeletePurchaseLines;
Table 39 Purchase Line Primary Key: Document Type, Document No., Line No. Foreign Key: Document No. has the property TableRelation = "Purchase Header".No.
WHERE (Document Type=FIELD(Document Type)) Containment: Yes, follows the same pattern as SH and SL. Table 6-3 Purchase Header and Purchase Line Containment Table 5087 Profile Questionnaire Header Primary Key: Code OnDelete(): Not defined Table 5088 Profile Questionnaire Line Primary Key: Profile Questionnaire Code, Line No. Foreign Key: Profile Questionnaire Code has the property TableRelation = "Profile Questionnaire Header" Containment: No, there is no code for cascading delete; it might be an aggregation. Table 6-4 Profile Questionnaire Header and Profile Questionnaire Line Aggregation 6.2 Identifying Generalization Pattern The second relationship we are identifying is generalization. C/AL does not support inheritance, and therefore there is no direct way to program generalizations in the application code. The generalizations we identify are therefore special associations whose behavior can be understood as following the generalization pattern we have defined. It is our belief that, if the system were refactored to a language supporting generalization, the overall design would benefit from this refactoring. The software patterns we are analyzing the code for are those matching the form of Figure 6-5, which shows how a table has fields that can be of n table types. Figure 6-5 Table field association We suggest the following refactoring to introduce a cleaner design inducing simplicity. The refactoring models the relationship above as a single association to a generalization object, encapsulating the dependencies in the generalization object. We believe this is a desirable representation for the pattern defined above, and that this refactoring could achieve induced simplicity by reducing the total number of lines of code and the total number of dependencies.
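As a sketch of the proposed refactoring: in a language with inheritance, the conditional multi-table reference collapses into a single association to an abstract type. All class names below are hypothetical illustrations, not actual NAV or C# code from the project.

```python
# Hypothetical sketch of the refactoring: instead of one field whose target
# table depends on an Option value (the pattern of Figure 6-5), the table
# holds a single reference to an abstract generalization type (Figure 6-6).

class SalesLineTarget:
    """Abstract base for every table a Sales Line field may reference."""

class Item(SalesLineTarget): pass
class Resource(SalesLineTarget): pass
class FixedAsset(SalesLineTarget): pass

class SalesLine:
    def __init__(self, target: SalesLineTarget):
        # One association replaces n conditional table references.
        self.target = target

line = SalesLine(Item())
print(isinstance(line.target, SalesLineTarget))  # True
```

The dependencies on the individual referenced tables now live behind the abstract type, so the table itself carries a single association.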
Lines of code could be reduced because we expect the generalization patterns to occur in multiple tables, allowing us to reuse the generated generalization objects. This is documented in section 9.4. Figure 6-6 Multiple associations refactored to a single association The total number of references would also be reduced by managing the dependencies in the reused generalization object, thereby reducing the multiple associations from the table to a single association to the generalization object. The application contains 911 tables and they account for 179835 lines of code (LOC). The average LOC per table is fairly low (197), but some tables, for instance Sales Header (3354 LOC) and Sales Line (4221 LOC), are candidates for the code smell Large Class (38), described in section 6.2.1, which can be dealt with by Extract Class, which is part of our refactoring. The proposed refactoring will not reduce the code significantly, but it will be one of many steps. 6.2.1 Code smell Large Class and solution Extract Class A code smell is a term introduced by M. Fowler, used to describe the "smell" of something that is not right in the code, i.e. code gone bad. The code smell Large Class (38) describes a class that is trying to do too much. No exact definition of when a class is trying to do too much exists. Some of the indications are that the class contains too many instance variables, duplicated code, and in general counts too many lines of code. There are many different opinions on when a class is too large, and there will be variance according to the programming language the code is implemented in. The example tables Sales Header and Sales Line are much larger than the average, which is another indication of the code smell. Extract Class is the refactoring of extracting a part of a class, preferably a part occurring more than once, and using an instance of the new class instead of the original code (39).
An important byproduct of this refactoring is increased independence, which is a necessary step in preparing the system for a higher degree of modularization. A more modularized system can more easily be understood, maintained, and extended. When large codebases are not modularized, it can become hard, if not impossible, to change an item, simply because it is difficult to foresee the implications a change will have if the item is used extensively throughout the code. 6.2.2 Manual analysis of the Generalization pattern The generalization pattern is expressed via a relationship between properties on two fields in the same table. The following example lists the requirements for the generalization pattern, described with the two fields Field 1 and Field 2. Field 1 has to be of the Navision data type Option and has to have the property OptionString set to a pattern similar to this:
item_1, item_2, .., item_n
Code 6-1 Requirements for Generalizations
Field 2 has to be of the Navision data type Integer or Code and has to have the field property TableRelation set to a pattern similar to this:
IF (Field 1 = CONST("item_1")) table reference 1
ELSE IF (Field 1 = CONST("item_2")) table reference 2
..
ELSE IF (Field 1 = CONST("item_n")) table reference n
Code 6-2 Requirements for Generalizations
The field type Option is displayed as a selection box in NAV, and the field's OptionString is parsed as a comma-separated list of elements by the selection box. Figure 6-7 and Figure 6-8 below display what this looks like in the NAV object designer. Figure 6-7 Selection box with elements Figure 6-8 Property window displaying the OptionString of the field Type 6.2.3 Manual analysis of the generalization relationship from Sales Line The following section contains a manual analysis of table 37 Sales Line (hereafter SL). The goal of the analysis is to identify the code concepts expressing the generalization pattern defined in section 6.2.2.
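The conditional TableRelation form of Code 6-2 lends itself to simple pattern matching. The sketch below is a hypothetical, regex-based simplification of what our parser-based implementation does: it extracts the (option value, referenced table) pairs from a TableRelation property string.

```python
import re

# Hypothetical sketch: extract (option value, referenced table) pairs from a
# TableRelation property of the conditional form in Code 6-2. Nested
# parentheses inside CONST(...), e.g. CONST("Charge (Item)"), and WHERE
# clauses are deliberately not handled in this simplification.

RELATION = ('IF (Type=CONST(Item)) "Inventory Posting Group" '
            'ELSE IF (Type=CONST(Fixed Asset)) "FA Posting Group"')

def parse_table_relation(prop):
    """Return [(option value, referenced table), ...] for each IF branch."""
    pattern = r'IF \((\w+)=CONST\(([^)]*)\)\) "?([^"]+?)"?(?= ELSE|$)'
    return [(m.group(2), m.group(3)) for m in re.finditer(pattern, prop)]

print(parse_table_relation(RELATION))
# [('Item', 'Inventory Posting Group'), ('Fixed Asset', 'FA Posting Group')]
```

Each extracted pair corresponds to one arrow from the analyzed table to a referenced table in a diagram like Figure 6-9.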
Table 37 Sales Line:
Field: Type; Type: Option; Property: OptionString; Value: " ,G/L Account,Item,Resource,Fixed Asset,Charge (Item)"
Field: No.; Type: Code; Property: TableRelation; Value: IF (Type=CONST(" ")) "Standard Text" ELSE IF (Type=CONST(G/L Account)) "G/L Account" ELSE IF (Type=CONST(Item)) Item ELSE IF (Type=CONST(Resource)) Resource ELSE IF (Type=CONST(Fixed Asset)) "Fixed Asset" ELSE IF (Type=CONST("Charge (Item)")) "Item Charge"
Field: Posting Group; Type: Code; Property: TableRelation; Value: IF (Type=CONST(Item)) "Inventory Posting Group" ELSE IF (Type=CONST(Fixed Asset)) "FA Posting Group"
Field: Unit of Measure Code; Type: Code; Property: TableRelation; Value: IF (Type=CONST(Item)) "Item Unit of Measure".Code WHERE (Item No.=FIELD(No.)) ELSE IF (Type=CONST(Resource)) "Resource Unit of Measure".Code WHERE (Resource No.=FIELD(No.)) ELSE "Unit of Measure"
Field: Originally Ordered No.; Type: Code; Property: TableRelation; Value: IF (Type=CONST(Item)) Item
Field: Variant Code; Type: Code; Property: TableRelation; Value: IF (Type=CONST(Item)) "Item Variant".Code WHERE (Item No.=FIELD(No.))
Table 6-5 Generalizations in Sales Line
The table above lists the information extracted from table SL, namely the fields fulfilling the defined initial requirements for the generalization pattern. We have identified 6 fields: 1 with the property OptionString and 5 with the property TableRelation. The TableRelation properties depend on some or all of the elements in the OptionString. The token after each if-statement in the TableRelations is a global table id referring to a table. The diagram in Figure 6-9 illustrates our findings. Figure 6-9 Candidates for Generalization Refactoring Figure 6-9 illustrates the findings in our manual analysis of SL. The color codes used are listed in Table 6-6.
The color codes denote:
- The analyzed table (in this case Sales Line)
- A combination of fields in the Sales Line table, named Field1/Field2, expressing the analyzed relationship; Field1 has the property TableRelation, which uses variables from Field2's OptionString property
- The tables referenced by the TableRelation
Table 6-6 Color definition for Figure 6-9
We have listed the relations in a simplified manner; the correct display of the relations would show each relation as an arrow going from SL to the referenced table. We have chosen to draw the relations this way to underline the relation to the extracted code in Table 6-5. To display Figure 6-9 in terms of UML, we would remove the origin fields and simply map the Sales Line table and its relations to the referenced tables as associations. This would produce a star diagram with Sales Line in the middle and connections to each of its referenced tables. In our opinion, it is advantageous to make this refactoring. The result is shown in Figure 6-10. Figure 6-10 Refactoring multiple associations to single associations The proposed refactoring will, for SL, reduce the number of relations from 11 to 1, which is desirable. 6.3 Identifying REA Concepts As described in the introduction (section 1) of this thesis, the project aim was expanded midway through the process. J. Kiehn introduced the REA accounting system and suggested that we should try to see how far we could get in identifying elements from the C/AL code that could fit into the REA model. The following sections give an introduction to the REA model and the double-entry accounting system (used in NAV), and describe our plan for identifying REA concepts in the C/AL code.
6.3.1 The Resource, Event and Agent (REA) model (40) McCarthy found inspiration for REA around 1975 in the emerging relational databases and started working on a framework for building accounting systems in a shared data environment. The result was the REA model. The model's core feature is an object pattern consisting of two mirror-image constellations that semantically represent the input and output components of a business process. The model displays the exchange of goods for money with focus on the duality of the transaction. McCarthy describes this as: “Stock-flow relationships associate the flows in and out of a resource category while the duality links keep the economic rationale for the exchange boldly in the forefront” (41). We present a simple example (Figure 6-11) to illustrate how a REA model is constructed. The model describes an enterprise selling cookies. The core understanding of the model is that we “Give Cookies and Take Cash”, and the duality of this action is the cornerstone of the model. The mirroring is marked by the dashed line. Events are a central part of the model because they map the actions and relations. The principle in the model is the two mirror-image constellations, where every component in the model has a corresponding component in the mirror image. The following section explains the components of a REA model and identifies the elements in Figure 6-11. We introduce the elements in the following order: Relationships, Resources, Events, and Agents. 6.3.1.1 Relationships Relationships map the association between Resources and Events, and between Events and Agents. A relationship specifies the nature of the relation in terms of the action taking place, and provides data about the entities (Resources and Agents). The relationship between Resources and Events expresses the exchange of the Resource, and the relationship between Events and Agents describes the role of the Agent.
The model below contains the three types of relationships:
- Inside Participation, mapping the relationship between Events and internal employees (Salesperson or Cashier)
- Outside Participation, mapping the relationship between Events and external actors (Customer)
- Stock-Flow, mapping the relationship between Events and Resources
6.3.1.2 Resources Resources map the stock of some entity. Typically, Resources are a company's bread and butter; Resources are produced from other Resources, and the ultimate goal is normally to exchange them for money. The model in Figure 6-11 contains two Resources:
- Cookies, the enterprise's stock of cookies
- Cash, the enterprise's holding of cash
6.3.1.3 Events Events map the individual actions inside the company that form the company's processes. The complete REA model describes the business processes, and an Event maps a single step in a process. The model in Figure 6-11 contains two Events:
- Sale, where the Customer buys cookies from the Salesperson and receives the cookies
- Cash Receipt, where the Customer pays for the cookies and the Cashier receives the payment
6.3.1.4 Agents Agents are the individual actors in the model. Agents are the initiators of Events and therefore play an important role in the system. The model in Figure 6-11 contains three Agents:
- Salesperson, an employee in the cookie enterprise handling the delivery of goods to the customer
- Cashier, an employee in the cookie enterprise handling monetary transactions
- Customer, an external actor buying cookies from the cookie enterprise
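The mirror-image structure described above can be sketched as a tiny data model. All names below are taken from the cookie example; the representation itself is a hypothetical illustration, not the project's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of the cookie-company REA model (Figure 6-11): each
# Event links a Resource (stock-flow) to an inside and an outside Agent
# (inside/outside participation), and duality pairs a "give" event with its
# mirror-image "take" event.

@dataclass(frozen=True)
class Event:
    name: str
    resource: str        # stock-flow relationship
    inside_agent: str    # inside participation
    outside_agent: str   # outside participation

sale = Event("Sale", resource="Cookies",
             inside_agent="Salesperson", outside_agent="Customer")
cash_receipt = Event("Cash Receipt", resource="Cash",
                     inside_agent="Cashier", outside_agent="Customer")

give, take = sale, cash_receipt   # the duality: give cookies, take cash

# In a well-formed exchange the same external actor appears on both sides.
assert give.outside_agent == take.outside_agent
print(f"Give {give.resource}, take {take.resource}")  # Give Cookies, take Cash
```

The duality pair is the unit that would be extended when more business processes (inventory purchases, wages, and so on) are added to the model.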
[Figure 6-11 The cookie company (42): two mirror-image constellations joined by a Duality link. On the Give side, the Economic Resource Cookies is connected by a Stock-Flow relationship to the Economic Event Sale, which has Inside Participation from the Salesperson and Outside Participation from the Customer. The Take side mirrors this: the Economic Resource Cash flows to the Economic Event Cash Receipt, with Inside Participation from the Cashier and Outside Participation from the Customer.]

6.3.1.5 Extending the Model

The model in Figure 6-11 can easily be extended to cover more business processes and ultimately the entire cookie enterprise. An easy place to start would be the requirements for producing cookies. We need flour, sugar, and butter to produce cookies, so extending the model to cover inventory would be easy. The duality would be covered by mapping the monetary exchange for goods with Vendors. From here the model could be extended to cover employee wages, equipment, and so forth. The REA model offers good scalability and a down-to-earth approach to mapping the relevant parts of the business with respect to the business processes.

6.3.2 Introduction to Accounting Theory

As described previously, the core part of an ERP system is the accounting application. This is also the case in NAV, and most of the tables and functionality that we use for examples throughout this thesis are components of the accounting application. In the following sections we introduce some new concepts. First we give an introduction to the accounting system that NAV is designed around, namely the all-dominant double-entry system. Thereafter, we draw parallels between the theory and the NAV application to illustrate how the double-entry system shines through in the naming of tables and the behavior of the system. Finally, we use this information to apply REA theory to the NAV application, to examine whether REA can be applied and provide additional information on the architecture and behavior of the application.
6.3.2.1 The double-entry accounting system

The most widely used accounting system is the double-entry system. The double-entry system can be traced far back; the first, and most influential, textbook on the technique was published in Venice in 1494 by the Franciscan friar and mathematician Luca Pacioli (43). The double-entry system is still the dominant system today, and the largest ERP products, including systems such as SAP and NAV, are based on it. The basic principle of the system is that every entry in the book should be presented as both a debit and a credit. Postings therefore always influence at least two accounts: if an amount is added on one account, it should be subtracted on another account. The principle is that debit should always equal credit, which is the guarantor for the balance being correct. This can be expressed in the simplified model below.

  Debit                                             Credit
  Expenses – Paying salaries, buying raw material   Revenue – Money from sale of goods
  Assets – Money in the bank, inventory             Liabilities – Debt

Table 6-7 The double-entry accounting system

The accounting technique is to always register entries on the debit side, except if the amount is negative, where it is recorded on the credit side. The debit-credit system is indispensable in today's bookkeeping methods.

6.3.2.2 The double-entry accounting terms

The double-entry accounting system has a special naming convention. The following section gives an introduction to the terms ledger, entry, and journal used in double-entry accounting, and explains with an example how NAV would support a simple beverage store. The following definition of the term 'ledger' gives a good explanation of how ledgers, entries, and journals are tied together. The definition states: "A book for recording the monetary transactions of a business entity in the form of debits and credits.
Entries recorded in the subsidiary journals are posted (recorded) to the general ledger as final entries." (44)

To illustrate how this maps to the real world, we use the beverage fridge of a dormitory as a simple example. At the dormitory I live in, I am responsible for managing our nonprofit fridge with beverages. The system is simple: when a person takes a beverage from the fridge, they mark it on a list. The list is a simple spreadsheet with the names of the residents on the y-axis and types of beverages on the x-axis.

1. Each line in this list corresponds to a sales line in NAV, listing which beverages we have sold to each customer.
2. When we once in a while change the lists on the fridge, we add together the beverages sold to each customer and write the price in our book. Each line in this book is equal to a journal line in NAV.
3. Every month we add up the total amount for each customer and give them an invoice. When the invoice is paid, we note in our book that the invoice has been paid. This action is equivalent to the NAV action of posting a journal line to an entry in the general ledger account and deleting the journal afterwards.

6.3.3 Naming of tables in NAV

The following section elaborates on the double-entry accounting terms and shows how these naming conventions are reflected in NAV. The naming of NAV tables follows the double-entry accounting terms and therefore tells us a lot about the role of a table object in the system. This section lists the most relevant NAV tables, in terms of accounting, and describes their role in shaping the basic NAV bookkeeping system.

- Salesperson/Purchaser – Table for storing sales and purchase employees. The salesperson would be an employee having contact with customers, so that the customer would take one or more of the company's products. The purchaser would be an employee having contact with vendors with the purpose of purchasing materials used to manufacture products or similar.
- Customer – Table for storing customer information, such as ship-to address and similar information.
- Customer Invoice Discount – Table for storing the customer's negotiated discount from the regular list price.
- Customer Account – Table for storing all financial transactions with a customer.
- Customer Ledger Entry – Table to create entries regarding customers in the G/L Account.
- Sales Header – Table for storing information on shipping.
- Sales Line – Table for storing information on what to ship.
- Vendor – Table for storing vendor information, such as the company's account number in the vendor's system.
- Vendor Invoice Discount – Table for storing the negotiated discount from the vendor's regular list price.
- Vendor Ledger Entry – Table to create vendor entries in the G/L Account.
- Purchase Header – Table for storing information about the vendor the company bought products from.
- Purchase Line – Table for storing information on which products the company bought.
- Item – Table for storing the products the company sells.
- Item Ledger Entry – Table for creating entries regarding Items in the G/L Account.
- General Journal Line – Records showing the exchanged goods not yet paid for.
- G/L Account – The general ledger account is the general accounting book for the company.
- G/L Entry – The general ledger entry table is for creating entries in the general book.

Table 6-8 Table naming in NAV

The important note is that table names containing journal, entry, and ledger are clear indicators that NAV is based on the double-entry accounting system.

6.3.4 REA concepts in NAV

As presented in section 6.3.3, NAV contains characteristics of the double-entry system. If the system were to be refactored to the REA accounting model, all tables handling receipt of goods, use of materials, and other actions related to exchanging resources would have to be refactored to REA Events (45). For NAV this would mean that entry, journal, and ledger tables would be replaced with REA event tables.
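To recap the mechanics that would be replaced, the double-entry principle from section 6.3.2.1 — every posting debits one account and credits another, keeping debit equal to credit — can be sketched as follows. This is an illustrative Python sketch, not NAV code, and the account names are hypothetical:

```python
# Minimal sketch of double-entry posting (illustrative; not NAV code).
ledger = {"Bank": 0, "Inventory": 0, "Revenue": 0}

postings = []  # every posting touches at least two accounts

def post(debit_account, credit_account, amount):
    """Register the amount as a debit on one account and a credit on another."""
    postings.append((debit_account, amount))    # debit side: +amount
    postings.append((credit_account, -amount))  # credit side: -amount
    ledger[debit_account] += amount
    ledger[credit_account] -= amount

post("Inventory", "Bank", 500)  # buy raw material
post("Bank", "Revenue", 800)    # sell goods for cash

# The invariant: total debit equals total credit, so the book balances.
assert sum(amount for _, amount in postings) == 0
```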
We will in the following section examine how this could be done.

In Table 6-8 it can be seen that the system contains journal and entry tables which follow the concept from the double-entry system. We have identified the pattern of a journal line being posted to an entry. The action of posting an entry with data from a journal line is equivalent to an Event in REA. If we can find the journal line, we can also find the tables that contribute to the values in the journal line. The idea proposed by J. Kiehn was to identify REA Resources and Agents from the posting event. We do, however, know that many values written to journal tables are not themselves part of a REA relation. The Gen. Journal Line table contains 142 fields, holding data such as customer discount, date of maturity, etc. The following section contains an analysis of the steps necessary to identify the REA candidates.

6.3.4.1 Identifying REA concepts

This section contains an overall step-by-step approach for identifying REA candidates. The goal here is to lay out the overall requirements for REA candidates; we analyze this in depth in the manual analysis in section 6.3.5.

1. Find all entry/journal table variable sets where:
   a. Field values from a journal line variable are copied to an entry table variable.
      Ex: CustLedgEntry."Customer No." := GenJnlLine."Account No.";
   b. The entry table instance is inserted in the entry table.
      Ex: CustLedgEntry.INSERT;
   c. The journal line instance is deleted from the journal line table.
      Ex: GenJnlLine.DELETE;
2. Find all tables from which values are copied to the journal line:
   a. Identify all the table objects copying values to the journal line tables identified in step 1.
      Ex: GenJnlLine."Account No." := SourceCodeSetup."Account No.";
3. Find REA candidates from the journal line:
   a. If there exists a containment (described in section 6.1) between the entry table identified in step 1 and the table objects identified in step 2, we have identified a REA candidate.
   b. If there does not exist a containment, we check if there is a resemblance in the naming of the tables. For instance, Vendor Ledger Entry, which will be identified in step 1 as part of the variable set (Vendor Ledger Entry / Gen. Journal Line), has a resemblance to the name of the Vendor table, which will be identified in step 2. Therefore it would be reasonable to assume they are related.

6.3.5 Manual analysis of REA Events in NAV

In section 6.3.4 we presented how we see REA as a candidate to replace the double-entry system. We found that the posting of data from journal lines to entries is a key component in the double-entry system. Further, we found that we can apply REA terminology to the meaning of this movement of data: the posting of data to an entry can be expressed as an Event in REA terminology. Our initial work is therefore to find all relations between tables where values from a journal line table are copied to an entry table. J. Kiehn has identified Codeunit 12 Gen. Jnl.-Post Line as a Codeunit that posts journal lines (of type 81 Gen. Journal Line) to an entry. The name Jnl.-Post Line is an abbreviation of Posting of Journal Lines. When we search through the Codeunits in NAV we find the following Codeunits with the same naming:

- 12 Gen. Jnl.-Post Line
- 22 Item Jnl.-Post Line
- 32 BOM Jnl.-Post Line
- 212 Res. Jnl.-Post Line
- 1012 Job Jnl.-Post Line
- 5633 FA Jnl.-Post Line
- 5652 Insurance Jnl.-Post Line

Common for all the journal line posting Codeunits above is that they are called from a Codeunit with an ID number one higher and the same name where Line is replaced with Batch, i.e. 12 Gen. Jnl.-Post Line -> 13 Gen. Jnl.-Post Batch. In NAV, batch normally refers to a process run as a batch process, i.e. a process with no screen output. We will manually examine Codeunit 12 Gen. Jnl.-Post Line (CU12) and Codeunit 13 Gen. Jnl.-Post Batch (CU13) to find a suitable method to identify the posting mechanism.
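The step-by-step search in section 6.3.4.1 amounts to scanning statements for three ingredients: a field copy from a journal line variable to an entry variable, an INSERT on the entry, and a DELETE on the journal line. A minimal sketch of that scan, using a hypothetical flat statement list in place of the real parsed C/AL AST:

```python
import re

# Hypothetical flat statement list, standing in for the parsed C/AL code.
statements = [
    'CustLedgEntry."Customer No." := GenJnlLine."Account No.";',
    'CustLedgEntry.INSERT;',
    'GenJnlLine.DELETE;',
]

def find_posting_pairs(stmts):
    """Return (entry, journal) variable pairs where the posting pattern holds:
    a field copy from journal to entry, an entry INSERT, and a journal DELETE."""
    copies = set()
    inserted, deleted = set(), set()
    for s in stmts:
        m = re.match(r'(\w+)\."[^"]+" := (\w+)\."[^"]+";', s)
        if m:
            copies.add((m.group(1), m.group(2)))
        m = re.match(r'(\w+)\.INSERT;', s)
        if m:
            inserted.add(m.group(1))
        m = re.match(r'(\w+)\.DELETE;', s)
        if m:
            deleted.add(m.group(1))
    return [(e, j) for (e, j) in copies if e in inserted and j in deleted]

# find_posting_pairs(statements) → [('CustLedgEntry', 'GenJnlLine')]
```

The real identification algorithm works on the AST and must follow procedure references across Codeunits, but the core pattern match is the same.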
When the posting mechanism has been identified and we have automated the process of identification, the next step will be to identify tables that post values to a journal line and thereby transitively post values to an entry table. The identified tables will be REA Resource and Agent candidates. We expect the seven posting Codeunits to share a common structure due to the lack of inheritance described in section 2.4. CU12 and CU13 count 7245 and 996 lines of code, respectively. We will in the following section start our analysis from CU13, which initiates the event. We only present the most significant code lines.

1. CU13.OnRun(Rec) – Rec: Gen. Journal Line (record)
   Code;
2. CU13.Code()
   GenJnlPostLine.RunWithoutCheck(GenJnlLine5,TempJnlLineDim);
3. CU12.RunWithoutCheck()
   Code(false);
4. CU12.Code()
   IF "Account No." <> '' THEN
     CASE "Account Type" OF
       "Account Type"::"G/L Account": PostGLAcc;
       "Account Type"::Customer: PostCust;
       ...
       "Account Type"::"IC Partner": PostICPartner;
     END;
5. CU12.PostCust()
   WITH GenJnlLine DO BEGIN
     CustLedgEntry.LOCKTABLE;
     CustLedgEntry.INIT;
     CustLedgEntry."Customer No." := "Account No.";
     ...
     CustLedgEntry."Posting Date" := "Posting Date";
     ...
     CustLedgEntry.INSERT;
6. CU13.Code() (continuing)
   GenJnlLine3.DELETE; // (referring to same instance)

Table 6-9 Code for posting Journals to Entries

The section hereafter describes each of the above six code blocks and explains why they are relevant.

1. This part is taken from the OnRun trigger, which is executed when the Codeunit is run. The OnRun trigger takes a record of type Gen. Journal Line as parameter and calls the Code procedure in the same Codeunit.
2. The Code procedure calls the procedure RunWithoutCheck in CU12.
3. RunWithoutCheck calls the local procedure Code.
4. The Code procedure calls local procedures based on which account type the postings should be posted to.
5. One of the posting procedures in step 4 is PostCust(). This procedure copies values from the Gen.
Journal Line table (GenJnlLine) to the Customer Ledger Entry table (CustLedgEntry), and finally the entry is inserted.
6. When PostCust() returns, the execution of CU13.Code from step 2 continues. The Code procedure deletes the inserted Gen. Journal Line (GenJnlLine3).

From the manual analysis we found that there is variation in how the posting procedures are implemented. Further, the analysis revealed that the code is complex and that there are many procedure references that need to be analyzed. The analyzed pattern is spread among five procedures located in two Codeunits. The automated identification algorithm will need to analyze all referenced code.

Chapter 7

7 Tools

The tools section introduces the tools, technologies, and methods used to implement:

- The parser extension identified as a requirement in section 3.1.1.
- The containment, generalization, and REA identification algorithms described in section 8.9.
- A tool (Concept Viewer), see section 9.5, for dynamically displaying the findings of our identification algorithms.

7.1 .NET Framework

.NET is the platform underlying the latest version of the NAV ERP application, and all related work, see section 3, builds on it. Thus .NET is the only sensible platform for the task at hand. The .NET framework is the Microsoft flagship when it comes to development. Version 1.0 was released in 2001 and the latest release is version 3.5. The framework allows developers to program for any platform for which there exists a .NET framework implementation, much like what the Java Virtual Machine does for Java. Microsoft has made the standard open, allowing implementations for non-Microsoft platforms to appear. Mono (46) is one such solution, offering full C# 3.0 support in its latest release (version 2.4) targeted at Windows, Mac OS X, and Linux. The different flavors of programming languages (F#, C#, VB.NET, and others) are compiled via language-specific compilers to the Microsoft Intermediate Language (MSIL).
When the code is run, MSIL is compiled just-in-time (JIT) by the Common Language Runtime (CLR) into executable machine code. This can be studied in depth at MSDN (47).

7.2 Interoperability

A great advantage of the .NET languages C#, VB.NET, and F# is the interoperability they offer. As described above, they all share a common base in the .NET framework, and this enables developers to use classes, modules, methods, and functions from any other .NET based language. The integration can be done easily in Visual Studio simply by referencing a project or .dll. The integration is complete, allowing the developer to seamlessly debug through the executed code lines, jumping between C# and F# projects without having to notice the technology and project shift.

7.3 F#

Hvitved's parser was written in F#, but it is incomplete, see section 3.1.1, with respect to the problems that we are going to solve in this project. Thus we need a basic understanding of F# in order to extend the C/AL parser. F# is the latest addition to the family of .NET languages. The latest release is the Community Technology Preview (CTP 1.9.6.2), which was released in September 2008. F# is the result of a research project by Don Syme at Microsoft Research in Cambridge, England. The language is going to be fully integrated in Visual Studio 10, which is expected to ship together with the .NET framework version 4.0 in the latter half of 2009. The language is a declarative language extended with some support for imperative programming. The language is heavily inspired by OCaml (48) (Objective Caml), which is inspired by Caml (49) (Categorical Abstract Machine Language), which again is inspired by ML (50) (Metalanguage), developed in the 1970s at Edinburgh University. As described, the language has a strong heritage, with many potential users among scientists and researchers in various fields.
The biggest achievement of F# is the combination of the brevity and robustness of the Caml programming language family with .NET interoperability, facilitating seamless integration of F# programs with any other program written in a .NET language, as described in section 7.2.

7.3.1 Language syntax

As mentioned before, the language is heavily inspired by OCaml. The syntax is actually so similar to OCaml and Haskell that F# forums even recommend reading books on Haskell and OCaml to learn F# (51). Speaking of syntax, it is worth noticing that whitespace is significant, which will be a surprise to the average OO programmer. The language is strongly typed and uses type inference, as we will demonstrate in the following examples. Among other features worth mentioning are generics and modules. Functions in F# are implicitly generic and can be reused in other functions. They can even be nested inside other functions, returning their result directly to a parent function. Modules are a great help when writing F# programs, allowing the developer to shield off the functions not meant for use outside a module. F# is one of the first mainstream programming languages implementing active patterns and asynchronous programming constructs. The two sections below provide an introduction to the concepts. The active pattern concept is used for our parser extension, and the section on asynchronous programming is provided even though we have not used asynchronous programming in our project. The asynchronous programming features of F# are unique among languages in the .NET family and a good motivation for learning F#.

7.3.1.1 Active Patterns (52)

Pattern matching is normally done over the core representations of data structures such as lists, options, records, and discriminated unions. In F#, pattern matching is extensible, allowing the developer to define new ways of matching over existing types. Active patterns are the technique for this.
The following example is an introduction to how active patterns can be useful in converting an object. In computer science, complex numbers are normally represented as rectangular coordinates (x + yi), but in math we often use the polar representation of a phase and a magnitude. For some calculations the polar representation is preferable. The next example illustrates how we can define two active patterns to match complex numbers in either the rectangular or the polar representation.

let (|Rect|) (x:complex) = (x.RealPart, x.ImaginaryPart)
let (|Polar|) (x:complex) = (x.Magnitude, x.Phase)

Code 7-1 Example with complex numbers – Definition of active patterns

'let' is the definition keyword used for declaring functions and values in F#. The active patterns we have defined are named Rect and Polar; they both take one parameter x of type complex (requires import of Microsoft.FSharp.Math), and the returned value will be the value set for either the rectangular or the polar representation of the complex number. The next code examples show how the two active patterns defined above can facilitate cleaner and shorter algorithm designs. The code defines three calculations: one add function and two multiplication functions. The first function, addViaRect, adds two complex numbers (a and b) by using their rectangular representation. F# is a strongly typed language with type inference; therefore the parameter type will automatically be inferred to be complex, based on the match against the Rect pattern. addViaRect matches the input with Rect and returns the sum of the two complex numbers as a new complex number.

let addViaRect a b =
    match a, b with
    | Rect(ar,ai), Rect(br,bi) -> Complex.mkRect(ar+br, ai+bi)

Code 7-2 Example with complex numbers – Add function

The last two functions, mulViaRect and mulViaPolar, are provided to underline the advantage of having both representations.
We see that multiplying complex numbers in the rectangular representation is much more complicated than performing the same operation on the polar representation. Therefore it is beneficial to be able to use the optimal representation in each case.

let mulViaRect a b =
    match a, b with
    | Rect(ar,ai), Rect(br,bi) ->
        Complex.mkRect(ar*br - ai*bi, ai*br + bi*ar)

let mulViaPolar a b =
    match a, b with
    | Polar(m,p), Polar(n,q) -> Complex.mkPolar(m*n, p+q)

Code 7-3 Example with complex numbers – Multiply functions

7.3.1.2 Asynchronous programming (53)

F# supports asynchronous workflows, allowing developers to easily convert single-threaded code into multi-threaded code. In contrast to the 'let' keyword, 'let!' is used to define computations that are able to run asynchronously. Every binding defined with the 'let!' keyword executes on a dedicated thread that is taken from, and released back to, a thread pool when the computation is done. Therefore async computation uses more threads compared to single-threaded execution. One of the reasons multi-threaded computation is easy to do in F# is its immutable data types. Data types in F#, and in functional languages in general, are implicitly immutable, meaning that a declared value cannot be changed during execution. This is a great advantage when designing thread-safe code, because we can always rely on the state of a value. When mutable data types are needed they have to be explicitly declared with the keyword mutable. This imposes some limitations on the available options for performance improvements from running in multiple threads. The following modified example from the book Expert F# is provided to explain how asynchronous programming can be done in F#. The provided code does not actually run in the CTP version of the F# language, due to changes in FSharp.Core.dll, but it can still serve as a good illustration of how asynchronous computations work in F#.
open System.Net
open System.IO
open Microsoft.FSharp.Control.CommonExtensions

let museums = ["MOMA", "http://moma.org/";
               "British Museum", "http://www.thebritishmuseum.ac.uk/";
               "Prado", "http://museoprado.mcu.es"]

let fetchSync(nm,url:string) =
    do printfn "Creating request for %s..." nm
    let req = WebRequest.Create(url)
    let resp = req.GetResponse()
    do printfn "Getting response stream for %s..." nm
    let stream = resp.GetResponseStream()
    do printfn "Reading response for %s..." nm
    let reader = new StreamReader(stream)
    let html = reader.ReadToEnd()
    do printfn "Read %d characters for %s..." html.Length nm

for nm,url in museums do fetchSync(nm,url)

Code 7-4 Example with Sync and Async execution – Synchronous function

Code 7-4 above has three main parts. The first part declares the list museums, with three elements, each containing the name and URL of a museum. The second part is the fetchSync function, which creates a request for a webpage, waits for the response, and prints the length of the response to the screen. The third is a for loop which calls the fetchSync function for each element in the museums list. The above code will run in a single thread, retrieving a single URL at a time. This leads to idle waiting while the program waits for the response and while the response is downloaded.

Code 7-5 is the asynchronous version of parts two and three from above. The first thing to notice is how similar the two solutions are! The fetchAsync function's inner code is declared inside an async {}, marking that the code within {} is part of an asynchronous workflow. The next part is the let! keyword described earlier, declaring that the result should be computed in a dedicated thread. Finally, the function call in the for loop calls fetchAsync by spawning a new thread with Async.Spawn for each function call.

let fetchAsync(nm,url:string) =
    async { do printfn "Creating request for %s..." nm
            let req = WebRequest.Create(url)
            let! resp = req.GetResponseAsync()
            do printfn "Getting response stream for %s..." nm
            let stream = resp.GetResponseStream()
            do printfn "Reading response for %s..." nm
            let reader = new StreamReader(stream)
            let! html = reader.ReadToEndAsync()
            do printfn "Read %d characters for %s..." html.Length nm }

for nm,url in museums do Async.Spawn (fetchAsync(nm,url))

Code 7-5 Example with Sync and Async execution – Asynchronous function

The performance of the two approaches is very different. The performance of fetchSync is heavily impaired by the waiting time from sequentially requesting and receiving one HTML page at a time. The performance of fetchAsync is better, because we shoot off all three requests right away instead of waiting for other requests to complete. The execution time of the asynchronous version will therefore be close to the execution time required to handle the slowest or largest response, in this case the MOMA museum's homepage. The box below shows how the two approaches produce the same output in a different order, and implicitly at different speed, for the reasons described above. The highlighted text gives an indication of which step the synchronous approach will be executing when the asynchronous execution completes.

Table 7-1 Sync vs. Async execution

Synchronous execution:
Creating request for MOMA...
Getting response for MOMA...
Reading response for MOMA...
Read 41635 characters for MOMA...
Creating request for British Museum...
Getting response for British Museum...
Reading response for British Museum...
Read 24341 characters for British Museum...
Creating request for Prado...
Getting response for Prado...
Reading response for Prado...
Read 188 characters for Prado...

Asynchronous execution:
Creating request for MOMA...
Creating request for British Museum...
Creating request for Prado...
Getting response for MOMA...
Reading response for MOMA...
Getting response for Prado...
Reading response for Prado...
Read 188 characters for Prado...
Read 41635 characters for MOMA...
Getting response for British Museum...
Reading response for British Museum...
Read 24341 characters for British Museum...

7.4 LEX, YACC and Abstract Syntax Trees (54)

This section is an introduction to the necessary steps in parsing a text language into an abstract representation that keeps the syntax of the text. These steps are relevant in this project, because much of this work is based on these techniques. We describe the two parsing tools LEX (55) and YACC (56), their role in a compiler, and how the steps of a compiler are closely related to the work in this thesis. There are many tools for building parsers. The CALParser, described in section 3.1.1, is built with FSLEX and FSYACC, the F# variants of LEX and YACC.

LEX is a tool that produces lexical analyzers. A lexical analyzer is a program that can recognize tokens in a text and output the identified tokens. The lexical analyzer is produced by compiling a LEX file defined in a special grammar. The grammar relies heavily on regular expressions; this technique is described in section 7.6. YACC (Yet Another Compiler Compiler) is a tool that can be used to define grammar rules. The grammar rules recognize sequences of tokens defined in a lexer file. The parser file definition is compiled with YACC, and the result is a parser. The output from the parser is token blocks that can be stored in a suitable data structure such as an Abstract Syntax Tree (AST). The data structure used by the CALParser is an AST. The AST is constructed by matching YACC output to data object types with the use of active patterns, described in section 7.3.1.1.

[Figure 7-1 The structure of an abstract syntax tree: an assignment node ':=' with the member access 'Object.Property' as its left child and 'Value' as its right child.]

Figure 7-1 shows how an assignment statement would be represented in an abstract syntax tree. The assignment code can be seen below.
Object.Property := Value;

Code 7-6 C/AL variable assignment

The full data model can be seen in the F# source file \Code\Work From T. Hvitved\oo parser\CALast.fs on the appendix DVD. The use of abstract syntax trees in this project is elaborated in section 8. Parsing files is the initial task of a compiler, and the work in this project is actually very similar to the work of a compiler. A typical compiler goes through the following stages when compiling a program:

- Lexing – Dividing code into understandable tokens
- Parsing – Parsing the code into understandable blocks (if, for, while, etc. statements)
- Semantic Analysis – Checking if the code makes sense (variables are assigned before they are referenced, etc.)
- Optimizations/Transformations – Optimizing the code by removing unused variables and performing other more advanced transformations
- Code generation – Generating output, which could be running code

The parsed code has already been compiled by the internal C/AL compiler; therefore, there is no need to perform semantic analysis. The code is tokenized and parsed with the CALParser and the developed subparser. The primary task in this project is the optimization and transformation step of the compiler. The presented refactorings are the result of this step. The output format for this "compiler" is UML diagrams, provided by the Concept Viewer.

7.5 C#

C# is the target platform; we even want to port the application code to C#, and the final goal of the refactoring is to produce more readable C# code than the present C# token parser, see section 2.3.3. According to the TIOBE programming community index, C# is the 7th most popular programming language (57), with Java still taking the 1st prize. This is a very good result for a programming language that has only been around for 8 years, taking into account that the indexing algorithm used by TIOBE, among other things, counts legacy code written 20 years ago when calculating its ranking.
As most Danes in computer science know, C# and the .NET framework are developed by lead architect Anders Hejlsberg. Version 1.0 of C# was released in 2001, and the latest version is 3.0, matching version 3.5 of the .NET framework. C# is a highly advanced imperative programming language offering many advanced language extensions. We have decided to present a few interesting concepts from the .NET languages from a C# perspective. We will go over the following language extensions in the sections below: Regular Expressions, Lambda Expressions, and LINQ.

7.6 Regular expressions (58), (59)

Regular Expressions originate from automata and formal language theory, developed by Warren McCulloch and Walter Pitts in their work on the McCulloch-Pitts neuron model. The Regular Expression technique has been used since the 1960s and is today widely supported in programming languages. Regular Expressions have been supported by the .NET framework from the first versions, and all the common .NET languages support their use. Regular Expressions are useful for declaring patterns that can be used to identify matching substrings in text. The functionality is described in section 8.4.

7.7 Lambda expressions (60), (61)

We use lambda expressions to write cleaner C# code. The use of lambda expressions has enabled us to reduce the total number of C# code lines. Lambda expressions originate from lambda calculus, which formed some of the foundations of computer science as we know it today. Lambda calculus was developed by Alonzo Church during the 1930s and has directly or indirectly inspired Von Neumann in his design of a computer able to process jobs using software (62). The principles behind lambda calculus have in many ways shaped functional languages such as Erlang, Haskell, Lisp, ML, and F#. Other types of languages also offer support for lambda expressions, for example the array programming language APL and the object-oriented C#.
C# introduced support for lambda expressions in version 3.0, which has made lambda expressions a valuable and useful language extension in everyday C# programming. Lambda expressions in C# are often used on collections. Every collection implementing the IEnumerable<T> interface has implicit support for lambda expression operations. The definition in the box Code 7-7 shows the method First, which can be called on any Enumerable type (List, Array, Dictionary, etc.). The documentation of the First method defines First as a generic method able to select an element based on the provided Func(TSource, Boolean).

Enumerable.First(TSource) Generic Method (IEnumerable(TSource), Func(TSource, Boolean))
Code 7-7 API description of the First method

The code in Code 7-8 is a simple example of how to use the First method. The code defines an integer array with 15 random values. The First method returns a single integer value. The type we work on is a number implicitly inferred to be of type integer, and we choose the first number with a value above 80. The returned value will in this case be the value 92 at index 3.

int[] numbers = { 9, 34, 65, 92, 87, 435, 3, 54, 83, 23, 87, 435, 67, 12, 19 };
int first = numbers.First(number => number > 80);
Code 7-8 Example with lambda expression

The code in Code 7-8 is easily understood and can be expressed in a single line. If we had to write the code in Code 7-8 without lambda expressions, it would require 9 lines of code, as seen in the example Code 7-9. Therefore the presence of generic methods, which can be configured with lambda expressions, is a great way to improve readability by removing the need for extra lines of code.
public static int First(int[] numbers)
{
    foreach (int number in numbers)
    {
        if (number > 80)
            return number;
    }
    throw new InvalidOperationException("No number satisfied condition");
}
Code 7-9 Example without lambda expression

7.8 LINQ (63), (64)

Language Integrated Query, in daily terms LINQ, enables us to query data of various types. We use LINQ to query an abstract syntax tree, see section 7.4, representing the NAV application code. LINQ was presented to the public in 2005 during the annual Microsoft Professional Developers Conference (PDC) and was later released with the .NET framework version 3.5. LINQ was developed to fill the gap between object-oriented languages and data that does not exist as objects, one example being data in a relational database. LINQ is the result of a long-term research investment, and many other projects have formed the basis that LINQ builds on. Among the more significant projects are Cω (C-Omega), ObjectSpaces, and XQuery. LINQ was designed by Anders Hejlsberg, who, among a lot of other frameworks and languages, also designed the .NET framework. LINQ has one big advantage compared to the former projects: it is designed to generically support all types of data sources, which was one of the main reasons for focusing on LINQ instead of funding separate projects aiming at individual data sources. The vast majority of applications being developed access data of some kind. The consequence is that a developer needs to learn more than one language. For instance, creating a database query often requires that the developer writes a SQL statement as a string and sends it to the database. Product manager for Visual Studio, Jason McConnell, expressed this as: "It was like you had to order your dinner in one language and drinks in another." (65) LINQ aims at removing this gap between the data world and the world of general-purpose programming languages by providing a uniform way to access data from within the programming language.
To underline how LINQ fills the gap between data and general-purpose programming languages, one can conceive of LINQ as consisting of two complementary parts: a set of tools that work with data, and a set of programming language extensions. The uniformity of the design enables the developer to query objects in memory, relational databases, XML documents, and other data from within the same language, notably with the same simple SQL-inspired syntax on all data sources. LINQ is implemented on the .NET framework and can be used from within the .NET languages. To give an idea of the differences between code written without LINQ and code written with LINQ, we present an example (66) in Code 7-10 and Code 7-11 of some simple C# code working on XML data. Both examples produce a book collection with two books published in 2006.

7.8.1 Example 1: Without LINQ

using System;
using System.Xml;

class Book //used in both examples
{
    public string Title;
    public string Publisher;
    public int Year;

    public Book(string title, string publisher, int year)
    {
        Title = title;
        Publisher = publisher;
        Year = year;
    }
}

static class HelloLinqToXml
{
    static void Main()
    {
        Book[] books = new Book[] {
            new Book("Ajax in Action", "Manning", 2005),
            new Book("Windows Forms in Action", "Manning", 2006),
            new Book("RSS and Atom in Action", "Manning", 2006)
        };
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("books");
        foreach (Book book in books)
        {
            if (book.Year == 2006)
            {
                XmlElement element = doc.CreateElement("book");
                element.SetAttribute("title", book.Title);
                XmlElement publisher = doc.CreateElement("publisher");
                publisher.InnerText = book.Publisher;
                element.AppendChild(publisher);
                root.AppendChild(element);
            }
        }
        doc.AppendChild(root);
        doc.Save(Console.Out);
    }
}
Code 7-10 Example without LINQ

7.8.2 Example 2: With LINQ

Example 2 reuses the code from example 1. The code marked with bold text in example 1 is replaced with the code in example 2.
XElement xml = new XElement("books",
    from book in books
    where book.Year == 2006
    select new XElement("book",
        new XAttribute("title", book.Title),
        new XElement("publisher", book.Publisher)
    )
);
Console.WriteLine(xml);
Code 7-11 Example with LINQ

When we compare the two examples, we first notice that the example with LINQ (Code 7-11) is shorter than the example without LINQ (Code 7-10). Example 1 counts 17 LOC and example 2 counts 10 LOC, which is a 41 % decrease. When we examine example 1 to see where the extra code went, we see that most of the code is related to creating and adding elements to the XML document. Further, we see that we have to handle advanced concepts such as root node, create element, set attribute, and append child. These concepts require developers to watch their steps and direct more of their attention towards the XML technology, because it can be hard to predict the actual XML outcome for longer and more complex documents. When we look at the LINQ counterpart in example 2, we see how easily this can be done with a more down-to-earth approach, by simply returning the XML structure directly. The code is shorter, simpler, and easier to read, because we can create the generated XML structure in a single statement. Further, the LINQ example is able to replace both the C# foreach and if statements with simple where and select clauses. Comparing code that uses LINQ with code that uses traditional data-access APIs such as ADO.NET or System.Xml, we generally find the latter less optimal: LINQ code is more compact and automatically has a high degree of readability. LINQ offers out-of-the-box support for objects, SQL, XML, and any data structure implementing the IEnumerable<T> interface, enabling arrays, collections, databases, and XML to be queried with the same uniform syntax.
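The declarative where/select style is not exclusive to .NET. As a point of comparison, the same filter-and-transform of the book data can be sketched in Python (our own illustration, not part of the thesis code), where a generator expression plays the role of the LINQ query:

```python
# Python sketch of the LINQ example in Code 7-11: build an XML <books>
# document containing only the books published in 2006, in a single
# declarative expression. Book data copied from the examples above.
import xml.etree.ElementTree as ET

books = [("Ajax in Action", "Manning", 2005),
         ("Windows Forms in Action", "Manning", 2006),
         ("RSS and Atom in Action", "Manning", 2006)]

def book_element(title, publisher):
    # <book title="..."><publisher>...</publisher></book>
    e = ET.Element("book", title=title)
    ET.SubElement(e, "publisher").text = publisher
    return e

root = ET.Element("books")
# The generator expression mirrors LINQ's where/select clauses.
root.extend(book_element(t, p) for (t, p, y) in books if y == 2006)

print(ET.tostring(root, encoding="unicode"))
```

As in the C# version, the filtering (if y == 2006) and the projection (book_element) are stated in one expression instead of an explicit loop with nested conditionals.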
Chapter 8

8 Implementation

This section introduces how we actually solved the problem defined in section 1. We describe some of the obstacles we had to overcome. Furthermore, we introduce parts of our solution and some of the key components enabling us to solve the problem.

8.1 Problem with Parser

We have found that we can extend neither the CALParser implementation nor the abstract syntax tree (AST) it produces. The reason we cannot extend the produced AST is that F# is a functional language that uses immutable data types to enable asynchronous computation. When a TableRelation (TR) has been stored in the AST, we cannot subparse it. Therefore we decided to export the AST to XML. By exporting the entire AST to XML, we are able to query the code with LINQ, described in section 8.5.

8.1.1 CALParser AST to XML AST

The CALparser produces an abstract syntax tree from the parsed C/AL code. Further, the package contains a printer function that prints the AST as C/AL code. We decided to export the C/AL parser output AST to XML. We did this by rewriting the print function to write well-formed XML and write the result to a file. By exporting the AST to XML we have attained the following important goals:

- We have a data structure we can easily manipulate and extend
- We can access the data from any programming language able to handle XML and files
- We can use LINQ from the .NET languages to query the XML files in an easy manner
- We have prepared precomputation by storing the AST data in an XML file. This will have a positive impact on computation time later in the process, because we do not have to run the CALparser every time we do an analysis.

8.2 New data representation

As decided above, we store the parsed code in an XML file. The file follows the rules for well-formed XML (67). We give a small example of our XML structure and leave the larger study to the reader.
The full XML representation of Tables and Codeunits can be found on the appendix DVD in the folder Code\Abstract Syntax Tree XML Files\. To give an example of how C/AL code is represented in our XML document, we have chosen to extract the XML node for the following simple IF statement:

IF PaymentTermsTranslation.GET(PaymentTerms.Code,Language) THEN
  PaymentTerms.Description := PaymentTermsTranslation.Description;
Code 8-1 C/AL IF statement

The IF statement in Code 8-1 is the first statement in the first table in the system. It is extracted from the TranslateDescription method in the table Payment Terms. The IF statement uses GET to check if a row exists in the table Payment Terms Translation with fields matching the two given parameters (PaymentTerms.Code and Language). If GET returns true, the description field in Payment Terms is assigned the value of the description field in the table Payment Terms Translation. The corresponding XML data for the two lines in Code 8-1 is provided in Code 8-2. The AST stores both the structure of the code and the meaning of each token and is naturally more space consuming than the original code. The two lines of C/AL code in Code 8-1 require 36 lines in our XML document.
<BeginEnd>
  <Stmt>
    <StmtIfThen>
      <If>
        <OpName><![CDATA[.]]></OpName>
        <Exp1><![CDATA[PaymentTermsTranslation]]></Exp1>
        <Exp2>
          <ExpCall>
            <Name>GET</Name>
            <Param>
              <OpName><![CDATA[.]]></OpName>
              <Exp1><![CDATA[PaymentTerms]]></Exp1>
              <Exp2><![CDATA[Code]]></Exp2>
            </Param>
            <Param><![CDATA[Language]]></Param>
          </ExpCall>
        </Exp2>
      </If>
      <Then>
        <StmtAssign>
          <OpName>:=</OpName>
          <Exp1>
            <OpName><![CDATA[.]]></OpName>
            <Exp1><![CDATA[PaymentTerms]]></Exp1>
            <Exp2><![CDATA[Description]]></Exp2>
          </Exp1>
          <Exp2>
            <OpName><![CDATA[.]]></OpName>
            <Exp1><![CDATA[PaymentTermsTranslation]]></Exp1>
            <Exp2><![CDATA[Description]]></Exp2>
          </Exp2>
        </StmtAssign>
      </Then>
    </StmtIfThen>
  </Stmt>
</BeginEnd>
Code 8-2 Segment from our abstract syntax tree XML representation

A text file containing the code for all tables in the application counts 179,835 lines of code and has a size of 10.5 MB. In comparison, an XML file storing the AST representation of the code counts 1,093,223 lines and has a size of 47.5 MB.

8.3 CAL parser extension for table relations

As stated in section 3.1.1, parsing the TableRelation (hereafter TR) property is necessary for analyzing the codebase. TRs express the foreign key of a field, but the syntax is far from trivial. TRs can contain IF, ELSE IF, WHILE, FILTER, and GROUP statements. Furthermore, they are not restricted to referencing a single table/field, and the syntax causes identifier names to vary.

8.3.1 Generic parser vs. specific parsing rules

As described in section 3.1.1, the generic parser for C/AL has a few shortcomings that require further work. Extending the CALParser to fully parse TRs has proven not to be a good solution, because the lexer and parser definitions have rules that do not apply to the semantics of TRs: TRs can contain variables with names containing spaces, dots, and other special signs without being quoted with "". Another aspect is that the CALParser is already a fairly complicated piece of software (3000+ LOC).
Further, we have found that extending the generated abstract syntax tree (AST) cannot be done after it has been generated, due to the immutable data types in F# (see section 7.3.1.2). Therefore we decided to export the generated AST to a data format enabling us to manipulate, extend, and query the data easily. This is described in section 8.1.1. This leaves us to decide whether a generic TR parser is needed or if a few specific parsing rules can do the job. Therefore we looked at the pros and cons of both solutions to find the best approach.

Generic parser
Pros: Building a generic parser from scratch would bring the CALParser one step closer to being complete, and it would form a solid basis for future projects.
Cons: Very time consuming. Techniques such as YACC and LEX are far from trivial, and a substantial number of hours would have had to be allocated in order to develop a generic parser.

Specific parsing rules
Pros: The goal is to find the similar patterns, not the unique ones. Developing a few specific rules is fast.
Cons: If the need for parsing all TRs should arise, a lot of rules, each occurring only once, would be needed, and it would probably result in messy code. This parser will not be generic.

8.3.2 Parser Implementation

We put a lot of thought into whether we should develop a generic parser with YACC and LEX or create specific parser rules for the TRs we are interested in. We spent time studying the techniques behind YACC, LEX, and Regular Expressions and found that building a parser definition with YACC and LEX would require a lot more time than we had set aside for the parsing task. Based on the pros and cons in the previous section, combined with the fact that extending the parser was not part of the initial scope for this project, the choice of developing a generic parser would have required a reevaluation of the project's end goal. Therefore we decided to develop specific parsing rules, based on regular expressions, that cover the required TRs.
8.4 Parsing rules (68)

As described in the previous section, we decided to base our TableRelation parser on Regular Expressions (RE). Therefore we will go over some of the core operations of REs to explain how our parser works. The following section analyzes two expression examples from the code base. The expressions are the rules we use to parse. The complete code for the examples can be found on the attached DVD in the folder: \Code\RegularExpressionParser\RegularExpressionParser\CALRegularExpressions.fs

Expression2

Expression2 is the simplest of our rules. The rule matches patterns of the type:

"Sales Header"
Code 8-3 TableRelation matching Expression2

Expression2 in Code 8-4 is used together with the rules Expression0, Expression1, and Expression4 to match the TableRelation patterns that are part of the containment pattern defined in section 6.1. Below is a walkthrough of the rule.

let SimpleID = "[a-zA-Z0-9-/\.]+"
let AdvancedID = "\"[a-zA-Z0-9\s-/\.\(\)]+\""
let AdvancedOrSimpleID = "(" + AdvancedID + "|" + SimpleID + ")"
let Expression2 = "^(?<table>" + AdvancedOrSimpleID + ")" + "$"
Code 8-4 Appendix DVD, file \Code\RegularExpressionParser\RegularExpressionParser\CALRegularExpressions.fs

The SimpleID string "[a-zA-Z0-9-/\.]+" will match any string containing the letters a-z and A-Z, the digits 0-9, and the characters '-', '/', and '.'. The dot is normally a wildcard metacharacter that can substitute any character; if an actual dot is to be matched, it has to be written with a backslash in front of it, as in many programming languages. The same is the case for the quotation mark (\"). The + quantifier marks that the pattern will match any string of the allowed characters with a length of 1..n (one to many). If an asterisk had been used instead of the plus, the multiplicity would have been 0..n (zero to many), and with no plus or asterisk the expression would only match a single character.
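The behaviour of Expression2 can be tried out with a quick sketch. We write it here in Python rather than F#/.NET (Python spells the named group (?P<table>...) where .NET uses (?<table>...), and the character classes are reordered slightly so the hyphen is unambiguously literal); the rules themselves mirror SimpleID and AdvancedID from Code 8-4:

```python
# Sketch of Expression2: match either a quoted identifier (AdvancedID,
# spaces allowed) or an unquoted one (SimpleID), anchored with ^ and $,
# capturing the whole identifier in the named group "table".
import re

SIMPLE_ID = r"[a-zA-Z0-9./-]+"            # unquoted identifier
ADVANCED_ID = r'"[a-zA-Z0-9\s()./-]+"'    # quoted identifier, may contain spaces
EXPRESSION2 = re.compile("^(?P<table>" + ADVANCED_ID + "|" + SIMPLE_ID + ")$")

for tr in ['"Sales Header"', "Customer", "Sales Header"]:
    m = EXPRESSION2.match(tr)
    print(repr(tr), "->", m.group("table") if m else "no match")
```

Note how the unquoted name containing a space falls through both alternatives and does not match; unquoted identifiers with spaces are exactly what the AlternateID rule used by Expression3 accommodates.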
The AdvancedID string matches any string that starts and ends with quotation marks. Further, spaces (\s) and parentheses (\( and \)) are also allowed in the string. The AdvancedOrSimpleID string uses the syntax ( .. | .. ), which means either-or: the RE will match any string of either the SimpleID form or the AdvancedID form. The Expression2 string is our final expression for the simplest of our regular expressions. The start and end anchors are ^ and $. The ^ character defines that the pattern must match from the first character of the string we match against; $ defines that the pattern must match up to the last character of the string we are matching against. The (?<table> variable) syntax allows us to access parts or all of the matched pattern. In this case we have named the entire pattern table (<table>) and can use this name from F# to retrieve the matched AdvancedOrSimpleID value.

Expression3

Expression3 is a bit more advanced. An example of the pattern that the Expression3 rule matches is:

IF (Table ID=CONST(13)) Salesperson/Purchaser
ELSE IF (Table ID=CONST(15)) "G/L Account"
ELSE IF (Table ID=CONST(18)) Customer
ELSE IF (Table ID=CONST(23)) Vendor
Code 8-5 TableRelation matching Expression3

Expression3, defined in Code 8-6, is used to match the TableRelation patterns that are part of the requirements for the generalization pattern defined in section 6.2. Below the box there is a walkthrough of the rule.

let AlternateID = "[a-zA-Z0-9\s-/\.]+"
let AdvSimOrAltID = "(?<table>(" + AdvancedID + "|" + SimpleID + "|" + AlternateID + "))"
//Simple tokens
let Const = "=CONST\("
let RP2 = "\)\)"
let RPC = "\),"
let EmptySpace = "[\s|\t|\n]*"
let Dot = "\."
let Else = "ELSE\s"
let Expression3 = "^" + "IF\s\(" + AdvSimOrAltID + Const + AdvSimOrAltID + RP2 + "\s" + AdvSimOrAltID +
    "(" + EmptySpace + Else + "IF\s\(" + AdvSimOrAltID + Const + AdvSimOrAltID + RP2 + "\s" + AdvSimOrAltID + ")*"
Code 8-6 Appendix DVD, file \Code\RegularExpressionParser\RegularExpressionParser\CALRegularExpressions.fs

The AlternateID string matches strings containing spaces and dots without being surrounded by quotation marks. Const, RP2, RPC, Dot, and Else are simple tokens that match what their names imply. The EmptySpace string matches any number of spaces, tabs, and new lines. Expression3 has to start with an IF statement ("^IF"), and the last set of parentheses is followed by a *, which means that the pattern within the parentheses can occur any number of times.

8.5 LINQ for Querying

We have found LINQ very useful for our project; in fact, LINQ was the key driver for exporting our AST to XML. This is reflected in our pattern identification algorithms, where LINQ is used extensively. The use of LINQ for querying our XML AST enables our algorithms to be of acceptable size and improves the readability of the code. The following three LINQ statements are taken from the C# project MatchingInLinq, found on the appendix DVD in the folder \Code\MatchingInLinq\. The project contains the developed pattern identification algorithms for containments and generalizations. The statements are taken from the class IdentifyContainments, which is the class that contains the algorithm for identifying table pairs matching the defined containment pattern described in section 6.1. The statements are extracted from the ReplaceVariableNameWithRecordName method. This method is used, as the name implies, to replace local variable names with the actual ID of the table the variable is referencing. We do this to remove the abstraction of local and global variable names when determining the table ID.
This step is described as step 6 in our preparation for pattern identification, section 8.9.1. The first LINQ statement is a select statement. The structure of the query is: FROM variable IN data WHERE condition SELECT variable.

IEnumerable<XElement> variables =
    from exampVar in referencedTable.Element("TABLEBODY").Element("CODE")
        .Element("VarDecls").Elements("Var")
    where exampVar.Element("VarDecl").Element("Type") != null
        && exampVar.Element("VarDecl").Element("Type").Value == "Record"
        && exampVar.Element("VarDecl").Element("Number").Value == table.Element("NUMBER").Value
    select exampVar;
Code 8-7 LINQ Example1 - Select record variables with number equals var

Return value: We see that the return type is IEnumerable<XElement>, which is a collection of XML elements.

From: We see that exampVar refers to the XML elements Var in a document of the following structure:

<TABLEBODY>
  <CODE>
    <VarDecls>
      <Var> </Var>
      <Var> </Var>
      ...
      <Var> </Var>
    </VarDecls>
  </CODE>
</TABLEBODY>
Code 8-8 Querying the abstract syntax tree

Where: The where condition states that Var must have an element Type with the value Record. Further, its Number value must be equal to the number of the other table in the containment candidate pair we are examining.

<TABLEBODY>
  <CODE>
    <VarDecls>
      <Var>
        <VarDecl>
          <Type>Record</Type>
          <Number></Number>
        </VarDecl>
      </Var>
      <Var> </Var>
      ...
      <Var> </Var>
    </VarDecls>
  </CODE>
</TABLEBODY>
Code 8-9 Querying the abstract syntax tree

Select: Finally, if the WHERE condition is satisfied, we select exampVar which, as we saw in the From section, refers to the entire Var element.

The second example is the simplest of the LINQ examples we present. All this example does is extract all elements of type Stmt.

IEnumerable<XElement> stmts = onDelete.Element("TAPTrig").Element("Body")
    .Element("BeginEnd").Elements("Stmt");
Code 8-10 LINQ Example2 – Select all statements

The third and last example is a join of the two collections returned by example 1 and example 2.
IEnumerable<XElement> k =
    from aVar in stmt.Descendants("Exp1")
    join bVar in variables
        on aVar.Value equals bVar.Element("VarDecl").Element("ID").Value
    select aVar;
Code 8-11 LINQ Example3 – Select all Exp1 variables with ID equals var

Return: We still return a value of type IEnumerable<XElement>.

From: The nature of the original algorithm requires the presence of a for-each loop that iterates over each element in stmts. We have left out this loop because this section only focuses on LINQ. The current element of the for-each loop is named stmt, and we extract the XElement aVar from it. stmt's Descendants method returns any child element of stmt with the given name; even if it is defined as a child of a child of a child of stmt, it will still be returned by the Descendants method.

Join: The join operation defines that we join the variable aVar with the variable bVar. The variable bVar refers to the variables collection, which was the result of example 1, see Code 8-7.

On: The on keyword is the where clause of a join. It declares which variables should be used to join on. The statement forms the condition that the value of aVar should match the ID of the variable bVar.

Select: If the above on clause is true, aVar is added to the result set.

The above examples have shown how LINQ can be used to easily query the abstract syntax tree. As described earlier, the provided examples are used to return all relevant variable references in the code of a table. The algorithm for the containment pattern identification, see section 8.9.1, will rename each element in the result from example 3 as part of step 6 of the algorithm.

8.6 Lambda Expressions in action

We use lambda expressions (LEs) throughout our code. A search through the project code finds 30 lambda expressions. We have chosen to present four of those to illustrate what LEs can facilitate. The first example comes from the RefactorInheritance method in the MatchingInLinq project.
The variable tableList is a list of serializable custom objects created to store inheritance. The C# code defines a foreach statement, and the lambda expression code sets the property ToGeneralization to the value of the string genName for every Inheritance object in the tableList collection. The ForEach function in effect maps the lambda expression over every element in the collection.

tableList.ForEach(p => p.ToGeneralization = genName);
Code 8-12 Appendix DVD, file \Code\MatchingInLinq\IdentifyInheritance.cs, method RefactorInheritance

The code in Code 8-13 selects a single XML element from a collection of XML elements. The XML element is selected based on the element <Number> being equal to the Number in our Codeunit object. Here it is worth mentioning that when we are dealing with Navision objects such as Tables, Codeunits, Forms, and Reports, both their Number and ID (name) are unique. Therefore we are certain in the example below that we have only one match.

XElement codeunit = codeunits.Single(p => p.Element("Number").Value == cu.Number);
Code 8-13 Appendix DVD, file \Code\MatchingInLinq\IdentifyREAConcepts.cs, method FindProcedureReferences

The next example, Code 8-14, illustrates how lambda expressions can be nested in conditional expressions. The code checks if we have a global variable defined at Codeunit level with an ID matching the value of s. If this is the case, the Any statement returns true, thereby fulfilling the if statement.

if (globalVariables.Any(p => p.ID == s))
Code 8-14 Appendix DVD, file \Code\MatchingInLinq\IdentifyREAConcepts.cs, method Call

The last statement we present, Code 8-15, is based on a ForEach method just like the first example. This expression is different, though. First of all, it is called on the static Array class instead of being run on an instance of a data structure. The actual data structure is provided as a parameter, in this case temp.
The second parameter p points to the current element in temp, and each element is added as a new sibling element after the XML element x, as the value of an element PartOfKey.

//Type of data to parse "{ ;Document Type,No. "
temp = x.Value.Replace("{ ;", "").Split(new char[] { ',' });
Array.ForEach(temp, p => x.AddAfterSelf(new XElement("PartOfKey", p.Trim())));
Code 8-15 Appendix DVD, file \Code\MatchingInLinq\ParseRemaningElements.cs, method ParseElementsToXML

From our use of lambda expressions, we have found them to be a valuable tool that enables us to write cleaner and shorter code. We estimate that on average a single line with a lambda expression would require 6 lines of code to be expressed without lambda expressions, so we save 5 lines of code per lambda expression on average. We have used 30 lambda expressions in our project, which means that we have probably been able to shorten our code by around 150 lines. This is not a lot, but in larger projects, or if this project were elaborated, the savings would grow accordingly.

8.7 Graph generation

We aimed at delivering a tool for displaying graphs. Initially we experimented with automating Microsoft Visio. The generated diagrams were good, but the computation time was excessive: generating a graph with 100+ elements took hours, which is completely unacceptable if we aim at delivering an interactive tool. Therefore we looked into alternatives and found Microsoft Automatic Graph Layout.

8.7.1 Microsoft Automatic Graph Layout (69)

Microsoft Automatic Graph Layout (MSAGL) is an internal research project by Microsoft Research offering easy graph generation and dynamic layouts, with support for the Multi Dimensional Scaling (MDS) algorithm and the Sugiyama and Ranking layout schemes. We did, however, quickly find two caveats with MSAGL: 1. The Windows Presentation Foundation (WPF) component was in a very rough state. 2.
MSAGL only supported two arrowhead types.

The first was a minor problem, because the counterpart component for Windows Forms proved to be very stable and fast. We found that MSAGL is able to generate large graphs of 100+ elements in a few seconds, which is acceptable for our tool. The second caveat was of a more serious nature, because we needed support for UML symbols to present our findings. Therefore I contacted Lev Nachmanson (70) from Microsoft Research in Redmond to get access to the source code. Lev replied that he would like to add the extension to the package and granted us access to the source depot. We added support for the arrowhead styles for the UML symbols for containment, aggregation, and generalization and sent our contribution to Lev. The changes were accepted without comments and have been checked into MSAGL, allowing others to model UML components with MSAGL.

Figure 8-1 Arrow heads for Containment, Aggregation and Generalization

The arrowhead styles added are shown in Figure 8-1 on the left. The figure shows: containment, from B to A; aggregation, from C to A; generalization, from D to A.

8.8 Performance boosts

The aim is to deliver a tool able to generate graphs in seconds. With the MSAGL toolkit we are able to do this. The parsing and pattern identification algorithms do, however, still require minutes to execute. We therefore added pre-computation to speed up the user interaction in our Concept Viewer tool. Pre-computation has been added in the following places:

CALParser and Subparser: As described in section 8.1.1, we have changed our parser to write XML files. This allows the pattern identification algorithms to run directly on the AST stored in the XML files instead of having to parse the C/AL code.

Containment Identification: The containment algorithm produces two files storing containment and aggregation pairs, respectively.
Generalization Identification: The generalization algorithm produces two files storing generalization sets and refactored generalization sets, respectively.

The four files produced by the containment and generalization identification algorithms are stored in a serialized data structure and saved in a binary file. The Concept Viewer loads the files at startup and has fast access to all results. The pre-computation has resulted in a satisfying performance for the graph generation.

8.9 Algorithm design

The language of choice for implementing the algorithms identifying containment, aggregation, and generalization candidates is C#. C# was the logical choice because it offers seamless integration with F#, and our ability to produce good code fast is better in C# than in F#. The algorithms are described in pseudo code, and we only list the core steps of the algorithms.

8.9.1 Containment

The pseudo code for the containment algorithm can with advantage be read with the manual analysis of the containment pattern from section 6.1 in mind. The pseudo code for the algorithm is provided in the box below. Additional steps explaining the behavior of the algorithm are the following:

1. The XML AST is queried to return all tables.
2. For each table in the results of step 1 we query to find all TableRelations.
3. For each TableRelation we check if the TableRelation is of type Expression0, Expression1, Expression2, or Expression4 (these can be seen in section 8.4).
4. If step 3 is satisfied, we check if the table contains an OnDelete trigger.
5. If the existing OnDelete trigger contains references to other procedures, we need to search through their code. We do this by inserting the code from each procedure until there are no more procedure references in our OnDelete trigger. This means that we inline the code that needs to be inspected in order to check whether it contains a procedure reference.
6. When all code is collected, variables are replaced with their type.
That is, if the code contains a variable Cust of type table 18 Customer, we replace Cust with Customer. This is done to remove the abstraction of variable names, to prepare the AST for the final step.

7. Now we have an AST representing the OnDelete trigger, in which we have collected all the code it contains and references. Furthermore, the AST is simplified by the removed abstraction of variable names (step 6). The AST can now easily be queried for delete/deleteall function calls, checking if the table they are called on is the table we started querying in step 2.
8. If step 7 is satisfied, we know we have found a containment. If, on the other hand, step 4 or 7 was not satisfied, we know that we have an aggregation.

Find all tables
Do for each table {
    Find all TableRelations for the current table
    Foreach TableRelation {
        If TableRelation is of type 0, 1, 2 or 4 {
            Get ID of referenced table
            Find the referenced table's OnDelete trigger
            If (OnDelete trigger exists) {
                While (Trigger contains procedure calls) {
                    Replace procedure calls with code of referenced procedure
                }
                While (Trigger contains unconverted variable names to tables) {
                    Replace variable names with ID of the table they reference
                }
                If (OnDelete trigger contains a DELETEALL for the table that holds the TableRelation) {
                    We found a containment
                } else {
                    We found an aggregation
                }
            } Else {
                We found an aggregation
            }
        }
    }
}
Code 8-16 Pseudo code for the Containment algorithm

8.9.2 Inheritance

The pseudo code for the inheritance algorithm can with advantage be read with the manual analysis of the generalization pattern in section 6.2.2 in mind. Following the approach from above, we provide pseudo code for the algorithm in the box below, accompanied by a numbered list providing more detail on some of the algorithm steps. Steps 1-3 are identical to the containment algorithm, with the exception that the TableRelation type we are looking for in this case is Expression3.

4.
4. Find the field with ID or Number equal to the value of <ItemField></ItemField> in the parsed TableRelation.
5. If the field identified in step 4 contains the property FIPOptionString, we extract all <ItemVariable></ItemVariable> fields.
6. The extracted FIPOptionString is matched against the extracted ItemVariables.
7. If we found a match in step 6, we extract all <ItemTable></ItemTable> values and add a new inheritance object from the table we are currently working on to all tables extracted from <ItemTable>.

Find all tables
Do for each table {
    Find all TableRelations of type 3
    Foreach TableRelation {
        Var_1 = field with ID or Number == ItemField of TR
        If (Var_1 has property FIPOptionString) {
            Get ItemVariable references from TableRelation
            If (Var_1.FIPOptionString match ItemVariables) {
                Get ItemTable reference from TableRelation
                Add new inheritance
            }
        }
    }
}
Code 8-17 Pseudo code for the Generalization algorithm

8.9.3 Inheritance – Reusing generalization objects
The findings of the algorithm above allow us to map the identified relations as pure associations (the way it is implemented today) or to refactor them into generalization objects. One of the great advantages of generalization objects is that we can reuse them and thereby reduce their number. This algorithm identifies matching generalization objects and allows us to present the refactored objects. The outcome of this algorithm is available in section 9.4. The steps are described in the same manner as for the prior algorithms.
1. The input of the refactoring algorithm is the result from the inheritance algorithm. The refactoring algorithm iterates until all inheritance objects in the inheritance collection have been refactored.
2. Select the first inheritance object.
3. Find all inheritance objects whose ToRecords property matches the ToRecords property of the inheritance object selected in step 2.
4. If the inheritance object from step 2 is unique, i.e.
if we do not find any inheritance objects with a matching ToRecords property, we remove the object from the collection of non-refactored inheritance objects and add it to our result collection.
5. If we find inheritance objects with a matching ToRecords property, we remove them from the collection of non-refactored inheritance objects, rename the ToRecords property to a common name, and add the objects to the result collection.

Find all inheritance objects
While (exist inheritance object not refactored) {
    Get first inheritance object
    Find all inheritance objects with matching 'ToRecords'
    If (inheritance object is unique) {
        Remove inheritance object from collection of not refactored objects
        Add inheritance object to result
    } else {
        Remove inheritance collection from collection of not refactored objects
        Rename 'ToRecords' to common naming
        Add inheritance collection to result
    }
}
Code 8-18 Pseudo code for refactoring Generalization objects

8.9.4 Implementation of REA identification
This section covers the implementation of the steps identified in section 6.3.4.1 and elaborated in section 6.3.5. During the implementation of step 1 in the original identification algorithm we ran into a problem related to locating the relevant code lines. The root of the problem was that the pattern does not require the code lines to be located in the same procedure, or even the same Codeunit; in principle they could be located in a table trigger as well. When we started out, we assumed we could identify a delete on a journal (step 1.c) and then find a referenced procedure copying fields to an entry and inserting the entry (steps 1.a and 1.b). Unfortunately this is not the case. When we use this approach we find:

ID  To Entry Table                      From Journal Table
1   21 Cust. Ledger Entry               81 Gen. Journal Line
2   25 Vendor Ledger Entry              81 Gen. Journal Line
3   5802 Value Entry                    83 Item Journal Line
4   355 Ledger Entry Dimension          83 Item Journal Line
5   281 Phys. Inventory Ledger Entry    83 Item Journal Line
6   355 Ledger Entry Dimension          83 Item Journal Line
7   355 Ledger Entry Dimension          356 Journal Line Dimension
Table 8-1 REA results from step 1

The reason we find such a small result set is that the lines we look for are widely spread among different Codeunits and procedures. To illustrate this, we will explain how we find result 1 from Table 8-1.

8.9.4.1 Finding the value set Cust. Ledger Entry / Gen. Journal Line
The relationship of result 1 (and 2, for that matter) is initiated from Codeunit 13 Gen. Jnl.-Post Batch (hereafter CU13). As the naming of CU13 implies, this Codeunit is used to post General Journal Lines as a batch process, meaning that the process runs without any visible output to screen. From the earlier definition of journals we know that posting a journal refers to inserting data from the journal into a ledger entry. Therefore it makes sense to assume we will be able to observe the copying of journal data to entry data in CU13. When we inspect the code of CU13 to find the code expressing the pattern, we find that CU13 calls Codeunit 12 Gen. Jnl.-Post Line (hereafter CU12) to post the journal to an entry. Hereafter CU13 performs a delete of the journal line (step 1.c). Therefore our algorithm has to analyze the code of all referenced Codeunits and procedures. We find that the posting of the journal is coded in CU12's PostCust procedure. When we look at the reference chain for PostCust, i.e. the procedures that directly or indirectly reference PostCust, we find that this produces a graph of 152 procedures. Building this graph is a time consuming task because we are querying large amounts of XML data many times. For building the graph we need to do the following, starting at the identified starting point:
1. Look up all variable names in the procedure to check if they refer to the table we work on.
2. Analyze all variable assignments in the procedure to check if they follow the defined pattern.
3. Find all references to the current Codeunit procedure.
4. Do steps 1-3 for each result of step 3.
Our first design was recursive, but that approach required our graph to be acyclic, so we had to redesign it to an iterative design when we ran into cycles. Figure 8-2 shows a segment of the produced graph for PostCust containing 152 elements. The model displays the five procedures (marked with blue background) that are identified as part of the pattern we identified from Table 6-9 in section 6.3.5. Further, the model displays how large a graph is produced when we search through all procedure references. Finally, the figure shows the identified reference cycle (marked with grey background).
Figure 8-2 Illustrating procedure references between CU12 and CU13

8.9.5 Limitations by approach and solution suggestion
As we found in the previous section, we cannot know for certain which procedures will perform the delete of journal lines. Since any of the 152 referencing procedures can have a reference that performs the journal line delete, we would have to build a reference graph for each element in each reference graph. This would be a very time consuming task both to program and to run, and cannot be advised. The algorithm we have at this point already takes minutes to execute, and this approach would heavily increase the execution time. The bottleneck of the computation is the querying of XML data. This querying could be cached if we created a complete lookup system analyzing the code in a top down fashion instead of traversing the code in reference chains. The lookup system would also need to store table object actions that could fit into a step in the requirements for REA candidates described in section 6.3.5.
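The iterative, cycle-tolerant graph construction described above can be sketched as follows. This is a Python sketch for brevity (the actual implementation is in C#), and the `callers_of` lookup is a hypothetical stand-in for the XML AST queries the real tool performs:

```python
from collections import deque

def build_reference_graph(start_proc, callers_of):
    """Collect every procedure that directly or indirectly references
    start_proc, using an iterative worklist so that reference cycles
    (which broke our first, recursive design) terminate cleanly.

    callers_of: hypothetical lookup mapping a procedure name to the list
    of procedures referencing it (derived from the XML AST in the tool).
    """
    graph = {}                      # procedure -> its direct callers
    seen = {start_proc}
    worklist = deque([start_proc])
    while worklist:
        proc = worklist.popleft()
        callers = callers_of.get(proc, [])
        graph[proc] = callers
        for caller in callers:
            if caller not in seen:  # skip visited nodes: cycle-safe
                seen.add(caller)
                worklist.append(caller)
    return graph
```

With a cyclic input such as PostCust referenced by CU13, which is in turn referenced by a helper that CU13 also references, the worklist drains and the traversal terminates, whereas naive recursion would not.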
8.9.6 Analysis of initial REA results
Before choosing to implement the time consuming lookup system described in the section above, we decided to examine the results found by our initial REA identification algorithm. As described earlier, our REA algorithm only identifies the variable sets of step 1 in section 6.3.4.1 if they are located in the same reference chain, i.e. a procedure referencing PostCust deletes the journal line after the execution of PostCust, as in the example of Figure 8-2. The idea is to get an indication of the quality of the pattern, to see whether the improvements suggested above would be worthwhile. Therefore, we have accepted the initial results from Table 8-1 as the result of step 1 of the identification algorithm from section 6.3.4.1, and will manually perform steps 2 and 3 of the algorithm.

When we look at step 2 of our identification algorithm with the results from Table 8-1, we find that results 4, 6 and 7 are false positives. The Journal Line Dimension and Ledger Entry Dimension tables contain information regarding journal lines or ledger entries, but they are not journals or entries themselves. Therefore we remove these results from our findings. This leaves us with 4 candidates to examine in step 2, namely 1, 2, 3 and 5, listed in Table 8-2. The REA Candidate is a table we have identified as writing values to a journal line.

ID  REA Candidate  To Entry Table                      From Journal Table
1   18 Customer    21 Cust. Ledger Entry               81 Gen. Journal Line
2   23 Vendor      25 Vendor Ledger Entry              81 Gen. Journal Line
3   27 Item        5802 Value Entry                    83 Item Journal Line
5   27 Item        281 Phys. Inventory Ledger Entry    83 Item Journal Line
Table 8-2 REA candidate sets

Common to the four sets above is that no TableRelations exist on the fields in the entry tables' primary keys. This does not fulfill the requirements of step 3.a (section 6.3.4.1), because a TableRelation is a required part of our definition of a containment (section 6.1).
This leaves us with method 3.b (section 6.3.4.1), which is to identify the candidate based on the naming of the tables. We see that REA candidate 2 maps well to its entry table, but candidates 1, 3 and 5 map poorly. This indicates that even with a larger result set from step 1, we would still not be able to match results in a satisfying manner with the chosen identification algorithm. Given the complexity of implementing step 1 correctly (section 6.3.4.1) and the indication of bad results from steps 2 and 3, we have decided not to invest more time in implementing this identification algorithm.

Another general problem with the examined approach is the information we are able to extract from the events. It is a good approach to identify the events, because they are the behavior in the system, i.e. selling goods, buying raw materials, paying employees, etc. When analyzing the events we are able to find the tables supplying information to the event. If the event were a sale, the event would contain information such as Customer, Item, etc. With our knowledge of accounting and REA, and our trust in the NAV naming conventions, we can infer that Customer will be an Agent and Item will be a Resource. But from a code perspective we do not have the same luxury: we are not able to differentiate Resource tables from Agent tables. Further, there exist tables that write values to journal lines without being part of a REA relationship. Therefore the results from a REA algorithm will at best be a table listing all reliable REA candidates.

We believe the code in general is too long and too complex for the chosen approach, and we would suggest looking for an alternate identification pattern. Earlier work trying to modularize the NAV architecture into components concluded that this was not possible due to high levels of interdependencies.
“… we tried to modularize the architecture via components, but it turned out that the interdependency levels were in general far too high to make a reasonable modularization.” (71)
This could to some extent be described as part of the same problem we have been experiencing with the many references within the code.

Chapter 9 Results
This chapter presents the results of each of our identification algorithms. The coverage is calculated to provide information on how good our algorithms are. Hereafter we present the Concept Viewer, developed to present our findings. The Concept Viewer is the tool that the Navision Application team can use if they wish to explore the identified relations. Finally, we present diagrams for interesting cases of relations.

From the initial analysis we found Visio to be a slow diagram producer, but we did get some interesting information from the diagrams. Visio organizes similar structures in the same manner, allowing one to discover repeating patterns simply by looking at the components in the diagram. This way we found evidence that our approach was working and that the number of repeated patterns was high. One of the repeating patterns can be seen in Figure 9-1. The figure shows a pattern constellation related to journal tables. The pattern shows that a Journal Line requires both a Template and a Batch to exist, and that the Batch requires the Template. The pattern is repeated around 10 times throughout the NAV application. Figure 9-2 shows the same pattern as presented in the Concept Viewer.
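A repeated constellation like this can also be found mechanically from the identified containment edges. The sketch below is illustrative Python (the tool itself is written in C#) operating on an invented list of (dependent, container) pairs:

```python
def find_template_batch_line(containments):
    """Find Template/Batch/Line constellations: a Line contained in both
    a Template and a Batch, where the Batch is in turn contained in the
    Template. `containments` holds (dependent, container) name pairs."""
    edges = set(containments)
    hits = []
    for line, template in edges:
        if not (line.endswith("Line") and template.endswith("Template")):
            continue
        batch = template.replace("Template", "Batch")
        # the Line must also require the Batch, and the Batch the Template
        if (line, batch) in edges and (batch, template) in edges:
            hits.append((template, batch, line))
    return hits

# The FA Journal tables from Figure 9-1 form one such constellation:
fa = [("FA Journal Line", "FA Journal Template"),
      ("FA Journal Line", "FA Journal Batch"),
      ("FA Journal Batch", "FA Journal Template")]
```

Run over the full set of identified containments, counting the hits would confirm the roughly 10 repetitions observed visually in the Visio diagrams.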
Figure 9-1 Template, Batch, Line pattern found in the initial Containment work
Figure 9-2 Template, Batch, Line pattern in the Concept Viewer

9.1 TableRelation Analysis and TableRelation Parser Quality Assurance
The goal of the TableRelation (TR) parser is, as described in section 3, to parse the TRs of table fields. The following section explains the technicalities behind the parser and evaluates its coverage. We have decided to base our parser on 5 regular expressions. This section presents the results from analyzing the parser coverage.

9.1.1 Matches on key fields
As defined in section 6.1, one of the requirements for the containment pattern is that there exists a TableRelation (TR) from the primary key of the dependent table. Therefore it is relevant to look at our parsing coverage for the subset of the TRs defined on table keys. From our program we know that the number of TRs defined on keys is 753. Table 9-1 is the result of analyzing the coverage of each of our regular expressions. It is interesting to note that pattern 1 (Expression2) alone accounts for 56.3 % coverage of all TRs, and that we reach a TR coverage of 95.1 % with the 5 patterns.

ID  Pattern      Frequency  Coverage  Total Frequency  Total Coverage
1   Expression2  424        56.3 %    424              56.3 %
2   Expression0  154        20.5 %    578              76.8 %
3   Expression3  48         6.4 %     626              83.1 %
4   Expression1  66         8.8 %     692              91.9 %
5   Expression4  24         3.2 %     716              95.1 %
Table 9-1 Key fields

To ensure that the high coverage percentage found in Table 9-1 is accurate, and to rule out the possibility that a few TRs repeat throughout the code while variations over the patterns go unnoticed, we decided to extract all unique TRs. We redid our calculations with the 292 unique TRs we found in the set above.
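The Coverage and cumulative Total Frequency / Total Coverage columns in these tables follow directly from the per-pattern match counts. A small sketch of the computation, in Python, using the Table 9-1 numbers:

```python
def coverage_table(pattern_counts, total_trs):
    """Compute the Coverage and cumulative Total Frequency / Total
    Coverage columns from per-pattern match counts."""
    rows, running = [], 0
    for name, count in pattern_counts:
        running += count
        rows.append((name, count,
                     round(100.0 * count / total_trs, 1),     # Coverage
                     running,                                  # Total Frequency
                     round(100.0 * running / total_trs, 1)))  # Total Coverage
    return rows

# Key-field TRs (Table 9-1): 753 TableRelations in total.
table_9_1 = coverage_table(
    [("Expression2", 424), ("Expression0", 154), ("Expression3", 48),
     ("Expression1", 66), ("Expression4", 24)],
    total_trs=753)
```

The same computation, fed with the unique key-field counts and a total of 292, reproduces Table 9-2, and so on for Tables 9-3 and 9-4.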
Table 9-2 lists the result. The distribution over our five patterns resembles that of Table 9-1, and the coverage is still very acceptable. The total coverage is 93.5 %, leaving only 19 TRs out of our further analysis.

ID  Pattern      Frequency  Coverage  Total Frequency  Total Coverage
1   Expression2  131        44.9 %    131              44.9 %
2   Expression0  54         18.5 %    185              63.4 %
3   Expression3  38         13 %      223              76.4 %
4   Expression1  35         12 %      258              88.4 %
5   Expression4  15         5.1 %     273              93.5 %
Table 9-2 Unique key fields

9.1.2 Matches on all fields
The generalization pattern analyzes all TRs and does not distinguish fields that are part of a key from those that are not. Therefore we have also done our analysis on all TRs, whereas the analysis above only concerned TRs defined on keys. The entire application has 5011 TRs. The table below displays the frequency distribution for each of our five patterns. We see that we have a coverage of 67.2 % on a single pattern. Further, we see that we have a total coverage of 95 %, which is the same as the outcome of our analysis in Table 9-1.

ID  Pattern      Frequency  Coverage  Total Frequency  Total Coverage
1   Expression2  3369       67.2 %    3369             67.2 %
2   Expression0  858        17.1 %    4227             84.4 %
3   Expression3  332        6.6 %     4559             91 %
4   Expression1  136        2.7 %     4695             93.7 %
5   Expression4  66         1.3 %     4761             95 %
Table 9-3 All fields

To complete the analysis, we analyzed all unique TRs. We found that the application code contains a total of 651 unique TRs. The analysis below shows the lowest pattern 1 coverage, at only 35.8 percent. This indicates a high repetition of this type of TR. Further, we see that we get a total coverage of 90.9 %, leaving out 59 TRs from our further analysis.
ID  Pattern      Frequency  Coverage  Total Frequency  Total Coverage
1   Expression2  233        35.8 %    233              35.8 %
2   Expression0  124        19.1 %    357              54.8 %
3   Expression3  136        20.9 %    493              75.7 %
4   Expression1  69         10.6 %    562              86.3 %
5   Expression4  30         4.6 %     592              90.9 %
Table 9-4 All unique fields

From the above analysis we see that the system contains 5011 TRs and that the parser parses 4765 of them, excluding 246 TRs from our further analysis. Of the excluded TRs we know from Table 9-4 that there exist (651 in total – 592 parsed) 59 unique patterns left to parse. Reaching 100 % coverage by creating parsing expressions for the 59 remaining patterns would be trivial, but it has been left out of this work for the following reasons. Complete coverage is not a priority goal: we parse the TRs we are interested in for our analysis of containments and generalizations, and if the need for parsing more TRs should arise, each pattern could be added in around 30 minutes. We believe that we achieved a high TR coverage while, at the same time, keeping our code clean. We left out rules that would parse only single or few TRs because there was no need for them in the further analysis.

9.2 Results for Containment Pattern
We found in our TR analysis that there are at most 644 containments in the application, obtained by adding the frequencies of Expression0, Expression1, and Expression2 in Table 9-1. The algorithm we have developed identifies:
148 complete matches
368 partial matches (OnDelete trigger does not contain a cascading delete)
95 partial matches (no OnDelete trigger)
33 partial matches not possible to classify (referenced object not a table)
In total 644 matches (100 % coverage).

The 148 complete matches are pure containments following the pattern we defined in section 6.1. We only search through table code for cascading deletes, and the code has to be placed in, or called from, the OnDelete trigger of the referenced table.
The 368 partial matches are matches where the referenced table does contain an OnDelete trigger, but there is no cascading delete. The 95 partial matches are matches where the referenced table does not contain an OnDelete trigger definition. The 33 remaining partial matches are matches where the structure of the TR matches the patterns for the containment TRs, but which are not part of the containment pattern. Common to all 33 TRs is that they use system keywords such as Object, Field and AllObj. An example of such a TR is table 78 Printer Selection, which has the following TR on the Report ID field:

Object.ID WHERE(Type=CONST(Report))
Code 9-1 Special case of TableRelation ignored in Containment analysis

The use of Object implies that the reference can be any object; the WHERE clause limits this to objects of the type Report. As expected, we find a large result set of partial matches. Fortunately, we can actually use this information, for the following reason: the containment pattern we are searching for is a special case of an aggregation, namely a strong aggregation. The requirements for a normal aggregation match the 368 and 95 identified partial matches, and they can therefore be modeled as aggregations. This came as a byproduct of our algorithm without being part of the original project aim. The outcome of the containment algorithm analysis is therefore that 148 out of 644 candidates were containments, 463 out of 644 were regular aggregations, and 33 out of 644 did not contain references to tables.

We know that we do not have complete coverage on containments, because we do not scan through code placed in Codeunits. One example of such a cascading delete, which we do not detect, is the call of the PostSalesLines-Delete Codeunit from table 110 Sales Shipment Header's OnDelete trigger. We do detect the aggregation, but we do not analyze the Codeunit code that contains the actual delete.
The PostSalesLines-Delete Codeunit contains methods to delete rows in the following tables:
Sales Shipment Line
Sales Invoice Line
Sales Cr. Memo Line
Return Receipt Line
Posted document Dimension

The application code contains in total 5 Codeunits whose names imply that they are used for deleting rows in tables. It is very likely that they all contain cascading deletes that are not detected at this point. It would have been desirable to have the parsing and analysis of Codeunits as part of the algorithm, as this would certainly make our results more accurate. We decided to prioritize differently when Jesper Kiehn introduced the concept of REA in the beginning of March 2009: instead, we extended the project scope with an examination of the possibility of identifying candidates for REA. This meant that we would not have time to iterate the containment and generalization patterns to reach complete coverage. During our REA work we exported parsed Codeunit data to XML; this would be the basis for an analysis of Codeunits. We estimate that the improved algorithm could be implemented in 2-3 days.

9.3 Results for Generalization Pattern
We found in our TR analysis that there are at most 332 generalizations in the application (the frequency of Expression3, ID 3 in Table 9-3). The algorithm we have developed identifies:
303 complete matches
4 ignored matches
25 matches with no corresponding OptionString
In total 332 matches (100 % coverage).

The 303 matches are complete matches on our generalization pattern defined in section 6.2. The 4 ignored matches are TRs that do not contain a one-to-one match against an OptionString. The TR on field No. of table 39 Purchase Line, shown in Code 9-2, is one of the four. The TR has an ELSE IF statement on the constant value 3. This is a code defect where the reference is linked to the index of the OptionString instead of the value. We have not added a specific rule to detect these 4 special cases.
IF (Type=CONST(" ")) "Standard Text"
ELSE IF (Type=CONST(G/L Account)) "G/L Account"
ELSE IF (Type=CONST(Item)) Item
ELSE IF (Type=CONST(3)) Resource
ELSE IF (Type=CONST(Fixed Asset)) "Fixed Asset"
ELSE IF (Type=CONST("Charge (Item)")) "Item Charge"
Code 9-2 Special case of TableRelation ignored in Generalization analysis

The application team should probably consider changing the TRs defined on:
Purchase Line.No.
Standard Purchase Line.No.
Requisition Line.No.
Purchase Line Archive.No.

Finally, we have 25 matches where the TR is of the correct type but no OptionString exists on the field. We investigated these 25 matches further. We found that the field numbers 71, 72, 5714 and 5715, defined on the tables 37 Sales Line, 39 Purchase Line, 5108 Sales Line Archive, and 5110 Purchase Line Archive, account for 16 of the 25 matches. They are all of the form below, which means that they depend on a Yes or No value from another field in the table.

IF (Drop Shipment=CONST(Yes)) "Purchase Header".No. WHERE (Document Type=CONST(Order))
Code 9-3 Special case of TableRelation dependent on YES/NO values ignored in Generalization analysis

Of the remaining 9 matches, the following 7 are coded to use integer values and should probably use field names instead:
87 Date Compr. Register (on field) 12 Register No.
352 Default Dimension (on field) 2 No.
368 Dimension Selection Buffer (on field) 5 Dimension Value Filter
368 Dimension Selection Buffer (on field) 5 Dimension Value Filter
458 Overdue Notification Entry (on field) 3 Document No.
7340 Posted Invt. Put-away Header (on field) 7306 Source No.
7342 Posted Invt. Pick Header (on field) 7306 Source No.

The last 2 TRs not accounted for were encoded correctly but not related to any OptionString. These TRs express special cases that could be analyzed as generalizations, but they do not satisfy the pattern requirements we have defined. Further, given that the 23 other matches, according to J.
Kiehn, would probably be candidates for a rewrite, we decided not to look further into these relations.

9.4 Results from refactoring the generalization objects
In section 9.3 we listed the 303 identified matches on the generalization pattern. To simplify the code and draw benefit from object oriented principles, we have implemented an algorithm that reduces the number of generated generalization objects by sharing identical objects. The algorithm is described in section 8.9.3. We found that of the 303 generalization objects only 5 were unique throughout the code; the remaining 298 matches occurred two times or more. This way we were able to reduce the 298 objects to 86 objects. This can be explored in the Concept Viewer by checking the menu item "Relations To Map" -> "Map Generalization" -> "Map Refactored Generalization Objects".

9.5 The Concept Viewer and its output
The main outcome of this project is the identified relations and the Concept Viewer application able to display them. As described in section 8.8, we analyze the C/AL code and save the results to four binary data files storing respectively:
Aggregations
Containments
Generalizations
Refactored Generalizations

As mentioned earlier, the Concept Viewer is a tool that dynamically displays the content of the data files in an easy way, enabling the user to choose which concepts and which objects to map. The Concept Viewer facilitates a dynamic approach to exploring our findings. With the Concept Viewer, seen in Figure 9-3, we can map the identified aggregations, containments, and generalizations (without generalization objects, with generalization objects, with refactored generalization objects, and with parent generalizations). The individual mapping options and their outcome are illustrated by examples in the following sections.
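The pre-computation strategy behind these data files can be illustrated as below. The real tool serializes C# data structures with .NET binary serialization; this Python sketch with `pickle` and invented sample content only mirrors the idea:

```python
import pickle

# Invented sample content; the real files hold the full analysis results.
results = {
    "aggregations": [("Item Charge Assignment (Sales)", "Sales Header")],
    "containments": [("Sales Line", "Sales Header")],
    "generalizations": [],
    "refactored_generalizations": [],
}

def save_results(results, path):
    """Run once after the analysis: serialize all results to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(results, f)

def load_results(path):
    """Run at viewer startup: deserialize instead of re-analyzing,
    giving fast access to all results for graph generation."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The point of the design is that the expensive code analysis runs once, while the viewer only ever pays the cost of deserialization.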
Tables can be mapped on table or granule level and are selected simply by clicking the object in the tree view on the left side of the form; the diagram is updated accordingly. The Concept Viewer allows the user to zoom, undo, drag and drop single or multiple elements, save diagrams as images or vector graphics, and other commonly expected features of a graph viewer tool. The Concept Viewer can be installed from the attached DVD by running \Concept Viewer Installer\Concept Viewer.msi.
Figure 9-3 The Concept Viewer

9.5.1 Sales Header and Sales Line – Aggregations and Containments
We start out by exploring the tables Sales Header and Sales Line. These two tables are interesting because we based much of our manual analysis on them. The diagram in Figure 9-4 displays the containment from Sales Line to Sales Header shown in the manual analysis in section 6.1.2. From the manual analysis, knowledge of UML, and the foundational relation definition of part_of in section 4.2, we know that this containment means that a Sales Line cannot exist without a corresponding Sales Header. Further, we see that Sales Header has two more aggregations, namely from Item Charge Assignment (Sales) and Sales Planning Line. From our knowledge of UML we know that an aggregation implies that Sales Header's relationship to these two tables is of the 'has a' type, i.e. Sales Header has one or more instances of (rows in the table) Sales Planning Line.
Figure 9-4 Sales Header and Sales Line – Aggregations and Containments

9.5.2 Sales Header and Sales Line – Associations
The diagram in Figure 9-5 displays the associations that we map from Sales Header and Sales Line. As described in section 6.2, the high number of associations in the application code has proven to be inconvenient when the behaviour of the code is changed.
The high number of associations gave us the idea to refactor some of these associations into a generalization object, thereby reducing the total number of associations. This diagram is the first in a row of related diagrams presenting the architecture in different ways; it displays the associations as they are actually defined among the tables.
Figure 9-5 Sales Header and Sales Line Associations

9.5.3 Sales Header and Sales Line – Associations refactored via Generalization objects
As described, the goal is to reduce the total number of relations on the mapped tables. In the diagram in Figure 9-6 we have refactored some of the associations into generalization objects. The associations are declared on different fields, and we require them to be defined on the same field to be part of the same generalization. Further, we have chosen not to create generalization objects for single associations. For example, Sales Line has a field with a single relation to Item Variant; we do not add any value by creating an extra object when the number of relations from Sales Line stays the same, so we have left these out. The diagram contains 3 generalization objects, reducing the total number of associations for Sales Header from 2 to 1 and for Sales Line from 10 to 5. Furthermore, this abstraction has allowed us to see that we in fact have more than one reference to the Item table.

It is important to note that this refactoring is an abstract refactoring suggestion. It would not be possible to implement it in C/AL at this point, because the C/AL language does not support inheritance. The diagram is, however, still very valuable for two reasons. First, it provides information on how the code is actually organized: how the associations are defined and how they are grouped. Looking at the diagram is significantly faster than reading the code, and with an understanding of UML we can easily learn a lot about the code simply by reading a diagram.
Secondly, the NAV organization is giving serious thought to the future of the NAV application and the use of C/AL. The product itself is moving to C#, and it would be an option to move the C/AL application to C# as well. This is described further in section 2.3.3. The syntax of the generalization objects is "table number" "table ID (name)" : "field name".
Figure 9-6 Sales Header and Sales Line Generalizations

9.5.4 Sales Header and Sales Line – Associations refactored via refactored Generalization objects
Its somewhat clumsy title aside, this refactoring is quite interesting. In the example above we introduced a refactoring of associations to reduce the total number of associations on tables. The refactoring presented here additionally reduces the total number of generalization objects. The turquoise boxes in Figure 9-5 are still our generalization objects, but we have introduced the refactoring that the same generalization object type is used by every table listed in the box. This shows that the generalization object used by Sales Header is used by 17 tables in total. Using a single object would simplify the code and reduce the total number of lines of code.

The diagram also reveals resemblances between tables. The generalization object associated with the Sales Line table's Posting Group field shows that every table using this generalization object is named something with Line, implying that the tables with some justification have similarities. We see the same for the generalization object on the Sales Header field Bal. Account No.: all the tables except Payment Method contain a form of the word Header as part of their name.
Figure 9-7 Sales Header and Sales Line Generalizations refactored

9.5.5 Sales Line, Standard Sales Line, and Sales Line Archive – Reused Generalization objects explored
In the diagram in Figure 9-7 we could see a resemblance in naming between the objects using the generalizations.
The diagram in Figure 9-8 explores the generalization object used by Sales Line, Standard Sales Line, and Sales Line Archive. It is important to note that the tables do not share a common base. If they share code, it is because the code has been replicated in all tables. When we analyze the diagram below, we see a remarkable resemblance between the relations. The only difference we can see from the properties we map is that the Standard Sales Line table is missing the associations to the generalization objects for FA Posting Group and Inventory Posting Group. Apart from these associations, the tables seem to be completely alike. Here it is important to emphasize that we only map a small fragment of the code, and that the tables can have many other properties that are not alike.

Figure 9-8 Sales Line, Standard Sales Line, and Sales Line Archive reused Generalization objects explored

9.5.6 Sales Line, Standard Sales Line, and Sales Line Archive – Generalization objects explored

To illustrate how the mapping of Sales Line, Standard Sales Line, and Sales Line Archive would look with non-refactored generalization objects, we have provided the diagram in Figure 9-9, which maps each single generalization object. The diagram still appears readable, but the crossing lines make it more complicated to read than the previous diagram. Furthermore, the larger the diagram gets, the more chaotic it will seem.

Figure 9-9 Sales Line, Standard Sales Line, and Sales Line Archive Generalization objects explored

9.5.7 Sales Line, Standard Sales Line and Sales Line Archive – Associations explored

The following diagram, in Figure 9-10, displays the associations on Sales Line, Standard Sales Line, and Sales Line Archive without any generalizations, i.e. how it is actually implemented today. In our opinion, this diagram has a low degree of readability, and we do not get much information from reading it.
We could have provided information on where the associations are defined, but in general the diagram is not very useful.

Figure 9-10 Sales Line, Standard Sales Line, and Sales Line Archive – Associations explored

9.5.8 Item – Containments and Aggregations mapped with reused Generalization objects

When we explore the Concept Viewer, we find that the number of concepts to map varies a lot from table to table. For some tables there are no concepts to map at all, while for others there is an immense number of concepts, making the diagrams chaotic at best. This is interesting because, with this knowledge, we can deduce an indication of the effect a change in an object will have on the rest of the system. By analyzing different tables we found that the Item table is one of the cornerstones of the application, and based on this knowledge we can foresee that changes to the behavior of the Item table will, with a high degree of certainty, cause side effects in other parts of the application. Mapping the containments and aggregations of Item discloses that 14 tables have a containment to Item, meaning that their very existence depends on the existence of Item. Furthermore, we disclose that 16 tables have an aggregation to Item, meaning that they also depend on the Item table.

Figure 9-11 Item – Containments and Aggregations mapped with reused Generalization objects

9.5.9 Item – Reused Generalization objects containing Item

The Concept Viewer also offers the option of mapping every generalization that the mapped object is part of, meaning that we can map all generalization objects that Item is part of. As expected, given the knowledge from the diagram above, this diagram is large. It does not provide a clean overview because it suffers from information overload. Nevertheless, we can see that Item is heavily used by the generalization objects. In total there are 27 generalization objects referenced from 92 tables.
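The mechanics behind these generalization objects — grouping associations declared on the same field (section 9.5.3) and reusing identical generalization objects across tables (section 9.5.4) — can be sketched as follows. This is an illustrative Python sketch, not the project's actual implementation (which operates on the AST), and the sample associations are hypothetical:

```python
from collections import defaultdict

# Hypothetical associations: (source table, field, target table).
associations = [
    ("Sales Line", "No.", "Item"),
    ("Sales Line", "No.", "Resource"),
    ("Sales Line", "No.", "G/L Account"),
    ("Sales Line Archive", "No.", "Item"),
    ("Sales Line Archive", "No.", "Resource"),
    ("Sales Line Archive", "No.", "G/L Account"),
    ("Sales Line", "Variant Code", "Item Variant"),
]

def to_generalizations(associations):
    """Group associations declared on the same (table, field) pair.

    Groups with two or more targets become generalization-object
    candidates; a single association is left as it is, since an extra
    object would not reduce the relation count."""
    grouped = defaultdict(list)
    for table, field, target in associations:
        grouped[(table, field)].append(target)
    return {k: v for k, v in grouped.items() if len(v) >= 2}

def reuse(generalizations):
    """Merge generalization objects with identical target sets and
    record which (table, field) pairs share each reused object."""
    shared = defaultdict(list)
    for (table, field), targets in generalizations.items():
        shared[frozenset(targets)].append((table, field))
    return shared

gens = to_generalizations(associations)  # two generalization objects
shared = reuse(gens)                     # merged into one reused object
```

With the hypothetical data above, the single relation to Item Variant stays a plain association, and the two identical generalization objects for Sales Line and Sales Line Archive collapse into one reused object, mirroring the reduction shown in Figure 9-7.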
When the diagram is displayed in our tool, reorganizing, highlighting, and zooming can help extract information from even larger and more complex diagrams.

Figure 9-12 Item – Reused Generalization objects containing Item

9.6 Feedback from the Application team

We presented the Concept Viewer tool to the Application team, represented by Lead Software Development Engineer Bardur Knudsen. We introduced the tool and watched him play around with it. At first glance, what he liked the most was the reused generalization objects. He finds that they are good at revealing similarities in the code. The reused generalization objects can be seen in section 9.5.4, and in particular in Figure 9-7. "What often happens is that we have to change something in a Sales Header but forgets to update Sales Invoice Header, Sales Header Archive or some other related table." He finds that the generalization objects show this. Forgetting to change related tables is a problem when data is copied from table to table, as described in section 6.3.4. In the best case data is lost, and in the worst case the system could crash. In general he was very pleased with the diagrams, but would like to see more relations covered, especially the ones defined in Codeunits, which were left out due to the work towards REA concept identification. No documentation for the NAV application exists, and the App Team will not create any because, as B. Knudsen stated, "It will be outdated before it is finished". B. Knudsen would really like to have a tool like the Concept Viewer to create documentation from code on the fly. He liked the diagrams showing containments and aggregations, and stated that NAV could benefit from these diagrams when studying how a given change would propagate through the system. He was, however, also of the opinion that most experienced developers already have this knowledge from years of NAV development experience. The tool can, however, be used by less experienced developers to gain this knowledge.
Based on the responses from NAV application developers, we believe that this tool could help developers gain a better understanding of the application.

Chapter 10

10 Conclusion

The conclusion sums up the work we have done and lists our solution to the problem defined in section 1. The presented solution consists of four main parts: code parsing, general domain pattern matching, diagram generation, and domain-specific pattern matching. The achievements for each of these parts are described separately in the following sections.

10.1 Parsing the C/AL application

We extended the available C/AL parser to parse the fields, which were not parsed by the original parser implementation. We showed that the produced Abstract Syntax Tree (AST) could advantageously be saved as XML. The XML representation made it easy to query, extend, and store the AST. We found that a simple parser with five parsing rules was able to parse 100 % of the TableRelations matching the defined patterns. In total, 95 % of all TableRelations were parsed. The chosen approach was fast, accurate, and easy to work with.

10.2 UML Relationship pattern matching

We developed identification patterns for the concepts generalization and containment. We found a high number of matches, indicating that the concepts were present throughout the NAV application code.

10.2.1 Generalization

We were able to detect 303 candidates for refactoring to generalizations. From the results, we presented two refactoring suggestions:

1. The introduction of a generalization object to reduce dependencies between objects from 2..* (two to many) to 1 (one).
2. The introduction of a reused generalization object, able to reduce the need for generalization objects by more than 70 %.

10.2.2 Containment

The containment algorithm identifies 148 containments. From the partial matches we were able to derive 463 aggregations. We are certain that more containments could be identified from the 463 partial matches.
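One way to picture the distinction between the two concepts is the following sketch. It is illustrative Python, the criterion is a simplification of our algorithm, and the key and field names are hypothetical: a relation counts as a containment when the relating field is part of the child table's primary key, so the child row cannot exist without its parent, and as an aggregation otherwise.

```python
def classify_relation(child_primary_key, relating_field):
    """Containment: the relating field is part of the child's primary
    key, so the child row cannot exist without the referenced parent.
    Aggregation: the child merely refers to the parent."""
    if relating_field in child_primary_key:
        return "containment"
    return "aggregation"

# Customer Bank Account is keyed by (Customer No., Code): it cannot
# exist without its Customer -> containment.
kind_a = classify_relation(["Customer No.", "Code"], "Customer No.")
# Sales Line refers to Item through No., which is not part of its own
# primary key -> aggregation.
kind_b = classify_relation(
    ["Document Type", "Document No.", "Line No."], "No.")
```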
The primary step towards finding more containments is to analyze the code placed in Codeunits. As described in section 1, we chose to focus on applying domain-specific knowledge instead of perfecting the general domain knowledge.

10.3 The Concept Viewer

We present our UML findings in a dynamic, fast, and detailed relation viewing tool. The tool was built with the Microsoft Automatic Graph Layout (MSAGL) research project (69). We were able to contribute to the project by adding support for UML symbols. The feedback we got from the NAV Application Team was that they liked the tool and found that they were able to gain information from the produced diagrams. According to B. Knudsen, the NAV organization could really use a tool like the Concept Viewer to assist developers and architects by dynamically providing accurate, updated documentation.

10.4 Resource, Events and Agents (REA) relationship pattern matching

We are able to easily identify REA Events. Furthermore, we are able to identify a few Resource and Agent candidates, but the work is far from complete. The work suffers from two problems:

1. There is a high diversity in the formulation of the code expressing the REA relations. The variations make it hard to detect the actual code lines expressing the actions of the REA pattern.
2. The chosen approach lacks a way to distinguish between REA Resources and REA Agents. Their roles in the system are identical according to the pattern we use at this point, so we need a pattern to distinguish between these two REA types.

We believe that REA theory, and the REA model in particular, is very relevant for NAV. REA provides a way to view the duality in an accounting system and its transactions. We see REA analysis as a supplement to UML analysis, because the knowledge we gain from applying REA is domain-specific, in contrast to the UML concepts, which are of a more general nature.
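To make the parsing approach summarized in section 10.1 concrete, the following is a minimal sketch of a regex-based TableRelation subparser emitting XML. It is illustrative Python rather than the project's actual parser, and the two patterns are simplified stand-ins for the five parsing rules; conditional IF .. ELSE relations and WHERE filters would need further rules.

```python
import re
import xml.etree.ElementTree as ET

# Two simplified rules: a bare table name, and a table name qualified
# with a field, e.g. "Customer Bank Account".Code.
SIMPLE = re.compile(r'^"?(?P<table>[^".]+)"?$')
FIELD_QUALIFIED = re.compile(r'^"?(?P<table>[^".]+)"?\.(?P<field>.+)$')

def parse_table_relation(text):
    """Return a <TableRelation> XML element, or None if no rule matches."""
    m = FIELD_QUALIFIED.match(text)
    if m:
        return ET.Element("TableRelation",
                          table=m.group("table"), field=m.group("field"))
    m = SIMPLE.match(text)
    if m:
        return ET.Element("TableRelation", table=m.group("table"))
    return None  # unmatched TableRelations are counted, not parsed

simple = parse_table_relation("Item")
qualified = parse_table_relation('"Customer Bank Account".Code')
```

Both sample strings parse into elements carrying the table (and, where present, the field) as XML attributes, which is the shape that makes the AST easy to query and extend.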
Chapter 11

11 Future work and Perspective

Inside the NAV organization, our work could form the basis for a supplement to a new development environment. Being able to see the design of your work instantly, simply by pressing a button that makes the tool parse the code, identify the relations, and present the corresponding UML, would be a great source of knowledge. The tool could also assist the NAV organization in the refactoring of the NAV application, and it could easily be extended to cover more relations and refactorings. There are a number of issues we would like to continue to work on:

1. It would be interesting to compare our findings with those from the work of Till Blume, described in section 3.1.2. We expect that we would find similarities in the results from the two tools, even though the approaches differ.
2. It would be useful to include the code placed in Codeunits in our containment and generalization algorithms. This is fairly simple and could be achieved in a couple of days.
3. It would be interesting to include multiplicity in the containment algorithm. We are not detecting whether the multiplicity is a 1-1 (one to one) or 1-* (one to many) relationship.

In terms of REA, we found that REA did provide useful information on the behavior of the NAV application objects. We believe that REA is very interesting because it allows us to do domain-specific modeling as a supplement to the general-purpose modeling information provided by UML. We believe there could be a basis for a project focusing on uncovering the REA patterns and modeling the findings, preferably as a new version of the Concept Viewer. A promising project, focusing on building a framework for searching through the code, started after this work. It would be relevant to see whether the REA pattern identification algorithms could benefit from being implemented with this framework.
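The multiplicity detection mentioned in issue 3 above could, for instance, build on the primary key. The following heuristic is our assumption, not part of the implemented algorithm, and the table and field names are illustrative:

```python
def infer_multiplicity(child_primary_key, relating_fields):
    """Heuristic: 1-1 if the relating fields make up the child's whole
    primary key (at most one child row per parent); 1-* if they are
    only part of it (many child rows can share one parent)."""
    if set(relating_fields) == set(child_primary_key):
        return "1-1"
    return "1-*"

# Customer Bank Account is keyed by (Customer No., Code) and relates
# to Customer through Customer No. alone: one customer, many accounts.
mult = infer_multiplicity(["Customer No.", "Code"], ["Customer No."])
```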
If the project were to be started all over again, it would be interesting to look into the C#-converted C/AL code. The C# code will probably be incorporated even more in NAV v. 7, and if we worked on the C# representation, we would be able to use standard tools or create a tool that could be used for both the product and the application, see section 2.3.2. Furthermore, if we made the tool more general, C# programmers would be able to use it to analyze general-purpose code. Tools exist for parsing C# code to an abstract syntax tree, so we could save a lot of time by relying on a fully working, supported parser. One such tool is ANTLR (ANother Tool for Language Recognition) (72). More interestingly, Anders Hejlsberg revealed the future of C# after version 4.0 (73) at PDC (Microsoft Professional Developer Conference) 2008. He introduced the concept "Compiler as a Service", allowing developers to use the C# compiler in many advanced ways, including producing abstract syntax trees. It would be very interesting to contact the .NET team and see if it would be possible to use their work.
Chapter 12

12 Abbreviations

Abbreviations for Microsoft Dynamics NAV:

App Team – Application Team
AST – Abstract Syntax Tree (in general)
AST – Application Service Tier (related to NAV architecture)
BI – Business Intelligence
C/AL – Client/Application Language
C/SIDE – Client/Server Integrated Development Environment
CLR – Common Language Runtime
CRM – Customer Relationship Management
DTU – Technical University of Denmark
ERP – Enterprise Resource Planning
FM – Financial Management
GDP – Gross Domestic Product
GUI – Graphical User Interface
IMM – Department of Informatics and Mathematical Modeling
LE – Lambda Expression
LINQ – Language INtegrated Query
NAV – Microsoft Dynamics NAV
NDT – Navision Developer Toolkit
MDCC – Microsoft Development Center Copenhagen
MSDN – Microsoft Developer Network (www.msdn.com)
MSIL – Microsoft Intermediate Language
OP – OptionString
RE – Regular Expression
REA – Resource, Events, and Agents
RTC – Role Tailored Client
SCM – Supply Chain Management
SH – Sales Header
SL – Sales Line
TR – TableRelation
UI – User Interface
UML – Unified Modeling Language

Chapter 13

13 Works Cited

1. Hvitved, T. Architectural analysis of Microsoft Dynamics NAV. s.l. : University of Copenhagen, 2008.
2. [Online] http://www.uml.org/#UML2.0.
3. S. Haag, P. Baltzan, A. Phillips. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, Ch. 10-12.
4. —. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 134, section 1, line 1.
5. —. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 135, fig. 12.1, fig. 12.2.
6. —. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 134, section 3, line 1.
7. —. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 134, section 4, line 1.
8. —. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 136, section 3, line 1.
9. —. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 136, section 1, line 1.
10. S. Jacobson, J.
Shepherd, M. D'Aquila, K. Carter. The ERP Market Sizing Report, 2006-2011. s.l. : AMR Research, 2007. AMR-R-20495.
11. D. Roys, V. Barbic. Implementing Microsoft Dynamics NAV 2009. s.l. : Packt Publishing, 2008.
12. Slide deck from MDCC Dynamics NAV all hands meeting. March 28, 2008.
13. A cookbook for using the model-view controller user interface paradigm in Smalltalk-80. G. E. Krasner, S. T. Pope. 3, s.l. : SIGS Publications, 1988, Vol. 1.
14. Studebaker, D. Programming Microsoft Dynamics NAV. s.l. : Packt Publishing, 2007.
15. Stroustrup, B. The C++ Programming Language. s.l. : Addison Wesley, 1997.
16. S. Haag, P. Baltzan, A. Phillips. Business Driven Technology. s.l. : McGraw-Hill Higher Education, 2008, page 124, sections 1 and 2, page 131, figure 11.4, no. 2 and 4.
17. Troelsen, A. Pro C# 2008 and the .NET 3.5 Framework. s.l. : Apress, 2007, Ch. 1.
18. [Online] http://msdn.microsoft.com/en-us/library/bb669144.aspx.
19. Studebaker, D. Programming Microsoft Dynamics NAV. s.l. : Packt Publishing, 2007, page 4-5.
20. [Online] http://en.wikipedia.org/wiki/OLAP.
21. [Online] http://www.parentenet.com/parentetech/parentetech_unique_technology.htm#sift.
22. [Online] http://msdn.microsoft.com/en-us/library/dd301468.aspx.
23. [Online] http://dinosaur.compilertools.net/#lex.
24. [Online] http://dinosaur.compilertools.net/#yacc.
25. [Online] https://mbs.microsoft.com/partnersource/downloads/releases/NDTAll.
26. [Online] http://www.scootersoftware.com/.
27. [Online] http://www-01.ibm.com/software/awdtools/developer/rose/.
28. [Online] www.magicdraw.com/.
29. [Online] http://plg.uwaterloo.ca/~migod/uml.html.
30. On the relationship between REA and SAP. O'Leary, D. E. s.l. : International Journal of Accounting Information Systems, 2004, Vol. 5.
31. The REA Accounting Model: A Generalized Framework for Accounting Systems in a Shared Data Environment. McCarthy, W. E. No. 3, s.l. : The Accounting Review, 1982, Vol. 57.
32. [Online] http://www.sap.com.
33. B.
Smith, C. Rosse. The Role of Foundational Relations in the Alignment of Biomedical Ontologies. s.l. : MEDINFO, 2004, page 444-445.
34. G. Booch, J. Rumbaugh, I. Jacobson. The Unified Modeling Language User Guide. s.l. : Addison Wesley, 1999, preface, ch. 4-5.
35. —. The Unified Modeling Language User Guide. s.l. : Addison Wesley, 1999, preface, page xx, line 7.
36. Contemporary approaches and techniques for the systems analyst. Batra, D. & Satzinger. No. 17, s.l. : Journal of Information Systems Education, 2006, page 257-266.
37. Identified by J. Kiehn.
38. Fowler, M. Refactoring: Improving the Design of Existing Code. s.l. : Addison-Wesley, 2002, page 78.
39. Fowler, M. Refactoring: Improving the Design of Existing Code. s.l. : Addison Wesley, 2002, page 147.
40. The REA Modeling Approach to Teaching Accounting Information Systems. McCarthy, W. E. s.l. : Accounting Education, 2003, Vol. 18.
41. —. McCarthy, W. E. No. 4, s.l. : Accounting Education, 2003, page 430, section 2, line 7, Vol. 18.
42. —. McCarthy, W. E. No. 4, s.l. : Accounting Education, 2003, page 431, figure 4, Vol. 18.
43. Accounting for Rationality: Double-Entry Bookkeeping and the Rhetoric of Economic Rationality. B. G. Carruthers, W. N. Espeland. No. 1, s.l. : The American Journal of Sociology, 1991, page 37, line 2, Vol. 97.
44. [Online] http://www.reallifeaccounting.com/dictionary.asp#L.
45. On the relationship between REA and SAP. O'Leary, D. E. 5, s.l. : International Journal of Accounting Information Systems, 2004, section 5.2.
46. [Online] http://mono-project.com/.
47. [Online] http://channel8.msdn.com/Posts/MSIL-the-language-of-the-CLR-Part-1.
48. Hickey, J. Introduction to Objective Caml. s.l. : Cambridge University Press, 2008.
49. G. Cousineau, M. Mauny. The Functional Approach to Programming. s.l. : Cambridge University Press, 1998.
50. Harper, R. Programming in Standard ML. s.l. : Carnegie Mellon University, 2009.
51.
[Online] http://stackoverflow.com/questions/179492/f-and-ocaml.
52. D. Syme, A. Granicz, A. Cisternino. Expert F#. s.l. : Apress, 2007, page 224-230, section on active patterns.
53. —. Expert F#. s.l. : Apress, 2007, ch. 13 on asynchronous computation.
54. D. Syme, A. Granicz, A. Cisternino. Expert F#. s.l. : Apress, 2007, ch. 16.
55. [Online] http://dinosaur.compilertools.net/#lex.
56. [Online] http://dinosaur.compilertools.net/#yacc.
57. [Online] http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html.
58. Friedl, J. E. F. Mastering Regular Expressions, 3rd edition. s.l. : O'Reilly, 2006, section 3.2.
59. D. Jurafsky, J. H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. s.l. : Pearson Higher Education, 2008, page 42.
60. The Impact of the Lambda Calculus in Logic and Computer Science. Barendregt, H. s.l. : The Bulletin of Symbolic Logic, 1997, page 14.
61. [Online] http://msdn.microsoft.com/en-us/library/bb397687.aspx and http://en.wikipedia.org/wiki/Lambda_calculus.
62. The Impact of the Lambda Calculus in Logic and Computer Science. Barendregt, H. s.l. : The Bulletin of Symbolic Logic, 1997, page 194, section 3.2, line 16.
63. F. Marguerie, S. Eichert, J. Wooley. LINQ in Action. s.l. : Manning, 2008.
64. Ferracchiati, F. C. LINQ for Visual C# 2008. s.l. : Apress, 2008.
65. F. Marguerie, S. Eichert, J. Wooley. LINQ in Action. s.l. : Manning, 2008, page 5, quote 1.
66. —. LINQ in Action. s.l. : Manning, 2008, page 35-37.
67. [Online] http://w3schools.com/xml/xml_syntax.asp.
68. [Online] http://www.regular-expressions.info/, http://www.codeproject.com/KB/dotnet/regextutorial.aspx and http://msdn.microsoft.com/en-us/library/hs600312.aspx.
69. [Online] http://research.microsoft.com/en-us/projects/msagl/.
70. [Online] http://research.microsoft.com/en-us/um/people/levnach/.
71. Hvitved, T. Architectural analysis of Microsoft Dynamics NAV. s.l.
: University of Copenhagen, 2008, page 45, section 3.
72. [Online] http://antlr.org.
73. [Online] http://channel9.msdn.com/pdc2008/TL16/.
74. [Online] http://channel9.msdn.com/pdc2008/TL16/.

Chapter 14

14 Appendix

14.1 Content on the enclosed DVD

The DVD attached to this report contains the Concept Viewer installer and the code for our work. The root folder on the DVD contains two folders:

Concept Viewer Installer: This folder contains the installer for the Concept Viewer tool. To install the program, run Concept Viewer.msi. The default installation directory is $:\Program Files\David Flenstrup\Microsoft Dynamics NAV Concept Viewer\. When the program is installed, it can be run from the "Microsoft Dynamics NAV Concept Viewer" shortcut found on the Windows desktop and in the All Programs menu.

Code: This folder contains the following subfolders with the code for this project:

o Abstract Syntax Tree XML Files: This folder contains the abstract syntax tree representation of the C/AL application code, in three files: Codeunits as Abstract Syntax Tree.xml (XML representation of all Codeunits), Tables as Abstract Syntax Tree - Not Subparsed.xml (XML representation of all tables with the level of detail offered by the original CALParser), and Tables as Abstract Syntax Tree - Subparsed.xml (XML representation of all tables with the added details from our subparser).
o ASTTable: Initial analysis project used to extract properties directly from the CALParser. The project is only used for analyzing the C/AL application and does not contribute directly to the rest of our work.
o ConceptViewer: This project contains the implementation of the relation viewer tool (the Concept Viewer) presenting our findings.
o MatchingInLinq: This project contains the implementation of the algorithms for containment, aggregation, and generalization.
o MSAGL: This folder contains the dll files for the MSAGL project.
We do not have ownership of the MSAGL project and can therefore not pass on the actual code.
o RegularExpressionParser: This is the subparser we have developed for C/AL. It parses TableRelations and creates an XML element with the parsed TableRelation.
o Simple OO Parser: This folder contains the extended CALParser developed by T. Hvitved. The primary extension is the added CALToXML.fs file, which prints the abstract syntax tree to XML.
o Work From T. Hvitved: This folder contains code from a former project (1). The folder contains the CALParser we have extended into the Simple OO Parser and a text export of the NAV application compatible with the CALParser.