UNIVERSITY OF YORK
DEPARTMENT OF COMPUTER SCIENCE

Scalable persistence of EMF models

Author: AVRAAM – LEONIDAS DRAKOPOULOS
Supervisor: DR. DIMITRIS KOLOVOS

SUBMITTED IN SUPPORT OF THE DEGREE OF MASTER OF SCIENCE IN INFORMATION TECHNOLOGY

Academic Year 2009-2010

Word count: 17,530, as counted by MS Word (the entire body is included in the word count).

Abstract

In software engineering, modelling is established as good practice for designing and documenting a solution. However, the increasing complexity of software systems has created a need for new technologies that can cope with emerging demands. Model Driven Engineering (MDE) is one such methodology, introduced to meet these demands and to provide a higher level of abstraction in the software development process. In MDE, models are the central artefact of the development process and in most cases are used for automatic code generation. This approach to software development has many benefits, such as increased productivity and improved software quality. As with almost all new technologies, though, challenges emerge along with the benefits, and these have to be addressed for MDE to be fully adopted by industry. One of the major challenges faced by MDE is scalability, and more specifically the scalable persistence of models. This project attempts to partly address the problem of scalability by proposing a persistence solution, backed by a relational database, for storing EMF-based models. The project is developed in the context of the Eclipse Modelling Framework (EMF), a framework for MDE development that is integrated with the Eclipse IDE platform.

Acknowledgement

First, I would like to thank my supervisor, Dr. Dimitris Kolovos, who provided me with solid support, precise guidance and valuable advice throughout this project, which has helped me both personally and professionally. In addition, I would like to thank my family for believing in me and for their moral support and encouragement. Finally, I would like to thank my friends for listening patiently to my frustrations.

Ethical Statement

For this project there were no immediate ethical issues that needed to be considered.

Table of figures

FIGURE 2.1: MDA SOFTWARE DEVELOPMENT LIFE CYCLE (ADOPTED FROM [1])
FIGURE 2.2: METAMODELLING ARCHITECTURE (ADOPTED FROM [35])
FIGURE 2.3: BASIC STRUCTURE OF AN EMF MODEL (ADOPTED FROM [10])
FIGURE 2.4: EXAMPLE OF EMFATIC SYNTAX
FIGURE 2.5: EMF RESOURCE DIAGRAM (ADOPTED FROM [9])
FIGURE 2.6: EPSILON ARCHITECTURE (ADOPTED FROM [31])
FIGURE 2.7: CDO ARCHITECTURE (ADOPTED FROM [15])
FIGURE 2.8: TENEO - HIBERNATE ARCHITECTURE (ADOPTED FROM [19])
FIGURE 3.1: WATERFALL DEVELOPMENT PROCESS (ADOPTED FROM [22])
FIGURE 3.2: ITERATIVE WATERFALL DEVELOPMENT PROCESS (ADOPTED FROM [21])
FIGURE 4.1: PROJECT BUILDING BLOCKS
FIGURE 5.1: PROJECT'S USE CASE DIAGRAM
FIGURE 5.2: DATABASE ER DIAGRAM
FIGURE 5.3: INJECTION ALGORITHM FLOWCHART
FIGURE 5.4: PROJECT'S CLASS DIAGRAM
FIGURE 7.1: CASE STUDY ECORE MODEL
FIGURE 7.2: REGISTER EPACKAGES MENU
FIGURE 7.3: INSTANTIATED METAMODEL
FIGURE 7.4: REFLECTIVE EDITOR PROPERTY VIEW
FIGURE 7.5: OBJECTS TABLE
FIGURE 7.6: ATTRIBUTES TABLE
FIGURE 7.7: REFERENCES TABLE
FIGURE 7.8: BLACK BOX TESTING SCHEMA
FIGURE 7.9: UI EXTENSION
FIGURE 7.10: TIME THROUGH EMF MODEL
FIGURE 7.11: DATABASE BOOT TIME
FIGURE 7.12: EMF BOOT TIME

Table of Contents

CHAPTER 1 INTRODUCTION
  1.1 PROJECT MOTIVATION
  1.2 OUTLINE OF THE REPORT
CHAPTER 2 LITERATURE REVIEW
  2.1 MODEL DRIVEN ENGINEERING
    2.2.1 Model Driven Architecture Development Life Cycle
    2.2.2 Model Driven Architecture Benefits
    2.2.3 Model Driven Engineering Challenges
  2.3 METAMODELLING ARCHITECTURE
  2.4 DEFINING THE BASIC CONCEPTS OF ECLIPSE MODELLING FRAMEWORK
    2.4.1 A first glance
    2.4.3 What is an EMF model?
    2.4.4 What is the basic concept of the generated code?
    2.4.5 How are models persisted in EMF?
  2.5 OBJECT PERSISTENCE API
  2.6 REFLECTIVE API
  2.7 EPSILON
  2.8 DATABASE PERSISTENCE FOR EMF
    2.8.1 Connected Data Objects (CDO)
    2.8.2 Teneo
  2.9 CONCLUSION
CHAPTER 3 METHODOLOGY AND ANALYSIS
  3.1 INTRODUCTION
  3.2 WATERFALL MODEL
  3.3 WATERFALL MODEL ANALYSIS
    3.3.1 Project analysis
CHAPTER 4 REQUIREMENTS
  4.1 INTRODUCTION
  4.2 PROJECT OBJECTIVES
  4.3 PROJECT BUILDING BLOCKS
    3.3.1 Database Injection
    3.3.2 Database querying
  3.4 SUMMARY
CHAPTER 5 DESIGN
  5.1 APPROACHES CONSIDERED
    5.1.1 CDO approach
    5.1.2 Teneo approach
    5.1.3 Summary
  5.2 USE CASE DIAGRAM
  5.3 GENERIC DATABASE DESIGN
    5.3.1 Conceptual design
    5.3.2 Logical Design
    4.3.3 Physical design
  5.4 EMF INJECTION INTO THE DATABASE
  5.5 GENERAL UML DESIGN
    5.4.1 Class diagram
    5.4.2 Class diagram summary
CHAPTER 6 IMPLEMENTATION
  6.1 INJECTING INTO THE RELATIONAL DATABASE
  6.2 QUERYING METHODS EXPLAINED
  6.3 ECLIPSE PLUG-IN
    6.3.1 Plug-in architecture
    6.3.2 Plug-in roles explained
CHAPTER 7 EVALUATION
  7.1 CASE STUDY
    7.1.1 Constructing an EMF model
    7.1.2 Instantiating the model
    7.1.3 Objects injected in the database
  7.2 TESTING
    7.2.1 Black Box testing
    7.2.1 Database injection testing
    7.2.2 Database querying testing
  7.3 REQUIREMENTS EVALUATION
    7.3.1 Database injection functional requirements evaluation
    7.3.2 Database querying functional requirements evaluation
    7.3.1 Database injection non functional requirements evaluation
    7.3.2 Database querying non functional requirements evaluation
  7.4 GENERAL PERFORMANCE EVALUATION
    7.4.1 EOL Scripts explained
    7.4.2 Extender plug-in
    7.4.3 EMF run time vs. database run time
    7.4.4 EMF boot time vs. database boot time
    7.4.5 Comparison and outcomes
CHAPTER 8 CONCLUSION
  8.1 PROJECT OVERVIEW
  8.2 PERSONAL DEVELOPMENT
  8.3 FUTURE WORK
BIBLIOGRAPHY

Chapter 1 Introduction

1.1 Project motivation

The concept of modelling is increasingly embedded in software development, and it has a wide range of interpretations even within software engineering. So far, the Unified Modelling Language (UML) has been the dominant technology associated with modelling. Increasingly, though, Model Driven Engineering (MDE) is gaining momentum and popularity. The Eclipse Modelling Framework (EMF) is a subproject of Eclipse, the well-known open source Integrated Development Environment (IDE), and provides a solid base for modelling and code generation in terms of MDE.

EMF is a powerful framework designed to make modelling useful and practical for the Java developer. It can be considered the bridge between modellers and programmers, unifying three widely used technologies: Java, UML and XML. EMF integrates with many platforms. For the purposes of this project Epsilon will be used, a platform that integrates with EMF and supports task-specific languages for model management. EMF is a flexible, efficient framework that encapsulates a complete and useful API, allowing the principles of MDE to be applied successfully. The EMF community supports this technology and continues to add new features to the framework, making it an even more appealing solution for potential adopters.
Although EMF is a useful and promising cutting-edge technology, it is not without important issues that have to be addressed. One important challenge that EMF faces is the scalable persistence of EMF models. The goal of this project is to partly address the scalability issues faced by EMF through the development of a database-backed solution that integrates with the Epsilon platform. The technologies mentioned above are explained in the Literature Review (Chapter 2).

1.2 Outline of the report

The report has seven chapters in addition to this Introduction.

Chapter 2 Literature Review: This chapter investigates some of the literature associated with EMF and with MDE in general, with a focus on EMF persistence.

Chapter 3 Methodology: This chapter describes the software development process that was followed during the implementation of this project.

Chapter 4 Requirements: This chapter clearly identifies the project's objective. In addition, the requirements of the project are identified.

Chapter 5 Design: This chapter outlines the design of the basic building blocks of the project. ER modelling, UML and flowcharts are used.

Chapter 6 Implementation: This chapter explains the code of the basic parts of the implementation. In addition, important information about Eclipse plug-ins is provided.

Chapter 7 Evaluation: This chapter investigates a case study that makes use of the code developed as well as some of the most important tools of EMF. Basic system cases are outlined, and performance is discussed on the basis of instantiating a large EMF model.

Chapter 8 Conclusion: This chapter summarizes the work and outcomes of the project. In addition, approaches are proposed for the future continuation of the project.
Chapter 2 Literature Review

The topic of interest in this dissertation is the scalable persistence of EMF models. To ease the reader into the topic, this chapter provides a solid background on the Eclipse Modelling Framework and its key features. Its purpose is to give a comprehensive view of what EMF is for, to identify the existing literature on the matter, and to give a basic perspective on the benefits and challenges of this cutting-edge technology.

2.1 Model Driven Engineering

Software complexity is increasing rapidly. In their effort to keep up with this growth in complexity, developers and software engineers often seek solutions in the Model Driven Engineering approach, which provides a higher level of abstraction than traditional methods of coding [32]. The MDE approach alters the primary objective of the developer, who focuses on modelling a solution for the problem at hand rather than on developing code [32].

A reasonable question for a software engineer or developer to ask at this point is: isn't that what UML does? The answer is that MDE differs from other modelling techniques because the model is used as a basic artefact in the process of code development, rather than as a documentation tool as in UML and similar approaches. The basic idea of MDE is that the developer focuses on producing models that are used as templates for automatic code generation, and the generated code can then be executed, tailored and customized [2].

The Eclipse Modelling Project conforms to the principles of the MDE approach and operates under the umbrella of the widely used open source Eclipse IDE. The basis of the Eclipse Modelling Project is the Eclipse Modelling Framework, which provides the tools for modelling and code generation.
Models in EMF can be developed with its own tree-based editor, and EMF also supports importing UML models from other well-known tools such as IBM Rational Rose [3].

2.2.1 Model Driven Architecture Development Life Cycle

Model Driven Architecture (MDA) is a specific notion within the context of MDE [33]. In many ways the MDA lifecycle is similar to the traditional software development lifecycle, but there are also some key differences. The differences lie mainly in the artefacts that are created during the development process [1]. The MDA lifecycle is presented in Figure 2.1:

Figure 2.1: MDA software development life cycle (adopted from [1])

Figure 2.1 presents an iterative process that includes all the traditional steps of software development, such as requirements analysis, coding and testing. The figure looks familiar except for the PIM and PSM components, which stand for Platform Independent Model and Platform Specific Model respectively [1]. PIM and PSM are both models; the main difference is that a PIM has a higher level of abstraction than a PSM. A PIM can be transformed into many PSMs, each of which is focused on the platform on which it is implemented. The final objective is for code to be generated from the PIM [1].

2.2.2 Model Driven Architecture Benefits

The above architecture brings many benefits. The developer's focus shifts from coding to designing and generating PIMs. The development process is more productive in this way because technical details are abstracted away. In addition, less code has to be written by hand: if the transformation from a PIM to a PSM is successful, then in most cases the code is generated automatically [1].

There are also portability benefits because of the PIM's platform independence. A PIM can be targeted at any platform by transforming it into PSMs for that platform [1].
There are also important benefits in maintenance and documentation. Because the PIM is a vital part of the development process, it is not abandoned once the first code generation cycle is complete; it is actively used and updated when changes take place. The PIM therefore has a dual role, serving both as a high-level documentation model and as an input template for automatic code generation [1].

2.2.3 Model Driven Engineering Challenges

Model Driven Engineering, and more specifically the Eclipse Modelling Framework, is emerging as a very promising technology. As happens with every new technology, though, new challenges emerge, and many of them have to be dealt with before MDE can be fully adopted by the software industry [4]. Some of the most significant are enumerated below. These are not the only challenges faced by MDE; only those most relevant to the topic of this dissertation are presented [5]:

Traceability: In traditional programming practice, when an error occurs in the code, the compiler or interpreter points out the exact line of the error in the application. When MDE is used to develop an application, the model is the central artefact of the process and is used as a template for automatic code generation. If a problem occurs in the generated code, correcting it is not as straightforward as in traditional programming. Fixing the mistake in the code will not fix the problem in the long term, because the code will be regenerated; the problem has to be traced back to the model level so that the code generated from the model is always correct. If the code is derived from a set of models, tracing a possible mistake is even harder, since all the models have to be examined [4, 5, 30].
Verification and validation: Verification and validation are essential in all software development approaches, and MDE is no exception. In the MDE context, however, they raise additional challenges. A very important one is to find approaches to verify, test and validate models as well as the code that is automatically generated from them. There is also a strong need for formal verification of models. Many formalisms already exist, such as graph transformation theory, category theory and model checking, but the underlying challenge is to define, structure and standardize the relationships between them [4, 5].

Industrial adoption: MDE has been introduced successfully in some areas of the software industry. Despite this, many companies are still reluctant to use MDE technology. The most common reason is that a significant amount of extra training is needed for MDE to be applied successfully in industry. In addition, many companies have already invested in other technologies, so the transition from an existing technology to MDE could be very costly [4, 5].

Scalability in MDE: Engineers in industry often have to cope with very large models of thousands of elements, which take too long to load and cause a substantial overhead in the development process. Several issues underlie the scalability of MDE and need to be dealt with [4, 5]:

1. Model transformations should not have to be re-executed in full after a small change to the original model.
2. Similarly, in the case of code generation, the entire code should not be regenerated after a small change to the source model.
3. There should be a suitable framework for collaboration between the developers working on a project. A change to the model by one developer should not conflict with the changes of another.
4. It should be possible to manipulate a model efficiently in parts, instead of having to load all of it into memory at once.

This dissertation attempts to address some of the problems of scalability, and problem 4 in particular. More specifically, the project's objective is the development of an Eclipse IDE plug-in that allows an EMF model to be injected into a database and parts of the EMF model to be queried. The goal of the project is discussed in detail in the chapters that follow.

2.3 Metamodelling Architecture

The notion of modelling is very important for this project. The objective of this section is to provide a basic framework that familiarizes the reader with terms such as metametamodel, metamodel and model as they are used in this report.

Figure 2.2 describes the hierarchy of the metamodelling architecture, which is organized into three levels, M1 to M3. Each level describes the rules that the level below it must conform to. A higher level can therefore be considered the definition of the modelling language that the level below it must comply with [35]. For example, in the first branch of Figure 2.2, the M2 level contains the UML metamodel. The UML metamodel complies with the modelling language defined at the M3 level, which is the metametamodel. The UML metamodel at M2 could, for instance, describe Classes and Relationships. A UML model at M1 is like an instance of the M2 metamodel and conforms to the modelling language defined by it.

Figure 2.2: Metamodelling Architecture (adopted from [35])

The Eclipse Modelling Framework is based on the three-level metamodelling architecture analysed above [35]. These terms are used extensively in the following sections, where they are presented again in the EMF context.

2.4 Defining the basic concepts of Eclipse Modelling Framework
The purpose of this section is to explain how an EMF model can be represented, and to answer some of the most basic questions about EMF's core structure:

What is an EMF model?
What is the basic concept of the generated code?
How are models persisted in EMF?

2.4.1 A first glance

EMF is the technology that glues together the modelling and programming worlds, and it conforms to the principles of MDE. What really makes it stand out is that, once an EMF model is defined, efficient, fine-tuned and fully customizable code can be generated with a few clicks. EMF models are structured so that developers can reuse their existing experience of Java, XML and UML [6].

EMF is a modelling framework integrated with the Eclipse IDE that unifies three important technologies: XML, UML and Java. An EMF model can be thought of as a representation that summarizes all three. A transformation of an EMF model, or in simpler words a change to it, is also propagated to the other technologies [7]. The starting point for defining an EMF model could be either XML or UML. Defining an EMF model with UML or XML would require synchronizing and integrating Eclipse with other modelling tools, for example IBM Rational Rose [8].

Vlad Varnica, the OMONDO business developer director, said of EMF in 2002 [36]:

“EMF represents the core subset that’s left when the non-essentials are eliminated. It represents a rock solid foundation upon which more ambitious extensions of UML and MDA can be built.”

2.4.3 What is an EMF model?

All EMF models are represented by Ecore, which is one of the major milestones of the Eclipse Modelling Framework [9]. Ecore is itself an EMF model [9].
In the context of the modelling hierarchy presented in section 2.3, the Ecore is the metametamodel (M3) and can be considered the modelling language in which a specific metamodel (M2) is defined. Based on the M2 metamodel, a model (M1) can then be defined that conforms to it.

Figure 2.3 presents the basic structure of the Ecore. On closer inspection its structure is quite straightforward and closely resembles a UML class diagram. This is not surprising, since the Ecore can be considered in a sense a subset of a UML class diagram [9]. Four basic Ecore classes are needed to represent the Ecore [9], together with the superclass that two of them share:

- EClass: Represents the modelled class. As Figure 2.3 shows, an EClass can have zero or more EAttributes and zero or more EReferences.
- EAttribute: Represents an attribute of the EClass. Each EAttribute has one EDataType.
- EDataType: Represents the type of an EAttribute.
- EReference: Represents an EClass that is referenced from within another EClass. As stated above, an EClass can have zero or more EReferences.
- EStructuralFeature: The superclass from which EReference and EAttribute inherit.

The simplified basic structure of the Ecore metametamodel described above is [10]:

Figure 2.3: Basic structure of Ecore (adopted from [10])

There are four different ways to represent and construct an EMF model that conforms to the modelling language defined by Ecore. Anyone who wants to build an EMF model can use any of the four technologies listed below as input.
XML, UML, Java interfaces and Eclipse's own tree-based editor.

The easiest way to start is Eclipse's own tree-based editor, a tool that is fairly easy to use and does not require integration with other editors [9]. The Eclipse Modelling Framework also provides a textual editor called Emfatic, whose source can be transformed automatically into an EMF model and which offers a quicker and more convenient approach to building EMF models [11].

To complete the picture, a simple EMF metamodel is presented together with its equivalent Emfatic syntax, as implemented in the Eclipse IDE modelling environment. This model can be considered a metamodel, since other models that conform to it can be instantiated. Extending this logic, the Library metamodel presented in Figure 2.4 sits at level M2 of the metamodelling hierarchy and the models that conform to it sit at level M1. The Library metamodel itself conforms to the modelling language of the Ecore metametamodel.

Figure 2.4: Example of Emfatic syntax and equivalent EMF model

In this very simple example the first EClass shown in Figure 2.4 is Library. Its EAttribute is the name of the Library, which is of type EString. The EReferences of the EClass Library point to the EClasses Writer and Book, which are also represented in the model. A Library object can have zero to many Writer objects as well as zero to many Book objects. The EClasses Book and Writer defined in the model follow exactly the same pattern as the EClass Library just explained.

The Emfatic syntax closely resembles Java syntax. With the click of a button, any change to the Emfatic source is propagated to the tree-based Ecore editor and vice versa.

2.4.4 What is the basic concept of the generated code?
The objective of this project is not to use the code generated from an EMF model, so only the most basic features of code generation are described here. As discussed above, the primary objective when designing EMF models in the Eclipse modelling environment is ultimately automatic code generation. With a few clicks the Generator model, which is itself an EMF model, is derived from the EMF model. Most of the information the Generator model needs in order to produce the code already exists in the initial EMF model. Separating the initial model from the Generator model introduces some extra complexity into the process of modelling and generating code. It is necessary, though, in order to keep the initial EMF model independent and free of the extra information needed for code generation [8, 12].

Any project in which code is to be generated should contain two models: the initial EMF model and the Generator model, with the extensions .ecore and .genmodel respectively. Any change to the .ecore model is automatically propagated to the .genmodel model so that the two always stay in sync. The .ecore and .genmodel models are highly dependent on each other [8, 12].

An important observation at this point is that EMF builds on the user's previous experience of Java. In Java all classes inherit from java.lang.Object. By the same logic, all interfaces in EMF inherit from the interface EObject [13].

2.4.5 How are models persisted in EMF?

The Eclipse Modelling Framework provides a built-in model persistence mechanism based on XML Metadata Interchange (XMI). Because XMI is the default form of persistence in EMF, no additional code or effort is required to store objects that conform to the Ecore in XMI form [9]. The EMF persistence framework nevertheless provides a powerful API that supports persistence in forms other than XMI, for example databases.
The downside of this approach is that the serialization code has to be developed from scratch using the API provided by EMF [9].

2.5 Object persistence API

The most basic building block of the EMF persistence framework is the resource. Any object needs a resource in order to be persisted. As discussed previously, an EClass includes references and attributes. When an object of an EClass is saved in a resource, all of its structural features, that is its attributes and references, are generally saved to the same resource by default. What if more than one resource is required, though? It does not make sense to have dangling resources without some sort of grouping. For this reason the EMF persistence framework introduces another interface called ResourceSet. The ResourceSet contains all the resources, grouping them together and thus making them more accessible for manipulation [9].

Figure 2.5 shows two resources loaded in a ResourceSet. The URIs next to each resource uniquely identify the resources. A resource can be loaded into a ResourceSet, and resources in a ResourceSet can hold cross-references to each other, as shown in Figure 2.5.

Figure 2.5: EMF resource diagram (adopted from [9])

2.6 Reflective API

The base interface of EMF is EObject, which provides an API for managing model elements reflectively. The reflective API offers an alternative way to manage objects in EMF that differs from the mainstream approach of using the automatically generated code. A developer can choose either the reflective EObject API or the code generation facility provided by EMF to implement the desired runtime behaviour of the system [9].

The reflective API provides accessor and mutator methods such as eGet and eSet. A simple example is provided below:

    instance.eSet(…….) to set the value of an attribute
    String example = (String)instance.eGet(…) to read the value of an attribute

Through the reflective API any resource that is loaded in a ResourceSet can be accessed. Any object of an EClass can be interrogated and all of its structural features accessed. It is possible to navigate among objects and their references, retrieve any information about their attributes and manipulate the data at will. The reflective API enables a more flexible approach to the manipulation of objects, since there is no need to run the automatic code generation that EMF provides. The trade-off is that data takes longer to process through the reflective API than through the generated code.

2.7 Epsilon

Epsilon stands for “Extensible Platform of Integrated Languages for mOdel maNagement”. It is a platform developed at the University of York that supports the construction and use of purpose-specific languages for model management tasks such as model-to-model transformation, code generation, model comparison, merging, validation and refactoring [29].

As Figure 2.6 shows, Epsilon is a platform that operates under the Eclipse project, and the task-specific languages that operate in this context can support model management activities for EMF models.

Figure 2.6: Epsilon architecture (adopted from [31])

The task-specific languages supported by Epsilon so far are:

1. Epsilon Object Language (EOL): The base language that all the other task-specific languages described below extend. EOL can also be used as a standalone language for model management.
2. Epsilon Validation Language (EVL): Used for validating models and evaluating constraints.
3. Epsilon Transformation Language (ETL): Transforms a number of input models into a number of output models.
4. Epsilon Comparison Language (ECL): Identifies similarities and matching patterns between models that are possibly derived from different modelling technologies.
5. Epsilon Merging Language (EML): Merges models.
6. Epsilon Wizard Language (EWL): Supports the updating of models that possibly derive from different metamodels and modelling technologies.
7. Epsilon Generation Language (EGL): Generates code and is built on top of EOL.

The interest of this project is to integrate the database-backed persistence solution for EMF models with the IModel interface provided by Epsilon. A driver for the Epsilon platform can only be developed by implementing IModel. Through this integration the task-specific languages provided by Epsilon can be used.

2.8 Database persistence for EMF

As mentioned in previous sections, XML Metadata Interchange is the default technology that EMF uses to describe EMF models in a persistent form. Alongside XMI, Teneo and Connected Data Objects (CDO) are technologies that support a different kind of persistence. Both were developed under the Eclipse community umbrella and provide a relational database persistence solution for EMF models [14, 15, 16].

XMI is the built-in persistence solution for EMF models. When large EMF models need to be persisted, though, XMI does not scale well. The main reason is that the entire model has to be loaded into memory even if only a small part of it needs to be interrogated. This is inefficient in terms of both start-up time and memory footprint.
For this reason database persistence solutions such as CDO and Teneo were investigated as a means of improving the scalable persistence of EMF models. These solutions could have served as the backbone technologies for this project. CDO and Teneo are revisited in the Design part of the report (Chapter 5), where the reasons why they were not used for this project are analysed.

2.8.1 Connected Data Objects (CDO)

CDO is a framework that acts like a plug-in and is integrated with the Eclipse Modelling Framework [16]. It is a runtime environment operating on a three-tier client-server architecture, as shown in Figure 2.7, that supports data persistence on the back end.

Figure 2.7: CDO architecture (adopted from [15])

Examples of pluggable data storage technologies that can be integrated with CDO are relational databases such as MySQL and Oracle, as well as object databases and file systems. Using the EMF API it is possible to save models into relational databases, making an application more scalable. Through its architecture CDO also supports collaboration between concurrent users [17].

2.8.2 Teneo

Teneo, like CDO, is a database persistence solution for EMF models. The mapping between the EMF model and the relational database is created automatically. Teneo supports integration with both Hibernate and EclipseLink. Hibernate is the technology that actually provides the API for injecting the EMF model into the database, as well as the API for further database manipulation and querying [18, 19].

Figure 2.8: Teneo - Hibernate architecture (adopted from [19])

Figure 2.8 illustrates how an EMF model is automatically mapped into a Hibernate mapping and then stored in a relational database.

2.9 Conclusion

So far, an overview of all the technologies used for the project implementation has been presented.
The basic concepts around MDE, EMF and modelling were introduced and explained, and the concepts of the EMF reflective API and generated code were discussed. Particular focus was given to EMF object persistence, and the existing technologies that support database persistence for EMF models were introduced. In this context the project objective was briefly discussed. In addition, the Epsilon platform was presented together with the reasons for integrating it with this project.

The next chapters present the software development methodology and the requirements analysis. The design and implementation of the database-backed scalable persistence solution developed in this project are then analysed thoroughly. Finally, a detailed evaluation is provided that focuses on the performance statistics of the developed solution.

Chapter 3 Methodology and Analysis

3.1 Introduction

The project objective is to provide a scalable and memory-efficient persistence mechanism for storing and loading EMF models and to integrate it with the Epsilon platform. To accomplish this objective, a suitable software development process had to be selected. This section outlines the software development process selected for the development lifecycle of this project. It also briefly analyses the reasons behind this selection and the modifications made to tailor the process to the goals of this project.

The three candidate software development processes examined were the Waterfall model, the Evolutionary model and the Component Based Software Engineering model. Of the three, the Waterfall model was chosen as the most suitable for this project.
3.2 Waterfall model

Every software process shares some basic activities. The most important of them are:

1. Software specification
2. Software design and implementation
3. Software validation
4. Software evolution

The Waterfall model is no exception and includes these fundamental activities. Its basic characteristic, though, is that it treats them sequentially: each lifecycle phase has to be complete before the next can begin. Figure 3.1 presents the phases of the Waterfall model [22]:

Figure 3.1: Waterfall development process (adopted from [22])

- Requirements analysis and definition: The phase where the system specifications and constraints are defined.
- System and software design: The phase where the building blocks of the system are decided, as well as the relationships between them.
- Implementation and unit testing: The phase where every part of the system is implemented as a set of programs and tested separately.
- Integration and system testing: The phase where the separate parts of the system implemented in the previous phase are integrated and tested as a single complete unit.
- Operation and maintenance: The phase where the system is further optimized according to new requirements and potential improvements.

3.3 Waterfall model analysis

In real life a software engineering project is rarely divided distinctly into five phases. The phases usually overlap. In addition, the requirements change during development in a significant proportion of projects. Moreover, the design and implementation phases of a project rarely go as planned. Given these observations, it is evident that almost every development process needs iterations.
3.3.1 Project analysis

The requirements of this project were stable and well defined from the beginning. Each phase in the project lifecycle was clear and had to be completed before the next phase could start. Based on these facts, the Waterfall model was judged a suitable choice for this project. After further research, though, it was decided to adopt a hybrid form of the Waterfall process.

The approach for the design phase of the project was not yet clear, and many candidate technologies were being investigated at the time for the backbone of the application. There would therefore definitely be a need for iteration during the second phase of the development process, something the Waterfall model as described above does not support. In addition, the programming obstacles that could arise during the implementation phase were not known in advance. A model was needed that could provide an alternative plan of action and support iteration in case the approaches decided on in any phase proved inefficient.

Figure 3.2: Iterative Waterfall development process (adopted from [21])

The solution was to adopt an enhanced Waterfall model that keeps the lifecycle phases separated as in Figure 3.1 but at the same time supports iterations during every phase of the development process [21]. This hybrid model gives the project additional flexibility and also provides alternatives in the unlikely event that an obstacle cannot be bypassed.

Chapter 4 Requirements

4.1 Introduction

This section presents the requirements analysis of the project. The project is broken down into two basic milestones, and a requirements analysis is provided for each.
The requirements analysis is based on the two most important classifications of requirements: functional and non-functional [22].

Functional requirements state the services and functionality that a system should deliver. They also describe the behaviour of the system in particular situations and for particular inputs.

Non-functional requirements span many areas of a project. They state the constraints on the services and functions that the system should deliver. Examples of non-functional measures are the speed of the system, its size, reliability and portability.

As explained in the methodology part of the report (Chapter 3), the definition of the requirements is the first phase of the development lifecycle.

4.2 Project Objectives

The objective of this project is the development of an Eclipse IDE plug-in that aims to improve the scalable persistence of large EMF models. As discussed in the literature review (Chapter 2), one of the main problems MDE faces is that it is still unable to cope with very large models. These models have to be transformed, constructed and merged. Loading them is also time consuming, because with XMI, the default persistence solution for EMF models, the whole model has to be loaded into memory in order to query a part of it.

The strategy devised for this project is to map the information contained in EMF models into a generic relational database. By querying this database it is then possible to load only a selected part of an EMF model. The costly operation of loading the entire EMF model into memory each time an interaction is required is thus avoided.

More specifically, the Epsilon platform implements drivers through implementations of the IModel interface, which can then be integrated with the EMF platform.
The default persistence technology that EMF provides, as discussed in the literature review (Chapter 2), is XMI. The problem is that when the IModel interface is implemented on top of XMI, a large EMF model, on the scale of megabytes, takes too long to load because the entire XMI file has to be read into memory. The goal is therefore to implement the IModel interface using a database-backed solution in order to solve the loading problem that XMI faces.

4.3 Project Building Blocks

Figure 4.1: Project building blocks (diagram labels: EMF metamodel instantiated objects; DB module; EMF Reflective API; EMF Persistence API; H2 DBMS; Querying module; EMF Reflective API; Querying Interface Implementation)

Figure 4.1 illustrates the main building blocks of this project. The first part of the project injects instances of an EMF metamodel into the generic relational database using the EMF reflective API and the EMF persistence API. The second part queries the data injected into the database using the EMF reflective API. The functional and non-functional requirements for each of these building blocks are analysed below.

4.3.1 Database Injection

In this building block the instances that conform to an EMF metamodel are injected into a generic relational database. Each functional requirement is given an index so that it can be identified more easily in the Evaluation part of the report (Chapter 7). The functional requirements are presented below.

Functional requirements:

F.R Injection 1. The instances of the EMF model given as input shall be injected into a database.
F.R Complete 2. The objects, the information they hold and the relationships between them shall be stored in the database.
F.R. Generic 3. The schema of the database shall be generic. The schema shall be compatible with any EMF model given as an input.
All the information in the instances of an EMF model shall be mapped onto the relational database schema.
F.R. Transparency 4. The location of the database shall be transparent. The user should be able to determine the location of the database that stores the EMF objects.
F.R Reflection_1 5. The code shall use the EMF reflective API to inject the instances into the database. The code generation facility offered by EMF shall not be used. Using the reflective API makes the plug-in easier to use, since none of the steps needed to generate code are required.

Non-functional requirements:

N.F.R Embedded 1. The DBMS shall be embedded in the application and be as efficient as possible for this specific purpose. The database shall be built into the application so that it is easier for the user to use.
N.F.R Efficiency_1 2. The Java code used for the database injection shall be as efficient as possible.
N.F.R Maintainability 3. The Java code shall be easily maintainable.

4.3.2 Database querying

In this building block the instances previously stored in the database have to be queried and loaded into in-memory objects to facilitate the management of the information they contain.

Functional requirements:

F.R Querying 1. The users shall access the information in the database by loading parts of the stored model into in-memory objects.
F.R Variety 2. The users shall be provided with a large set of methods that allow them to access the information stored in the database in many different ways. More than 10 methods are provided for database querying, and they conform to the IModel interface. The methods are for read-only use of the database and do not support modification.
F.R Reflection_2 3. The code written for querying the database shall also use the EMF reflective API.
This allows the user to use the querying methods with less effort, since there is no need for code generation, which would add extra steps to the process.

Non-functional requirements:

N.F.R Efficiency_2 1. The code shall be reliable, robust and efficient.
N.F.R Adaptability 2. The application shall handle potential mistakes made by the user while querying the database. For example, the application should return null when the user requests an object that does not exist in the database.

4.4 Summary

This chapter identified the functional and non-functional requirements of the main building blocks of the application, in accordance with the software methodology chosen for this project. The next phase in the development lifecycle is the design phase.

Chapter 5 Design

As presented in the requirements chapter, the project consists of two main building blocks. The design of each building block is presented separately.

5.1 Approaches considered

As discussed in the literature review (Chapter 2), before the design of the solution could start, all existing technologies that could provide the backbone for the Eclipse IDE plug-in had to be examined. Three approaches were considered at that point: CDO, Teneo, or designing from scratch a database schema that could store the information contained in EMF objects.

5.1.1 CDO approach

CDO seemed to be a technology that could provide the mapping between EMF and a relational database for the plug-in. Further research, though, led to the following conclusions:

1. CDO is mostly oriented towards a client-server architecture, with support for changes made by concurrent users. These features are of course useful but are outside the focus of this dissertation. There was no need to add more complexity and computational overhead.
2. Hands-on experiments also led to the conclusion that CDO is under-documented, and it was not clear how to connect a relational database on the back end. This would also have added time overhead to the implementation of the querying module of the project [20].

Based on these facts the CDO approach was not chosen, and Teneo became the next candidate technology for consideration.

5.1.2 Teneo approach

Teneo is a database persistence solution for EMF. Applying it to this project could provide an automatic mapping between EMF and a relational database. In addition, its integration with Hibernate would make a suitable API available for the querying part of the project. After implementing some applications with Teneo, though, the following conclusions were reached:

1) The database schema that is created is highly dependent on the model itself. This feature of Teneo deviates significantly from the requirements of this project. The goal is to create a generic mapping of the Ecore model to a database, not a specific mapping for each Ecore model.
2) The mapping of the EMF model into the database was successful, but there was no support for loading only a part of the Ecore model into memory. This made the solution very inefficient, since for a very large Ecore model the time to load it from the database into memory would be very long. This was not in accordance with the requirements of the project.
3) The saving of the model to the database was not transparent. The location of the database that contained the EMF models was not clear to the user. Again, this was not in line with the requirements set for this project.

5.1.3 Summary

CDO and Teneo initially appeared to be promising backbones for the implementation of the plug-in, since they provide the mapping of an EMF model to a database.
On closer inspection, though, both technologies lacked many of the design features described in the requirements of this application. In addition, both added unnecessary complexity to the project. Based on these facts, the third approach was selected for the implementation of the application. First, a generic relational database would be designed from scratch to provide the persistence solution. Second, the code for the mapping between EMF and the database would be implemented.

5.2 Use Case Diagram

A use case diagram is presented below that analyses the two basic axes of the project. A goal-based description is also provided for each use case.

Figure 5.1: Project's Use Case diagram

Use Case: Inject model into database
Goals
- User must be able to create a database.
- User must be able to specify the location of the database.
- User can persist any number of models in the database.
Pre-Condition
- The models must be loaded into a resource before they can be persisted in the database.

Use Case: Query the database
- User can use the query interface in order to load data from the database.
Goals
- User must be able to load parts of the model into memory.
Pre-Condition
- The models must be saved into a database before they can be queried.

5.3 Generic Database design

As discussed previously, a relational database design that can hold EMF objects is a very important part of this project. This section presents the steps in the design of the database schema. The design process is divided into three phases: the Conceptual, the Logical and the Physical design.

5.3.1 Conceptual design

In the Conceptual design phase the primary objective is to construct an Entity Relationship diagram by identifying [28]:

1. The data objects or entities,
2. The relationships between the objects,
3. The constraints and the rules that govern the operations on the objects.
The ER diagram is constructed in three steps.

1st step: Building an entity-attribute table.

The entities identified for the database design of this project follow the structure of the Ecore model. Every entity also has specific attributes that describe it. The entity-attribute table is given below.

Entities     Attributes
Object       objectID, eClassName
Attribute    objectID, attributeName, attributeValue
Reference    objectID, referenceName, valueID

Table 5.1: Entity-attribute table

As Table 5.1 shows, the entities of the ER diagram are Object, Attribute and Reference. These entities represent the building blocks of an Ecore model, namely EClass, EAttribute and EReference, as discussed in the literature review.

2nd step: Building an entity-relationship table.

In this step of the Conceptual design the relationships and multiplicities are defined and grouped together in a table.

Entities             Relationship    Multiplicity
Object, Attribute    contains        1..1 : 0..*
Object, Reference    consists of     1..1 : 0..*

Table 5.2: Entity Relationship table

An Object can contain zero or more Attributes and can consist of zero or more References. A Reference or an Attribute, though, cannot exist without a base Object.

3rd step: Building the Entity-Relationship diagram.

The ER diagram is based on the entity-attribute table and the entity-relationship table. It is drawn in a hybrid notation that includes UML schematics.

Figure 5.2: Database ER diagram

The next step is to translate the Entity Relationship diagram into a set of tables.

5.3.2 Logical Design

In this phase of the database design the conceptual design is transformed into a set of tables that can support the operations defined in the requirements chapter.
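Before the tables are written down formally, the three entities of Table 5.1 can be illustrated with a small sketch. It is not the plug-in's code and does not use EMF: ModelObject and GenericRows are hypothetical plain-Java stand-ins, and each row is shown as a "|"-separated string purely for readability. The sketch walks an object graph once and emits one Object row per object, one Attribute row per attribute value and one Reference row per referenced object, which is the shape of data the generic schema is meant to hold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a model object; this is not an EMF type. */
class ModelObject {
    final String eClassName;
    final Map<String, Object> attributes = new LinkedHashMap<>();
    final Map<String, List<ModelObject>> references = new LinkedHashMap<>();

    ModelObject(String eClassName) {
        this.eClassName = eClassName;
    }
}

/** Collects rows shaped like the Object, Attribute and Reference entities. */
class GenericRows {
    final List<String> objects = new ArrayList<>();    // objectID|eClassName
    final List<String> attributes = new ArrayList<>(); // objectID|attributeName|attributeValue
    final List<String> references = new ArrayList<>(); // objectID|referenceName|valueID
    private final Map<ModelObject, Integer> ids = new IdentityHashMap<>();

    /** Stores an object and everything reachable from it; returns its objectID. */
    int store(ModelObject object) {
        Integer known = ids.get(object);
        if (known != null) {
            return known; // already stored; this also stops cycles from looping
        }
        int id = ids.size() + 1;
        ids.put(object, id);
        objects.add(id + "|" + object.eClassName);
        for (Map.Entry<String, Object> attribute : object.attributes.entrySet()) {
            // Attribute values are kept as text, independent of their model type.
            attributes.add(id + "|" + attribute.getKey() + "|" + attribute.getValue());
        }
        for (Map.Entry<String, List<ModelObject>> reference : object.references.entrySet()) {
            for (ModelObject target : reference.getValue()) {
                int valueId = store(target);
                references.add(id + "|" + reference.getKey() + "|" + valueId);
            }
        }
        return id;
    }

    public static void main(String[] args) {
        ModelObject writer = new ModelObject("Writer");
        writer.attributes.put("name", "Ann");
        ModelObject library = new ModelObject("Library");
        library.attributes.put("name", "York");
        library.references.put("writers", Arrays.asList(writer));

        GenericRows rows = new GenericRows();
        rows.store(library);
        System.out.println(rows.objects + " " + rows.attributes + " " + rows.references);
    }
}
```

Because every row carries only an objectID plus names and values, the same three row shapes serve any metamodel, which is the sense in which the schema is generic.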
Database Definition Language (DBDL), a simple way of representing a table by means of its name, columns and primary key, will be used to describe the tables. For example, “Moons (ID, Moon, Planet)” describes a table that is called Moons and has three columns: ID, Moon and Planet. ID is the primary key of the table [28]. Transforming a conceptual database design into a logical database design is a three-step process.

1st step: map the ER entities into tables.
Each ER entity that was identified in Figure 5.2 is mapped to a table.

    Objects (objectID, eClassName)
    Attributes (objectID, attributeName, attributeValue)
    References (objectID, referenceName, valueID)
Listing 5.1

2nd step: mapping the relationships between the entities.
Normally the relationships between entities are mapped by introducing foreign keys. This particular database design, though, implements the relationships between the entities at a more abstract level. As defined in the ER model, the Object entity contains Attributes and consists of References. The connection between Objects, Attributes and References is implemented through the objectID field, which is unique for each object.

    Objects (objectID, eClassName)
Listing 5.2

The objectID is identified as the primary key of the table Objects. A unique id is assigned to each object, that is, to each instantiated EClass stored in the database. For the tables Attributes and References there is no point in identifying primary keys, since all the fields of these tables are required in order to distinguish one record from another.

3rd step: determine the fields’ data types.
For each table that was identified, the data types of its fields are determined.
Table Objects
Column | Data type
objectID | INT
eClassName | CLOB
Table 5.3: Table Objects

Table Attributes
Column | Data type
objectID | INT
attributeName | CLOB
attributeValue | CLOB
Table 5.4: Table Attributes

As shown in Table 5.4, the type of the attributeValue field is CLOB, a data type used to store large character data. An attribute can of course be of any type, such as INT or BOOLEAN, so CLOB is used to hold the attribute value in the database in textual form. When this field is loaded into memory, it is cast to the correct data type as described in the Ecore model.

Table References
Column | Data type
objectID | INT
referenceName | CLOB
valueID | INT
Table 5.5: Table References

5.3.3 Physical design

In order to realize and implement the database design, a suitable Database Management System (DBMS) had to be selected. The selected DBMS is H2 Database [34]. This section gives the reasons for the selection of H2, the code that connects to H2 Database and the code that creates the tables described above.

5.3.3.1 H2 Database

A very important part of the design for this application is the selection of the DBMS. The objective was to find a DBMS that was as efficient as possible when operating in embedded mode, since the application was going to be applied to very large EMF models and thus the speed of the database was crucial. After researching and comparing different DBMS, the final decision was to use H2 Database. The matrix below compares statistics of different DBMS with H2 when operating in embedded mode. According to [23], H2 is faster and has a lower memory footprint than the other DBMS compared.

Table 5.6: H2 Performance comparison (adopted from [23])

In addition, H2 Database is very easy to use in embedded mode. The only thing required for the Java code to connect to the database is the connector JAR file.
Other DBMS such as MySQL require more information for the connection to be established, for instance the port number on which MySQL listens on the user’s computer. Additional information of this kind would probably add unnecessary complexity to the application. For all the aforementioned reasons H2 Database was selected.

5.3.3.2 Database creation – connection code

In this section the code developed for the connection to the database and for the creation of the database schema described above is presented.

Listing 5.1: Open connection to H2 DB method
    private void openConnection(String dbNamePath) {
        try {
            // load the H2 JDBC driver
            Class.forName("org.h2.Driver");
            // open the connection (user "sa", empty password)
            con = DriverManager.getConnection("jdbc:h2:" + dbNamePath, "sa", "");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The openConnection(String):void method connects the application with the H2 Database. For the creation of the tables there is one general method that is called for all three tables.

Listing 5.2: Create table method
    private void createTable(String dbNamePath, String sqlDrop, String sqlCreate) {
        // open the connection if it is not open yet
        if (con == null) {
            openConnection(dbNamePath);
        }
        try {
            // instantiate the statement
            stmt = con.createStatement();
            // drop the table if it already exists
            stmt.executeUpdate(sqlDrop);
            // create the table
            stmt.executeUpdate(sqlCreate);
            stmt.close();
        } catch (SQLException s) {
            JOptionPane.showMessageDialog(null, s.getMessage(), "CREATE Error",
                    JOptionPane.ERROR_MESSAGE);
        } catch (Exception e) {
            // inform the user of the error
            JOptionPane.showMessageDialog(null, "General error");
        } finally {
            // close the connection
            closeConnection();
        }
    }

The create table method executes two SQL strings every time it is called, so two SQL strings are executed for every table. The first drops the table if it already exists and the second creates the table.
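For illustration, the drop/create pairs for all three tables could be collected in one place and executed in order. The sketch below is only an assumption about how this might look: the text gives the SQL for the Objects table alone, so the Attributes and References statements are derived here from Tables 5.4 and 5.5, and the class and method names (SchemaSketch, schemaStatements, createAll) are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class SchemaSketch {

    // Drop/create pairs for the three tables. Only the Objects statements come
    // from the text; the other two are assumptions based on Tables 5.4 and 5.5.
    // REFERENCES is a reserved word in standard SQL, so whether the table name
    // needs quoting depends on the DBMS in use.
    static List<String> schemaStatements() {
        return List.of(
            "DROP TABLE IF EXISTS Objects",
            "CREATE TABLE Objects (objectID INTEGER PRIMARY KEY AUTO_INCREMENT, eClassName CLOB)",
            "DROP TABLE IF EXISTS Attributes",
            "CREATE TABLE Attributes (objectID INTEGER, attributeName CLOB, attributeValue CLOB)",
            "DROP TABLE IF EXISTS References",
            "CREATE TABLE References (objectID INTEGER, referenceName CLOB, valueID INTEGER)");
    }

    // Executes every statement on an already open JDBC connection.
    // try-with-resources closes the Statement even if one of the updates fails.
    static void createAll(Connection con) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            for (String sql : schemaStatements()) {
                stmt.executeUpdate(sql);
            }
        }
    }
}
```

Keeping the statements in a list makes the order explicit (each DROP immediately precedes its CREATE) and lets the schema be inspected without a database connection.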
Below the SQL strings for the creation of table Objects are presented:

Listing 5.3: SQL Queries
    "DROP TABLE IF EXISTS Objects";
    "CREATE TABLE Objects (objectID INTEGER PRIMARY KEY AUTO_INCREMENT, eClassName CLOB)";

5.4 EMF Injection into the database

After the design of the relational database, completing the first milestone of the project required an algorithm that injects EMF models into the database. There is no real need to provide a UML diagram at this point, since the algorithm does not involve interactions with other classes. A flow chart illustrates better the logic that underlies the design of the EMF injection algorithm.

Start
1. Initialize an iterator. While the resource has more EClass objects, put each EClass object in the HashMap, mapping it to its id.
2. Initialize an iterator again. While the resource has more EClass objects:
   a. Inject the id and the EClass name into the Objects table.
   b. While the current EClass object has more EAttributes, inject the EAttribute name, the EAttribute value and the EClass object id into the Attributes table.
   c. While the current EClass object has more EReferences, get the EReference object id from the HashMap and inject the current EClass object id, the EReference name and the EReference object id into the References table.
End
Figure 5.3: Injection Algorithm Flowchart

Figure 5.3 explains the logic of the algorithm that was implemented in order to inject EMF models into the generic database. An important part of the algorithm is the creation of the HashMap object at the beginning. A HashMap object can be imagined as a table with two columns and a variable number of rows. In the HashMap designed here, the first column contains the EClass object and the second column the id of that object. The creation of the HashMap has to be the first step of the algorithm so that the map can be used when data are injected into the References table.
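The two passes of Figure 5.3 can be sketched without EMF or a database. In the minimal example below, Node is a hypothetical stand-in for an instantiated EClass object, and the three lists stand in for the rows that would be inserted into the Objects, Attributes and References tables; none of these names come from the project's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InjectionSketch {

    // Hypothetical stand-in for an EMF object: a class name plus its features.
    static class Node {
        final String className;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final Map<String, List<Node>> references = new LinkedHashMap<>();

        Node(String className) {
            this.className = className;
        }
    }

    // Rows that would be written to the three tables, as comma-separated text.
    final List<String> objects = new ArrayList<>();
    final List<String> attributes = new ArrayList<>();
    final List<String> references = new ArrayList<>();

    void inject(List<Node> resource) {
        // Pass 1: give every object an id before any row is written.
        Map<Node, Integer> ids = new HashMap<>();
        int nextId = 1;
        for (Node n : resource) {
            ids.put(n, nextId++);
        }
        // Pass 2: write the rows; reference targets are resolved through the map,
        // so a target that appears later in the resource already has its id.
        for (Node n : resource) {
            int id = ids.get(n);
            objects.add(id + "," + n.className);
            for (Map.Entry<String, String> a : n.attributes.entrySet()) {
                attributes.add(id + "," + a.getKey() + "," + a.getValue());
            }
            for (Map.Entry<String, List<Node>> r : n.references.entrySet()) {
                for (Node target : r.getValue()) {
                    references.add(id + "," + r.getKey() + "," + ids.get(target));
                }
            }
        }
    }
}
```

Without the first pass, a reference to an object that has not been visited yet would have no id to store, which is why the map is filled completely before any row is produced.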
A general map of object ids is required because any object can be referenced by another object.

5.5 General UML design

In this section the classes of the application design are analyzed using UML. A class diagram is provided that describes both building blocks of the application.

5.5.1 Class diagram

The design of the application is based on three classes: LiveObjectModelBuilder, LiveObjectModel and LiveObject. The class LiveObjectModelBuilder is used for the first building block of the application, and the classes LiveObjectModel and LiveObject are used for the second building block. LiveObject is an entity class. The class diagram of the application design is shown below.

Figure 5.4: Project’s Class diagram

5.5.2 Class diagram summary

The general idea of the design is that an EMF model is injected into a database using the LiveObjectModelBuilder class. Any LiveObjectModel must be directly associated with a database that was created by the class LiveObjectModelBuilder, since the data manipulated by the LiveObjectModel must come from the database where the model is stored. LiveObjectModel implements the interface IModel, and in this way the application is integrated with the Epsilon platform. LiveObject is an entity class. When the user calls the methods provided by LiveObjectModel to query the EMF models stored in the database, the information has to be stored in memory in a specific form.

The information taken from the database through LiveObjectModel is stored in memory in LiveObject objects. Each LiveObject object has three basic fields that represent an instantiated EClass: the id of the instantiated EClass object as stored in the database, the EClass name, and the LiveObjectModel from which it is derived. The last field is very important since many LiveObjectModel objects can operate on a single database.
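As a rough illustration of the three fields described above, the following sketch shows an entity that keeps only an id, a class name and a link to its owning model, and asks that model for feature values when they are requested. Handle, Model and InMemoryModel are hypothetical names used for this example only; they are not the LiveObject and LiveObjectModel classes of the project, whose code is not reproduced here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyObjectSketch {

    // Hypothetical stand-in for the model that owns the database access.
    interface Model {
        List<String> attributeValues(int objectId, String attributeName);
    }

    // Entity with the three fields only; feature values stay in the store.
    static class Handle {
        final int id;
        final String className;
        final Model model;

        Handle(int id, String className, Model model) {
            this.id = id;
            this.className = className;
            this.model = model;
        }

        // Values are fetched through the owning model only when asked for.
        List<String> get(String attributeName) {
            return model.attributeValues(id, attributeName);
        }
    }

    // In-memory model used for illustration; it counts how often it is queried.
    static class InMemoryModel implements Model {
        final Map<String, List<String>> rows = new HashMap<>();
        int lookups = 0;

        @Override
        public List<String> attributeValues(int objectId, String attributeName) {
            lookups++;
            return rows.getOrDefault(objectId + "/" + attributeName, List.of());
        }
    }
}
```

Because each handle carries a reference to its own model, two handles with the same id but different models remain distinguishable, which matters when several models operate on one database.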
The basic concept that underlies this design is to fetch data on demand. The data that need to be interrogated are loaded either into LiveObject objects or into other suitable objects, depending on the type of the data requested. This way only the part of the model that was requested needs to be in memory.

Chapter 6 Implementation

The purpose of this chapter is to describe and explain the most important pieces of code that implement the basic parts of the design discussed in Chapter 5. In addition, basic points are discussed about the plug-in architecture that was used for the implementation of this plug-in.

6.1 Injecting into the relational database

This section explains how a model is inserted into the H2 Database. As mentioned in the Literature Review section (Chapter 2), before a model can be persisted it first has to be put into a Resource, which is contained in a ResourceSet. The code that implements this functionality is presented in Listings 6.1 and 6.2:

Listing 6.1: Register metamodel
    public EPackage registerMetamodel() throws Exception {
        String ecore = base + "library.ecore";
        ResourceSet resourceSet = new ResourceSetImpl();
        // use the XMI resource factory for every file extension
        resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
                .put("*", new XMIResourceFactoryImpl());
        Resource resource = resourceSet.createResource(URI.createFileURI(ecore));
        resource.load(null);
        // the EPackage is the first root object of the .ecore resource
        return (EPackage) resource.getContents().get(0);
    }

Listing 6.2: Load model to resource
    public Resource loadModel() throws Exception {
        String model = base + "myLibrary.model";
        ResourceSet resourceSet = new ResourceSetImpl();
        // use the XMI resource factory for every file extension
        resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
                .put("*", new XMIResourceFactoryImpl());
        // register the metamodel so that the model file can be resolved against it
        EPackage metamodel = registerMetamodel();
        resourceSet.getPackageRegistry().put(metamodel.getNsURI(), metamodel);
        Resource resource = resourceSet.createResource(URI.createFileURI(model));
        resource.load(null);
        return resource;
    }

A detailed walk-through of this code is out of scope. The main concept, though, is that a ResourceSet is created into which a resource is loaded. The unique identifier that distinguishes the resources of a ResourceSet from one another is the URI. As seen above, the loadModel() method returns a specific resource. Based on the class diagram of Figure 5.4, all that remains to be done in order to load the model into the database is to call two methods of the LiveObjectModelBuilder class. First the database must be created by calling the createDatabase() method, and then a resource can be added by calling the addResource(Resource) method. How the information contained in a resource is added to the database was explained through a flowchart in the Design phase, so the code is not presented here.

6.2 Querying methods explained

In this section the basic methods used to query parts of the database are explained. The methods that were implemented conform to the interface IModel which, as stated in the requirements section (Chapter 4), is the interface that has to be implemented in order to implement a driver for Epsilon.
Where an implemented IModel method is complicated, code is provided to support the description.

Methods:

allContents():Collection: The objective of this method is to load all the information of the Objects table into LiveObject objects. As discussed previously, the Objects table stores the id and the EClass name of every object of a model, so one LiveObject is created for each object stored in the table. As expected, this method returns a Collection of LiveObject objects.

getAllOfType(String):List<LiveObject>: The objective of this method is to return a list of LiveObjects of a specific type. The Objects table is queried with simple SQL strings and the information is stored in a List of LiveObject objects.

getAllOfKind(String):List<LiveObject>: The objective of this method is to return all the objects whose type is a specific EClass or inherits from it. All the children, including the children of children and so on, are considered to be of the same kind as the root of an inheritance tree. The root of the tree in this case is the EClass named by the String parameter of the method. This method is more complex than the methods presented above, so its code is presented and explained.
Listing 6.3: Method to find subtypes
    protected List<EClass> getAllSubTypes(String clazz) {
        // stores the subtypes of a particular class
        ArrayList<EClass> allSubTypes = new ArrayList<EClass>();
        // all the classes of a package
        ArrayList<EClass> allClasses = loadAllEClasses();
        EClass eClass = classForName(clazz);
        EClass current = eClass;
        // the class itself is the first entry of the subtypes list
        allSubTypes.add(eClass);
        int j = 0;
        while (j < allSubTypes.size()) {
            for (int i = 0; i < allClasses.size(); i++) {
                EClass candidate = allClasses.get(i);
                // add the candidate if current is one of its supertypes
                // and it has not been collected already
                if (current.isSuperTypeOf(candidate) && !allSubTypes.contains(candidate)) {
                    allSubTypes.add(candidate);
                }
            }
            j = j + 1;
            if (j < allSubTypes.size()) {
                current = allSubTypes.get(j);
            }
        }
        return allSubTypes;
    }

Listing 6.4: Method to get all types of subtypes
    public List<LiveObject> getAllOfKind(String type) {
        List<LiveObject> result = new ArrayList<LiveObject>();
        List<EClass> allSubTypes = getAllSubTypes(type);
        // collect the objects of every subtype, including the type itself
        for (int i = 0; i < allSubTypes.size(); i++) {
            result.addAll(getAllOfType(allSubTypes.get(i).getName()));
        }
        return result;
    }

The first method presented above, “getAllSubTypes()” (Listing 6.3), gets all the subtypes of an EClass. It returns a List of EClasses that contains the entire inheritance tree whose root is the EClass specified as a parameter. To achieve that, all the EClasses of the EPackage are first loaded into another List. The algorithm contains two nested loops and a basic if statement. The basic logic of the “getAllSubTypes()” method is to take one current EClass at a time and find its children by examining all the EClasses of the EPackage. Every child that is identified is added to the result list. This process is then repeated for every child that has been identified.
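The worklist idea behind Listing 6.3 can also be shown independently of EMF. The sketch below is an illustration under stated assumptions, not the project's code: classes are represented by their names, directSupers is a hypothetical map from each class name to the names of its direct supertypes, and the result list doubles as the worklist exactly as allSubTypes does in the listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SubtypeClosureSketch {

    // directSupers maps a class name to the names of its direct supertypes
    // (a hypothetical stand-in for the EClasses of an EPackage).
    static List<String> allSubTypes(String root, Map<String, List<String>> directSupers) {
        List<String> result = new ArrayList<>();
        result.add(root);
        // The result list is also the worklist: entries appended during the
        // scan are visited in later iterations of the outer loop.
        for (int j = 0; j < result.size(); j++) {
            String current = result.get(j);
            for (Map.Entry<String, List<String>> e : directSupers.entrySet()) {
                boolean isChild = e.getValue().contains(current);
                // skip classes that were already collected to avoid duplicates
                if (isChild && !result.contains(e.getKey())) {
                    result.add(e.getKey());
                }
            }
        }
        return result;
    }
}
```

The loop ends when the index reaches the end of the list without new entries having been appended, that is, when the last collected class has no uncollected children.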
The base EClass from which this procedure starts is given as a parameter.

The challenging part of this algorithm was that no method was found in the EMF API that returns the entire tree of children of an EClass. The EMF API method that was used is “isSuperTypeOf(EClass):Boolean”, which only tests whether one EClass is a supertype of another given EClass, so every EClass of the EPackage has to be tested in turn. After all the subclasses of an EClass have been identified, returning all the objects of each subtype is easy (Listing 6.4). For each EClass that is a subtype, the “getAllOfType(String): List<LiveObject>” method is called, which yields all the objects that are of the same kind as the EClass specified as a parameter.

knowsAboutProperty(Object, String):Boolean: This method can be considered representative of many other methods of the interface IModel which examine an aspect of the model and return a Boolean value. It takes as parameters a LiveObject object and a String, and checks whether the EClass described by the LiveObject object has a Reference or an Attribute with that name. The logic that underlies this method is to take the EClass that is represented by the LiveObject object and check whether it has any EStructuralFeature named by the String parameter of the method.

get(String):Object: This method, as seen in the class diagram in the design section, belongs to the LiveObject class. Its objective is to get a structural feature, which is either a reference or an attribute, and load it into a suitable object. This object can also be a Collection of objects. The EStructuralFeature is named by the String parameter of the method. Because the method is complicated, some parts of the code are presented and explained. First it is examined whether the String parameter actually names an EStructuralFeature. If it does not, a null value is returned.
If it does, then the feature is either an EReference or an EAttribute. For an EAttribute the code is (Listing 6.5):

Listing 6.5: Part of method to get attribute values
    if (sf instanceof EAttribute) {
        if (sf.isMany()) {
            // convert each stored String to the type declared in the model
            List<Object> castedValues = new ArrayList<Object>();
            Collection<String> values = model.getAttributeValues(id, property);
            for (String value : values) {
                castedValues.add(cast(value, sf.getEType()));
            }
            return castedValues;
        } else {
            List<String> values = model.getAttributeValues(id, property);
            if (values.size() > 0) {
                return cast(values.get(0), sf.getEType());
            } else {
                // no value stored for this object: return the default value
                return sf.getDefaultValue();
            }
        }
    }

The castedValues List is used because the EAttribute values should not be loaded in the String form in which they are saved in the database. The EAttribute values should be loaded in the type that is specified in the model. The EAttribute values are taken from the database and loaded into Collection objects. If the EAttribute is “not many”, meaning that only one value corresponds to the EAttribute, then the first element of the collection is returned. If, though, the List that holds the attribute values is empty, it means that the attribute holds its default value, which is the same for all the objects of the same type, and so the default value is returned. A similar logic is applied when the EStructuralFeature is an EReference. The first else in the code section of Listing 6.6 implies that the EStructuralFeature is an EReference.
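Before turning to the reference branch, the conversion step used for attributes can be sketched as follows. The body of the cast(...) helper called in Listing 6.5 is not shown in the text, so this is only an assumption about what such a conversion might do: CastSketch is a hypothetical class, and it takes a plain Java Class<?> where the original passes sf.getEType(). EMF itself offers EcoreUtil.createFromString(EDataType, String) for this kind of conversion; whether the project relies on it is not stated.

```java
class CastSketch {

    // Converts the textual form stored in the database back to a Java value.
    // Only a few types are covered; any other type is returned as the
    // original String.
    static Object cast(String value, Class<?> type) {
        if (type == Integer.class || type == int.class) {
            return Integer.valueOf(value);
        }
        if (type == Boolean.class || type == boolean.class) {
            return Boolean.valueOf(value);
        }
        if (type == Double.class || type == double.class) {
            return Double.valueOf(value);
        }
        return value;
    }
}
```

Storing every value as text keeps the schema independent of the metamodel, at the price of one conversion each time a value is read.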
Listing 6.6: Part of method to get reference values
    else {
        if (sf.isMany()) {
            Collection<LiveObject> values = model.getLiveObject(id, property);
            return values;
        } else {
            List<LiveObject> values = model.getLiveObject(id, property);
            if (values.size() > 0) {
                // reuse the list already fetched instead of querying again
                return values.get(0);
            } else {
                return sf.getDefaultValue();
            }
        }
    }

The basic logic that underlies the algorithm is as follows. The objective of this part of the program is to get the LiveObject objects that correspond to a specific object id and to a specific reference name. The reference name is given to the method as a String parameter. To accomplish this objective the algorithm has to visit two tables: the References table and the Objects table. To make the procedure more efficient, a single SQL query that joins the two tables is used to extract the desired information, instead of visiting the two tables separately.

6.3 Eclipse plug-in

As stated in the Requirements section (Chapter 4), the objective of this project is to develop a plug-in which will improve the scalable persistence of EMF models. In this section the architecture of a plug-in component in Eclipse is discussed briefly. In addition, the role of the code developed is explained in this context.

6.3.1 Plug-in architecture

The basic unit or component that provides functionality in Eclipse is called a plug-in. In a running instance of Eclipse, each plug-in is associated with a plug-in class which is responsible for managing and configuring the plug-in instance. Every plug-in resides in its own folder. The folder contains the manifest file, which gives the Eclipse runtime the information necessary to activate the plug-in. Each time Eclipse runs, some core plug-ins are activated automatically and provide some default plug-in classes and functionality.
Other, non-core plug-ins, though, are activated only when needed by other plug-ins. There are two kinds of relationship that describe this situation:
1. Dependency: Some plug-ins need other plug-ins as prerequisites in order to run.
2. Extension: The procedure of adding elements to a plug-in is called an extension.
The manifest file includes the information that describes the kind of relationship in which a plug-in is involved.

Dependency: There are two ways to make use of a dependency plug-in. It can be used at runtime, in which case it must be made available to the dependent plug-in, or at compile time, in which case Eclipse’s classpath has to be augmented to include the dependency plug-in [24].

Extension: In this situation there is a host plug-in and an extender plug-in. The second extends the functions of the first. Any plug-in component in Eclipse can allow its extension in many different ways. A simple example is a workbench UI that allows its menus to be augmented. A plug-in can allow many different kinds of extension. All these extensions are attached to the host plug-in at specific slots that are called extension points [24].

In the relationship between a host plug-in and an extender plug-in, a callback object can play a significant role in the communication between the two. A callback object is implemented through plain Java objects and can add functionality to an extender plug-in, which is attached to an extension point of a host plug-in.

At this point a basic context for the plug-in architecture has been provided. There is of course a lot more to explain about Eclipse plug-ins, but an extensive analysis would be out of scope for this project.

6.3.2 Plug-in roles explained

Now that a clearer framework has been provided around Eclipse plug-ins, the building blocks of the application can be put into place with regard to the implementation.
Dependencies:
The H2 DBMS plays an important role in this project. Each time a connection to the DBMS is needed, though, the classpath of Eclipse has to be augmented, since the H2 connector .jar file is necessary. In other words, the H2 connector file was a prerequisite for the code of this project to work. Eclipse provides the tools to convert a file into a plug-in dependency. In this case the H2 connector file was converted into a dependency plug-in and added to the project’s workbench. As a result, potential users of the plug-in developed will not have to add the H2 connector file manually.

Extensions:
For the extension part, the Eclipse Modelling Framework can be imagined as the host plug-in. The code described in the Design (Chapter 5) and Implementation (Chapter 6) parts of the project can be considered to be the callback object. What needed to be implemented apart from the callback object was therefore the extender plug-in, so that the callback object could operate in this context. Handling extensions and extension points requires experience and deep knowledge of Eclipse’s architecture. The steps to develop the extender plug-in so that the callback object could operate were implemented by Dr. Dimitris Kolovos, who is the supervisor of this thesis.

The actions above, which included the creation of a dependency plug-in and the implementation of a callback object as well as of the extender plug-in, allowed the creation of an operational Eclipse extension plug-in.

Chapter 7 Evaluation

In this chapter a case study is presented together with test cases. In addition, the evaluation of the requirements is presented, as well as an evaluation of possible improvements.

7.1 Case study

At this point the EMF model is kept simple and small so that the case study is easy to comprehend and because of practical space limitations.
The fact that a relatively small EMF model is presented does not mean that the application can be applied only to simple EMF models. The EMF model presented in the case study is structured in such a way that the functionality of the application can be examined to its full extent.

7.1.1 Constructing an EMF model

The EMF model on which the case study is based is presented below. It is an extension of the model that was presented in the Literature Review section (Chapter 2). This model also includes inheritance relationships and more classes. The extLibrary.ecore file is presented in Figure 7.1.

Figure 7.1: Case study Ecore model

The extended library metamodel consists of six EClasses. Each EClass may contain EAttributes, EReferences or both. An example of an inheritance relationship in this model is the one between the EClasses Writer and Person: the EClass Writer inherits from the EClass Person.

It is out of scope at this point to present the equivalent Emfatic code that was written in order to generate this model. The thing to remember is that any change to the model can be propagated automatically to the Emfatic code, and vice versa, with a few clicks.

Before moving on to the database injection part, it is necessary to stress again that it is not the Ecore model itself that is injected into the database. The database stores a model with object instances that conform to the extLibrary.ecore metamodel. The extLibrary.ecore metamodel can be imagined as the blueprint of the objects that are going to be instantiated based on it. With that in mind, the next step is to create objects that conform to the “extlibrary” metamodel of Figure 7.1.

7.1.2 Instantiating the model

As defined in the requirements section (Chapter 4), the models should be instantiated without the help of the automatic code generation facility that EMF provides. The objects should be instantiated reflectively.
In order to achieve that, the built-in EMF reflective editor is used. This is straightforward and does not require additional plug-ins in order to edit the model. After creating the Ecore model that the objects need to conform to, all that is needed is to make EMF aware of it. To do so, the newly created model has to be registered by right-clicking it and selecting register EPackages, as in Figure 7.2.

Figure 7.2: Register EPackages menu

Now it is easy to create models that conform to the newly created EMF metamodel. The model created using the reflective editor [25] is presented in Figure 7.3.

Figure 7.3: Instantiated metamodel

The model in Figure 7.3 conforms to the EMF metamodel in Figure 7.1. What information does this model include, though? As described in Figure 7.3, there is a root object of type Library that is called “myExtLib” and references several objects:
1. Writer object called Dan
2. Writer object called Jack
3. Book object called Code Da Vinci
4. Enc object called Britanika
5. Employee object called John

Each of these objects has several fields that need to be instantiated. In the model as seen in Figure 7.3, though, not all the information that is encapsulated in each object is visible. In order to instantiate and also to view all the information of an object, EMF provides a property view. The property view of object 1 is presented in Figure 7.4:

Figure 7.4: Reflective editor property view

Figure 7.4 shows all the information encapsulated in the Writer object: First Name “Dan”, Last Name “Brown” and a reference to a Book object with Title “Code Da Vinci”. Similarly, all the other objects in Figure 7.3 have been instantiated using the built-in reflective editor and the EMF property view.
There is also the option to instantiate the objects programmatically, but since the instantiated model is small, using the built-in reflective editor is quite convenient. The next step is to inject the model described in Figure 7.3 into the database.

7.1.3 Objects injected in the database

In this section the case study is continued. The model of Figure 7.3, which contains the instantiated objects, is injected into the database. The tables that were created during that process are presented below in Figures 7.5, 7.6 and 7.7. The information in each table is analyzed separately.

Figure 7.5: Objects table

Figure 7.6: Attributes table

Figure 7.7: References table

Objects table
The EClass name of each object is inserted in the Objects table and a unique id is assigned to each object. Although there are two Writer objects, each of them is assigned a different id. The Objects table contains six objects, which is the right number since there are six instantiated objects in Figure 7.3. Following this logic, the size of the model is of no importance: the table simply continues to grow, assigning each new object a unique id number.

Attributes table
In the Attributes table each attribute of an object is stored together with the object’s id and the attribute’s value. In this way all the information stored in attributes is associated with the appropriate object id, and there is enough information to determine which attribute belongs to which object. In this case, too, a very large model would not affect the functionality of this table at all. For example, examining the second row of the table leads to the following conclusion: the object with id = 2 has an attribute whose name is firstName and whose value is Dan.

References table
The same logic is also followed in the References table.
To analyze the functionality of this table, the first row of the table is explained. The information it holds is that the object with id = 1 has a reference called writers whose target is the object with id = 2. This means that the referenced object is of type Writer, because in the Objects table this is the eClassName of the object with id 2. Similarly, in the second row the object with id 1 has a reference to the object with id = 3, which is of type Writer again.

7.2 Testing

In this section the testing analysis takes place through black box testing of the two building blocks of the project. Based on the results of the testing, the functional requirements analysis is presented in the next section. The case study presented above is used for the testing process.

7.2.1 Black Box testing

Black box testing is the most suitable process to use at this point since it focuses on the functional requirements of the system. The idea of black box testing is to test a system without knowing its internal structure. An input is given to the system and the output that the system gives is compared with the expected output, as in Figure 7.8.

System input -> Black Box -> System output
Figure 7.8: Black Box testing schema

Each test case has a purpose, specific inputs and a comparison between the actual and the expected outputs [26].

7.2.1 Database injection testing

The evaluation of this building block of the project is based on the case study that was presented in section 7.1 of the report.

Purpose of the test: Test the database module functionality

Input values | Expected / Actual Results | Pass/Fail
Input model in Figure 7.3; destination path for the database file | Expected results: the tables in Figures 7.5 to 7.7; a new db file in the project’s folder. Actual results as expected. | Pass

The database module of the application passed the black box test. All the information of the model was successfully injected into the database.
A database file containing all the model data was also created in the project's folder.

7.2.2 Database querying testing

The tests are again based on the case study. The basic methods of the IModel interface that were implemented are tested.

Purpose of the test: Test the querying module functionality

| Method | Input values | Expected / Actual results | Pass/Fail |
|---|---|---|---|
| getAllOfKind(String): List<LiveObject> | "Book" | Expected results: the List has two Objects, one Book Object and one Enc Object. Actual results as expected. | Pass |
| getAllOfType(String): List<LiveObject> | "Writer" | Expected results: the List has two Writer Objects. Actual results as expected. | Pass |
| allContents(): Collection<?> | The LiveObject Model that corresponds to the case study | Expected results: a Collection with all the Objects in the Objects table in Figure 7.5. Actual results as expected. | Pass |
| getElementById(String): Object | "4" | Expected results: a LiveObject with id=4 and className="Book". Actual results as expected. | Pass |
| getElementId(Object): String | EClass Library | Expected results: a String with value="1". Actual results as expected. | Pass |
| isOfKind(Object, String): Boolean | The LiveObject with className = Enc and the String "Book" | Expected results: a Boolean value true. Actual results as expected. | Pass |
| knowsAboutProperty(Object, String): Boolean | The LiveObject with className = Writer and the String value firstName | Expected results: a Boolean value true. Actual results as expected. | Pass |

The querying module of the project passed the black box test. Many more methods were implemented in this module, but exhaustive testing of every method would be out of scope for this project.

7.3 Requirements Evaluation

The functional requirements evaluation will take place separately for each building block of the project.
The evaluation is based on the information presented so far as well as on the black box testing of the previous section, which is a process that focuses on the functional requirements of a system.

7.3.1 Database injection functional requirements evaluation

Each of the functional requirements is tagged with a special index (Section 4.3) so that it is easier to refer to.

Functional requirements:

F.R Injection: It is satisfied, since the case study shows that the EMF model is successfully injected into the database.

F.R Complete: It is satisfied, because the description and explanation of each table in the case study show that all the information represented in the model is successfully mapped into the database.

F.R. Generic: It is satisfied, since the database design is operational for any EMF model. The database design is not dependent on the structure of the EMF model.

F.R. Transparency: It is satisfied, since the LiveObjectModelBuilder class gives the user the option to choose the location of the created database. In addition, a separate, transparent database file is created at the location specified by the user.

F.R Reflection 1: It is satisfied, since the EMF automatic code generation facility is not used to manipulate the objects.

7.3.2 Database querying functional requirements evaluation

Each of the functional requirements is tagged with a special index (Section 4.3) so that it is easier to refer to.

Functional requirements:

F.R Querying: It is satisfied since, as explained in the implementation part of the project, parts of the model can be loaded into LiveObject objects. There is also the option to load specific EAttributes or EReferences into memory. In this way selected parts of the model can be loaded into memory.
F.R Variety: It is satisfied since, as explained in the implementation part of the project (Chapter 6) and in the black box testing part, a large set of methods has been implemented that allows the user to query the data in the database in many different ways.

F.R Reflection 2: It is satisfied, since the EMF automatic code generation facility was not used to implement the querying methods.

7.3.3 Database injection non-functional requirements evaluation

Non-functional requirements: The non-functional requirements are satisfied too.

N.F.R Embedded: The H2 DBMS was used in embedded mode. The .jar file needed for the database connection was transformed into an Eclipse plug-in. This way the user can use it as a dependency of the plug-in developed in this project.

N.F.R Efficiency_1: An effort was made to make the Java code efficient so that the injection of the model happens as fast as possible. The time-consuming activity of opening a connection to the database was handled with caution: connections to the database were opened only when needed.

N.F.R Maintainability: The Java code developed is documented through UML diagrams in order to be maintainable.

7.3.4 Database querying non-functional requirements evaluation

In this section the non-functional requirements of the querying building block of the project are evaluated using the tools that the Eclipse platform provides.

Non-functional requirements:

N.F.R Efficiency_2: An effort was made to make the code as efficient as possible. More specifically, when consecutive querying methods are executed, they all use the same connection to the database. This technique is applied because opening a connection to a database is a time-consuming activity and performance plays a big part in this project's objective.
In addition, joining tables when needed, instead of visiting the database multiple times to extract data, makes the code more efficient.

N.F.R Adaptability: The code guards against invalid input with if clauses and try/catch clauses, so that the user is not alienated. Even if the user passes invalid parameters when executing methods of the application, the return values indicate the inconsistency.

7.4 General Performance Evaluation

The objective of this section is to test the performance of the application that was developed. Using Eclipse IDE tools as well as the Epsilon plug-in, the time needed for specific functions to be executed is measured. A comparison takes place between:

1. Executing these functions by extracting the data from the database.
2. Executing these functions using the data provided by the EMF model itself.

All the test functions that are timed and measured are implemented as Epsilon Object Language (EOL) scripts. In addition, a very big EMF model is instantiated using EOL. The application's performance is measured while handling this big EMF model. Three metrics are of interest at this point:

1. Runtime behaviour: For this metric all the objects of the model are interrogated. In this case the database backed solution is expected to run significantly slower, since all the data have to be extracted from the database. With the XMI persistency solution the whole model is already loaded into memory, so the interrogation of all the objects of the model is expected to be much faster.
2. Boot time: For this metric the time needed to access one element of the model is measured. The database backed solution is expected to perform faster, since the whole model does not need to be loaded into memory before an element can be accessed. The time needed to load the model is taken into account.
3. Memory footprint: When only a small part of the model is accessed, the database solution has a much smaller memory footprint. Only the elements that are accessed on demand need to be in memory, in contrast with the XMI persistence solution, where the whole model needs to be loaded into memory at all times.

The process of realizing these tests is explained thoroughly in the following sections.

7.4.1 EOL Scripts explained

The EOL program in Listing 7.1 is used for the instantiation of a quite big EMF model. The syntax is similar to that of Java. First a Library object is instantiated that will be the root object. Then a for loop with 30,000 iterations instantiates 30,000 Writer objects and 30,000 Book objects. The Writer and the Book objects are also added as references to the root Library object. In addition, the attributes of the objects are set. The model can be considered quite big since it consists of 60,001 objects. On disk the model occupies almost 2.6 MB.

Listing 7.1: EOL instantiating code

    -- Root object of the model
    var library : Library := new Library;
    library.name := 'myLib2';

    -- 30,000 iterations, one Writer and one Book per iteration
    for (i in Sequence {1..30000}) {
        var w : Writer := new Writer;
        w.firstName := 'w' + i;
        w.lastName := 'l' + i;
        library.writers.add(w);

        var b : Book := new Book;
        b.title := 't' + i;
        b.pages := 200;
        library.books.add(b);
    }

In the first line of the EOL script in Listing 7.2 the Profiler object is instantiated. The Profiler object is used for taking time measurements when EOL scripts are executed. First the name of the Library object is printed. Then the first names of all the Writers and the titles of all the Books in the Library are printed.
Listing 7.2: EOL script1

    -- Profiler tool used for the time measurements
    var profiler = new Native('org.eclipse.epsilon.eol.tools.ProfilerTool');
    profiler.start("main");

    -- Name of the root Library object
    var l = Library.all.first();
    l.name.println();

    -- First name of every Writer, then title of every Book
    for (writer in Writer.allOfType) {
        writer.firstName.println();
    }
    for (book in Book.all) {
        book.title.println();
    }

    profiler.stop();

The EOL script in Listing 7.3 prints the name of the root object, which is of type Library and holds all the references to the Writer and Book objects.

Listing 7.3: EOL script2

    var profiler = new Native('org.eclipse.epsilon.eol.tools.ProfilerTool');
    profiler.start("main");

    -- Only the root Library object is accessed
    var l = Library.all.first();
    l.name.println();

    profiler.stop();

7.4.2 Extender plug-in

Figure 7.9 below illustrates part of the role of the extender plug-in. The Eclipse UI menus were augmented with an additional function: a file of type ".model" can be injected into an H2 database. So the actions taken so far were to instantiate a model using the EOL script and then, using the extended UI menu, to inject the instantiated EMF model into an H2 database. The database that was created contains all the information included in the 60,001 objects.

Figure 7.9: UI extension

7.4.3 EMF run time vs. database run time

Figure 7.10 below shows the time needed to run the EOL script of Listing 7.2 directly against the EMF model. The time is presented in milliseconds. The time needed for the tens of thousands of names and titles to be printed is 2.075 sec. This result is very fast and was also anticipated: the EMF model is already loaded in memory, so very little time is needed to manipulate it.

Figure 7.10: Time through EMF model

The amount of time needed to run the EOL script through the database is significantly greater: more than 35 minutes. The difference in execution times was of course anticipated, since in this case all the data that are printed are extracted from the database and are not already in memory.
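To put the two run times on a common scale, they can be divided by the number of values that Listing 7.2 prints: one library name, 30,000 first names and 30,000 titles. The sketch below is only back-of-the-envelope arithmetic on the figures reported above and is not part of the project code. The class and constant names are hypothetical, and because the database run is reported only as "more than 35 minutes", the database figures are lower bounds.

```java
/** Rough per-value cost of the Listing 7.2 run; arithmetic on the reported timings only. */
final class RunTimeEstimate {

    /** Values printed by Listing 7.2: 1 library name + 30,000 first names + 30,000 titles. */
    static final int PRINTED_VALUES = 1 + 30_000 + 30_000;

    /** Reported run time through the in-memory EMF model (2.075 sec). */
    static final double EMF_MILLIS = 2_075.0;

    /** Reported run time through the database ("more than 35 minutes"), used as a lower bound. */
    static final double DB_MILLIS_LOWER_BOUND = 35 * 60 * 1_000.0;

    /** Average time per printed value for a given total run time. */
    static double perValueMillis(double totalMillis) {
        return totalMillis / PRINTED_VALUES;
    }

    /** Lower bound on how many times slower the database run was. */
    static double slowdownLowerBound() {
        return DB_MILLIS_LOWER_BOUND / EMF_MILLIS;
    }

    public static void main(String[] args) {
        System.out.printf("EMF model: %.4f ms per printed value%n", perValueMillis(EMF_MILLIS));
        System.out.printf("Database:  at least %.1f ms per printed value%n",
                perValueMillis(DB_MILLIS_LOWER_BOUND));
        System.out.printf("Slowdown:  at least %.0f times%n", slowdownLowerBound());
    }
}
```

Under these assumptions the database run costs roughly 35 ms or more per printed value, about three orders of magnitude above the in-memory run, which is consistent with the expectation that interrogating every object is the worst case for the database backed solution.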
7.4.4 EMF boot time vs. database boot time

The time needed to access one object of the big EMF model from the database is presented in Figure 7.11. It is the time needed for the EOL script in Listing 7.3 to be executed, which is 1.172 sec.

Figure 7.11: Database Boot time

The time needed to access one object of the big EMF model directly from the model is presented in Figure 7.12. It is the time needed for the EOL script in Listing 7.3 to be executed, which is only 0.015 sec.

Figure 7.12: EMF Boot time

The database solution is slower here, but its big advantage is that only a very small part of the model has to be loaded in memory. To make the comparison complete, the time needed for the big EMF model to be loaded into memory also has to be measured. This loading time was measured using Java. The time to load this big EMF model of approximately 2.6 MB was 1.689 sec. So, in the end, the EMF boot time is greater than the database boot time, since 1.689 sec + 0.015 sec > 1.172 sec.

7.4.5 Comparison and outcomes

As analyzed above, the run time of the EOL script that interrogates all the objects of the huge EMF model is significantly greater when the model is handled through the database than when it is handled directly through the EMF model. This result was of course anticipated. On the other hand, a model that contains tens of thousands of objects, like the one above, is mapped to a huge XML file, and this XML file takes a long time to be loaded into memory. As presented in Section 7.4.4, the boot time through the database backed solution that was developed is less than the boot time through EMF. This is because it is time-consuming for the whole EMF model to be loaded into memory. The model of the example is 2.6 MB. For bigger models this difference would be even more obvious, since a bigger model would take even more time to load while all the other metrics would remain stable.
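The boot-time argument from Section 7.4.4 comes down to one addition and one comparison. The sketch below simply repeats that arithmetic with the timings reported there; it is not part of the project code, and the class and constant names are hypothetical.

```java
/** Boot-time comparison using only the timings reported in Section 7.4.4. */
final class BootTimeComparison {

    /** Time to load the 2.6 MB XMI model into memory, measured in Java (sec). */
    static final double XMI_LOAD_SECONDS = 1.689;

    /** Time to run Listing 7.3 against the already loaded EMF model (sec). */
    static final double EMF_SCRIPT_SECONDS = 0.015;

    /** Time to run Listing 7.3 against the database (sec). */
    static final double DB_SCRIPT_SECONDS = 1.172;

    /** EMF boot time: the model must be loaded before the first element can be read. */
    static double emfBootSeconds() {
        return XMI_LOAD_SECONDS + EMF_SCRIPT_SECONDS;
    }

    /** Time saved by the database backed solution when accessing a single element. */
    static double savingSeconds() {
        return emfBootSeconds() - DB_SCRIPT_SECONDS;
    }

    /** True when the database backed solution reaches the first element sooner. */
    static boolean databaseBootsFaster() {
        return DB_SCRIPT_SECONDS < emfBootSeconds();
    }

    public static void main(String[] args) {
        System.out.printf("EMF boot time:      %.3f sec%n", emfBootSeconds());
        System.out.printf("Database boot time: %.3f sec%n", DB_SCRIPT_SECONDS);
        System.out.printf("Saving:             %.3f sec%n", savingSeconds());
        System.out.println("Database boots faster: " + databaseBootsFaster());
    }
}
```

With these figures the EMF boot time is about 1.704 sec against 1.172 sec for the database, a saving of roughly half a second for this 2.6 MB model. Only the XMI loading term depends on the size of the model, which is the basis for the expectation that the gap widens for bigger models.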
The objective of the database backed solution was accomplished, since it is more efficient from the boot time perspective. In addition, the database backed solution is much more efficient from the memory footprint perspective, because data is loaded into memory on demand. If only a small part of the model needs to be interrogated, only this small part is loaded into memory. With XMI, on the other hand, which is the default persistency solution of EMF, the entire model needs to be loaded in memory even if only a small part needs to be interrogated.

Chapter 8 Conclusion

8.1 Project overview

The purpose of the project was to implement a persistency solution for EMF models using databases, in order to address the scalability issues faced by EMF technology. To achieve the objectives of the project, a generic database design was implemented that is capable of storing any kind of EMF model. In addition, a set of querying methods that comply with the IModel interface was implemented in order to manipulate the data in the database, with the ultimate goal of bypassing the loading problems EMF faces when models are persisted in XMI form. Moreover, the application developed was transformed into an Eclipse plug-in. The plug-in that was developed can be used to extend the functionality of the EMF platform by injecting an EMF model into a database. During the requirements evaluation all requirements were satisfied, so the aims of the project can be considered achieved.

Chapter 2 of the project, the literature review, identified the basic concepts of MDE as well as some basic features of the EMF platform. In addition, two technologies that provide persistency solutions for EMF models were identified: Teneo and CDO. The Epsilon platform was briefly explained too.

In Chapter 3 the methodology of the project was explained. The reasons why iterations were needed throughout the project were discussed.
In Chapter 4 the project motivation was analyzed in more detail. In addition, the requirements analysis was presented. The analysis was done separately for each building block of the project and was divided into functional and non-functional requirements analysis.

Chapter 5 of the project provided a detailed design of the application. In addition, a very important analysis was provided of the reasons why the technologies of CDO and Teneo were not used. The analysis was cross-referenced with Chapter 4 of the project, the requirements analysis. Hands-on experiments and example implementations using CDO and Teneo showed that the two technologies would not be able to satisfy the requirements that were set for this project.

In Chapter 6 the implementation details of the project were presented. The focus was on the querying methods that were implemented. The most important querying methods were analysed and the algorithms used to implement them were explained thoroughly. Basic information about Eclipse plug-ins was also provided in order to explain the implementation of the project in this context.

In Chapter 7 the evaluation of the project took place. A case study was presented that demonstrated the functionality of the application developed. In addition, black box test cases for the important querying methods were presented, and the application passed all of them. Moreover, the requirements evaluation took place. For the performance part of the evaluation the Epsilon Object Language (EOL) was used: through EOL scripts the querying methods were tested a second time and their performance was measured. In addition, a big EMF model was instantiated programmatically using EOL so that the performance of the application could be tested against it.
8.2 Personal development

The project Scalable Persistence of EMF Models, which was developed during the summer term of this year, required studying and contextualizing many different technologies that have a steep learning curve. Because of that, the project gave me the opportunity to acquire valuable knowledge of many different technologies.

First of all, through the extensive literature research around the subject I learned a lot about the basic principles of MDE as well as the benefits of this technology and the challenges it faces. In addition, I built a firm grounding in the fundamentals of EMF, which include concepts such as EMF modelling and structure, automatic code generation, the EMF reflective API, the EMF persistence API and XMI serialization. I also learned a lot about subprojects of EMF such as CDO and Teneo, which gave me a holistic view of the persistence of EMF.

Through the design phase of the project I had the opportunity to strengthen my UML skills as well as my knowledge of database design.

While implementing this project I built a solid understanding of the reflective API that EMF provides and of how to manipulate EMF objects without the use of the automatic code generation facility. In addition, I learnt a lot through hands-on experience about the persistence framework of EMF. I also had the opportunity to exercise my Java and SQL development skills and to understand better the powerful possibilities of Eclipse and Eclipse plug-ins.

Moreover, while implementing the evaluation part of the project I learnt how to set up launch configurations on the Epsilon platform when using EOL for model management activities.

Overall it was a remarkable experience that helped me extend my computer science skills to a significant extent and introduced me to new and exciting technologies.
8.3 Future work

The objective of ultimately addressing the scalability issues that EMF faces cannot be fully achieved within an MSc project due to time constraints. There is therefore room for improvements, which are discussed briefly below.

Write property

As discussed in the requirements section (Chapter 4), the purpose of this application was to query the models from the database for read-only purposes. A more difficult task and future improvement would be to expand the functionality of this application by adding methods that are able to modify the model. This task will require much additional coding. Most importantly, though, a mechanism should be provided that propagates the changes made to the data in the database to the model itself.

Additional case studies

Due to time constraints the application was not tested exhaustively. A good practice for future improvements would be to test the application against many big EMF models in order to expose possible weaknesses from the performance perspective and address the grand challenge of scalable persistence of EMF models even more effectively.

Improve database design

Another consideration for improvement could be a more efficient database design. More specifically, another column could be added to the References table in order to map the object id to the EClass name. This information already exists in the Objects table, which means that the References table would contain redundant data. However, this could improve the performance of the application, since a single visit to the References table would be sufficient for extracting the needed data.

Bibliography

[1] Anneke Kleppe, Jos Warmer, and Wim Bast, MDA Explained: The Model Driven Architecture™: Practice and Promise. Boston, United States of America: Addison-Wesley Professional, 2003.
[2] A. Vodovnik and K. Žagar, "MODEL DRIVEN ARCHITECTURE, CONTROL SYSTEMS AND ECLIPSE," in Accelerator & Large Expt. Physics Control Systems Conference, Geneva, 2005, p. 6.
[3] Elena Tibrea, "Enriching EMF models," Hamburg University of Science and Technology, Hamburg, MSc 2006.
[4] Stefan Van Baelen, Ragnhild Van Der Straeten, and Tom Mens, "First International Workshop on Challenges in Model Driven Software Engineering," in Model Driven Engineering Languages and Systems, Toulouse, 2008, p. 61.
[5] Dimitrios S. Kolovos, Richard F. Paige, and Fiona A.C. Polack, "The Grand Challenge of Scalability," in Challenges in Model-Driven Software, 2008, p. 6.
[6] Nick Boldt and Dave Steinberg, "Introduction to Eclipse Modelling Framework," IBM, Toronto, 2006.
[7] Enrico Biermann, Karsten Ehrig, Christian Köhler, Günter Kuhns, Gabriele Taentzer, and Eduard Weiss, "EMF Model Refactoring based on Graph Transformation Concepts," in Third Workshop on Software Evolution Embracing the Change, 2006, p. 16.
[8] Eclipse (undated). Eclipse Modelling Framework Project [On-line]. Available at: http://www.eclipse.org/emf.
[9] D. Steinberg, F. Budinsky, M. Paternostro and E. Merks, EMF Eclipse Modelling Framework, 2nd ed. Michigan, United States of America: Addison-Wesley, 2008.
[10] Bacvanski, V. & Graff, P. (2005). Mastering Eclipse Modelling Framework. Available at: http://www.eclipsecon.org/2005/presentations/EclipseCon2005_Tutorial28.pdf
[11] Eclipse (undated). Emfatic [On-line]. Available at: http://wiki.eclipse.org/Emfatic.
[12] Enrico Biermann, Claudia Ermel, and Stefan Jurack, "Modeling the "Ecore to GenModel" Transformation," 2010, p. 13.
[13] Eclipse (undated). Eclipse Documentation – Archived release [On-line]. Available at: org.eclipse.emf.doc/references/javadoc/org/eclipse/emf/ecore/EObject.html
[14] Eclipse Foundation. Teneo, 2008. Available at: http://www.eclipse.org/modeling/emft/?project=teneo.
[15] Eclipse Foundation. CDO, 2008. Available at: http://www.eclipse.org/modeling/emft/
[16] Eclipse Foundation (undated). CDO [On-line]. Available at: http://wiki.eclipse.org/CDO
[17] Thomas Krafft (2010, June 21). Introducing the Objectivity Eclipse CDO [On-line]. Available at: http://eclipse.sys-con.com/node/1439578
[18] Dave Minter and Jeff Linwood, Beginning Hibernate From Novice to Professional, Steve Anglin, Ed. New York, United States of America: Apress Inc, 2006.
[19] P. Charlan, D. Ouellet and M. Salois, "System architecture recovery for open source software integration." Defence R&D Canada, 2009.
[20] Stefan Winkler (2008, Jan. 29). The EclipseCorner [On-line]. Available at: http://dev.eclipse.org/newslists/news.eclipse.technology.emft/msg04425.html
[21] Christian Fuchs (2002). Software Engineering and the Production of Surplus Value [On-line]. Available at: http://clogic.eserver.org/2002/fuchs.html
[22] Ian Sommerville, Software Engineering. United States of America: Addison-Wesley, 2007.
[23] H2 (undated). Performance comparison [On-line]. Available at: http://www.h2database.com/html/performance.html
[24] A. Bolour (2003, July 3). Notes on Eclipse plug-in architecture. Eclipse Corner [On-line]. Available at: http://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html
[25] Eclipse (undated). Reflective EMF tutorial [On-line]. Available at: http://www.eclipse.org/gmt/epsilon/doc/articles/reflective-emf-tutorial
[26] RedStone Software (undated). Black-box vs. White-box Testing: Choosing the Right Approach to Deliver Quality Applications [On-line]. Available at: http://www.testplant.com/download_files/BB_vs_WB_Testing.pdf.
[27] Elmasri, R. & Navathe, S. (2007). Fundamentals of Database Systems. Boston, Mass; London. Pearson/Addison-Wesley.
[28] DIS lecture notes (undated). ER modelling [On-line]. Available at: http://wwwcourse.cs.york.ac.uk/dis/
[29] Kolovos, D. & Paige, R. & Rose, L. & Polack, F. (undated). THE Epsilon BOOK.
[30] B. Selic, "The Pragmatics of Model-Driven Development," IEEE Software, 2003.
[31] L. M. Rose, R. F. Paige, D. S. Kolovos and F. A.C. Polack, "The Epsilon Generation Language," in European Conference in Model Driven Architecture (ECMDA), 2008, p. 16.
[32] D. C. Schmidt, "Model-Driven Engineering". Vanderbilt University, 2006.
[33] S. Kent, "Model Driven Engineering". University of Kent, 2002.
[34] H2 (undated). H2 database engine [On-line]. Available at: http://www.h2database.com/html/main.html
[35] D. Kolovos, "An Extensible Platform for Specification of Integrated Languages for Model Management," PhD Thesis, The University of York, 2008.
[36] N. Boldt, D. Steinberg (2006, March 20). Introduction to Eclipse Modelling Framework [On-line]. Available at: http://www.eclipse.org/modeling/emf/docs/presentations/EclipseCon