openModeller: A Framework for Species Distribution Modelling
A performance evaluation approach

Agenda
1. openModeller Framework
2. The Performance Evaluation Process
3. Infrastructure Implemented
4. Architecture Proposed
5. Simulation Model
6. Conclusions

openModeller Framework: an overview

openModeller is a fundamental ecological niche modelling framework. A number of fundamental niche modelling algorithms are provided as plugins, including GARP, Minimum Distance, Climate Space Model, Bioclimatic Envelopes, and others. The software includes facilities for reading species occurrence and environmental data, selecting the environmental layers on which the model should be based, creating a fundamental niche model, and projecting the model into an environmental scenario.

The process:
1. provide a set of occurrence points for a species;
2. provide a set of environmental layers (rainfall, temperature, etc.);
3. choose an algorithm to construct the niche model, and select appropriate parameters for the algorithm;
4. generate the ecological niche model;
5. use the generated model to calculate a probability-of-occurrence surface by projecting the model onto a set of environmental layers for a given region.

[Figure: the process in the OM Console]

The Performance Evaluation Process

To produce the model and generate the projected probability surface of species occurrence, the system uses computational resources intensively. The modelling process is complex and demands considerable processing power and time. Before the performance evaluation began, a detailed study of the openModeller execution flow was carried out, and the scientists and researchers who actually use the system were interviewed. The first step of the evaluation was to divide the openModeller framework into separate components. Each component was then installed and configured within a COM+ host environment.
A methodology was therefore devised to analyze the performance of component-based applications and to collect data from individual components. Implementing this methodology requires acquiring performance parameters for the openModeller framework.

Aspect-oriented programming (AOP) techniques were used because they introduced no severe performance penalties in the openModeller application. With AOP it is possible to intercept, for instance, a method call and measure the time the method takes to execute. The AOP approach, combined with object-oriented programming techniques and an asynchronous message-processing architecture, proved to be an efficient way to collect performance data for each component and method. These techniques offer control over the granularity of instrumentation and reduce the overhead it introduces.

After the application has run and the performance metrics have been recorded, the collected evidence can be analyzed without having to understand the workflow of the entire solution. Evidence can also be analyzed on a per-module basis, which removes the need to process the enormous amount of performance information collected during execution. Other metrics, such as processor and memory utilization, were obtained through the WMI API provided in the .NET Framework. A back-office visualization tool was developed to consolidate the collected results. The tool also helps locate the most important performance problems that occurred during execution. After the candidate bottlenecks have been located, the next step is to understand the causes of those performance problems.
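The interception idea described above can be illustrated with a minimal sketch. It is not the instrumentation used in the project, which relied on COM+ interception and the Microsoft Message Queue. Here a Python decorator stands in for the aspect and an in-process `queue.Queue` stands in for the message queue. The names `timed`, `consume`, `metrics_queue`, `project_model`, and the component label `"Projector"` are all hypothetical.

```python
import functools
import queue
import threading
import time

# Hypothetical in-process stand-in for the message queue described in the talk.
metrics_queue = queue.Queue()


def timed(component):
    """Wrap a function so that each call's duration is sent to metrics_queue."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                # put() only enqueues; consolidation happens on another thread.
                metrics_queue.put((component, func.__name__, elapsed))
        return wrapper
    return decorate


def consume(store):
    """Drain messages until a None sentinel arrives, grouping times per method."""
    while True:
        message = metrics_queue.get()
        if message is None:
            break
        component, method, elapsed = message
        store.setdefault((component, method), []).append(elapsed)


@timed("Projector")  # hypothetical component name
def project_model(cells):
    """Placeholder workload standing in for an instrumented component method."""
    return [value * 0.5 for value in cells]


results = {}
worker = threading.Thread(target=consume, args=(results,))
worker.start()
project_model([1, 2, 3])
project_model([4, 5])
metrics_queue.put(None)  # tell the consumer to stop
worker.join()
```

After the run, `results` maps each (component, method) pair to its list of measured call times, which is the kind of per-method data a consolidation tool could aggregate. Choosing which functions to decorate is one way to control the granularity of instrumentation.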
[Figure: the results consolidation tool]

Analysis of the results made it possible to identify the methods with high call times. Based on the evidence collected, four categories were created to identify and classify the bottlenecks in the program workflow. The next graph shows the result:

[Graph: Furcata Boliviana – 1Layer – BioClim Algorithm]

Infrastructure implemented

To minimize the performance penalty introduced by instrumentation, an asynchronous message-processing approach was implemented. The user starts the process and, as the component methods are called, the AOP class implementations are invoked in a separate thread. A message is built containing the time consumed by the method. The WMI API is then called to sample the metrics in use, and the message is placed in the Microsoft Message Queue. In parallel with this thread, the main thread, which executes the openModeller algorithm, continues processing normally.

The next illustration shows the workflow of the architecture and infrastructure implemented to gather evidence for the performance evaluation process.

Architecture proposed

The second layer is the core of the openModeller request-processing subsystem and can be composed of several servers hosting different modelling algorithms. This layer is the main module of the architecture. The following diagram represents a logical division of the openModeller system into distributed parts:

Simulation Model

The main idea of this model is to allow the system to be simulated and different component configurations to be analyzed on the high-performance infrastructure. The model is a decision-support tool for defining the best distribution of the openModeller components and identifying possible bottlenecks.
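One common way to reason about a component model of this kind is as a closed queueing network, in which each component is a queue and a fixed number of requests circulate. The sketch below is an assumption-laden illustration, not the project's simulation model. It uses exact Mean Value Analysis (MVA), an analytic technique swapped in here in place of simulation, for a single class of requests. The function name `mean_value_analysis` and the service demands in the usage lines are hypothetical, not measured values.

```python
def mean_value_analysis(demands, customers):
    """Exact MVA for a closed, single-class network of queueing stations.

    demands[k] is the total service demand (seconds per request) at station k;
    customers is the fixed number of requests circulating in the network.
    Returns (throughput, response_time, mean_queue_lengths).
    """
    queue_len = [0.0] * len(demands)
    throughput = 0.0
    response = 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: a request arriving at a station sees the mean queue
        # length of the same network with one request fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue_len)]
        response = sum(residence)
        throughput = n / response
        # Little's law applied per station.
        queue_len = [throughput * r for r in residence]
    return throughput, response, queue_len


# Hypothetical demands (seconds) for three components; not measured values.
example_demands = [0.2, 0.5, 0.3]
x, r, q = mean_value_analysis(example_demands, customers=10)
bottleneck = q.index(max(q))  # station with the longest mean queue
```

In this sketch the station with the largest service demand accumulates the longest queue as the number of requests grows, which is how such a model would flag a candidate bottleneck before components are redistributed.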
Based on this architecture, this research presents a simulation model of the software components. The model is a closed queueing network in which each queue represents a software component of the system.

Conclusions

This research provided the openModeller application with an architecture and infrastructure that are highly scalable and available, through parallel and distributed processing, and that meet the needs of biological researchers and scientists. Using AOP to instrument the code proved to be an efficient way to collect performance data, since the interception mechanism provided by COM+ made it possible to control the granularity of the data collected. The asynchronous message processing used in the AOP implementation reduced the interference of the instrumentation with the final results. The back-office application helped analysts quickly find bottlenecks in the system and propose ways to make the code more efficient.

The next stage of this research is to simulate the model of the component architecture created. This will make it possible to optimize the architecture through better component distribution, reducing both the response time to the user and the computational resources used by the openModeller system.

Acknowledgment

The authors are grateful to FAPESP, the São Paulo State Research Foundation, Brazil, for its support of the openModeller project.