A new tool for fundamental niche modelling
Renato De Giovanni
Centro de Referência em Informação Ambiental, CrIA

openModeller
• Definition
• History
• Motivation and features
• Design
• Interfaces and additional tools
• Algorithms
• Future plans

Definition
openModeller is an open source C++ library completely dedicated to static spatial distribution modelling.

Applications
Biology: Fundamental niche modelling.
Geology ?
Demography ?
Others ?

openModeller's history
Apr 2003: Initial design of a new modelling environment at CRIA as a natural consequence of previous experiences with other tools (DesktopGarp).
Oct 2003: First working prototype as part of the speciesLink project (Fapesp).
Dec 2003: Released all source code (SourceForge).
Feb 2004: Partnership with BDWorld (CSM / GRID component).
Apr 2004: Partnership with University of Kansas (GARP / BTRA).
Jan 2005: Released first graphical user interface (Tim Sutton & Peter Brewer).
May 2005: Basis of a new thematic project funded by Fapesp (4 years).

Main Motivation
Facilitate and speed up modelling tasks, offering at the same time a homogeneous environment to carry out experiments with different algorithms.

Main features
• Platform independent.
• Enables the existence of multiple interfaces on top of it.
• Accepts different formats of georeferenced maps.
• Accepts different coordinate systems and projections for each map and for the whole set of occurrence points.
• Accepts different cell sizes and extents for each map.
• Allows the different algorithms to use exactly the same input and the same working environment, therefore enabling fair comparison between all results.
• Isolates algorithm logic from other issues related to maps, georeferencing, input and output formats, etc.
• Offers a collaborative and transparent environment for all interested developers.

Architecture overview
[Diagram] pluggable algorithms (Bioclim, GARP, CSM, others...) | API | openModeller | API | interfaces (Console, SOAP server, SWIG wrapper)
[Diagram, continued] drivers (GDAL, proj4, etc.), others... | points (different coordinate systems) | maps (different formats)

Interfaces and additional tools
• Command line / Console suite
  – om_console
  – om_viewer (X11)
  – om_niche (X11)
• SWIG wrapper
  – Python
• SOAP interface (prototype server and sample client)
• Web interface
• Graphical User Interface (Linux, Windows, Mac OS)

Console interface
>> om_console request.txt

Keys in the request file:
WKT Coord System =
Species file =
Species =
Map =
Mask =
Output map =
Output mask =
Output format =
Output file =
Algorithm =
Parameter =

Tool for visualizing maps
>> om_viewer -r request.txt

Tool for visualizing models
>> om_niche request.txt

Web Interface

Graphical User Interface

Development of algorithms
• Metadata definitions (name, version, author, description, bibliographic references, parameters).
• Method to initialize the algorithm.
• Method to generate the model.
• Method to calculate the probability of occurrence given a certain vector of environmental values.

Algorithms: Building models
[Diagram] openModeller | API | Algorithm
Sampler gives the algorithm vectors of environmental values from a set of occurrence points.
Ex: [20°, 115 mm], [22°, 100 mm]
Algorithm uses the values to build a distribution model and stores an internal representation of it.

Algorithms: Generating distribution maps
[Diagram] openModeller | Algorithm
For each cell of the resulting map, openModeller asks for the probability of presence, sending the vector of environmental values as a parameter.
Ex: probability for [30°, 90 mm] ?
Algorithm answers with a probability of presence.
Ex: prob = F( [30°, 90 mm] ) = 0.8

Algorithms
• Bioclim
• Climate Space Model (Broken Stick cutoff method)
• GARP (incl. best subset procedures)
• Distance algorithms
  – Distance to average
  – Minimum distance

Algorithms - Bioclim
• Assumes normal distribution for each environmental variable.
• Envelopes are represented by the interval [m - c*s, m + c*s], where 'm' is the mean, 'c' is the cutoff parameter, and 's' is the standard deviation.
• Besides the envelope, each environmental variable has additional upper and lower limits taken from the maximum and minimum values found in the set of occurrence points.
• Points are classified as: suitable, marginal or unsuitable.
fig. 1: cutoff = 0.674
fig. 2: cutoff = 0.99

Algorithms - GARP
• Genetic Algorithm for Rule-set Production: models are represented by a set of rules generated by a genetic algorithm.
• Non-deterministic: produces a different model each time the algorithm is run.
fig. 1: model 1
fig. 2: model 2
fig. 3: model 3

Algorithms – GARP with Best subsets procedure
• Runs several GARP models and chooses the best ones according to omission and commission errors.
• The resulting model is the overlap of the models selected in the previous step.
fig. 1: sample model

Algorithms – Distance to average
• Normalizes environmental values and parameter.
• Calculates the mean point in environmental space considering all presence points.
• Probability of presence decreases linearly with the Euclidean distance from the average point (linear decay).
• Parameter determines the maximum accepted distance.
fig. 1: parameter = 0.1
fig. 2: parameter = 0.3

Algorithms – Minimum distance
• Normalizes environmental values and parameter.
• Probability of presence decreases linearly with the Euclidean distance from the closest presence point (linear decay).
• Parameter determines the maximum accepted distance.
fig. 1: parameter = 0.05
fig. 2: parameter = 0.1

Use case – Byrsonima subterranea Brad. & Markgr.
[Map legend: original point; 4 new points]

Scope issues & known limitations
• Works only with static models – dynamic modelling is currently outside the scope of this tool.
• None of the algorithms can handle categorical maps (although the library is already prepared to deal with them).
• None of the algorithms except GARP can handle absence points, and none of the high-level interfaces is prepared to receive absence points as an additional parameter.
• Produces only two-dimensional maps – not prepared to produce models in three dimensions (especially considering aquatic environments).
• Still not sufficiently documented!
• Still not sufficiently tested!

Future plans
• Implementation of other algorithms: neural nets, cellular automata, GLM, GAM, GRASP, Domain…
• Development of new components to help with pre-processing and post-analysis.
• Finalize Web and SOAP interfaces.
• Develop SWIG interfaces for other programming languages.
• Improve documentation.
• Implementation of a new and advanced graphical user interface.

New version of the graphical interface

Institutions & People
Mauro Muñoz
Renato De Giovanni
Tim Sutton
Peter Brewer
Ricardo S. Pereira
Kevin Ruland
Jens Oberender

Thank you
http://openmodeller.sf.net
renato (at) cria.org.br
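
Appendix: illustrative sketch of two algorithms
The Bioclim and distance-to-average slides describe their rules only in words, so a small sketch may help make them concrete. The Python below is a minimal illustration, not the openModeller API: the class and method names (BioclimSketch, DistanceToAverageSketch, build_model, classify, probability) are hypothetical. It also relies on several assumptions the slides do not state: population standard deviation for the envelope; "marginal" meaning inside the min/max limits but outside the envelope; normalization of each variable to [0, 1] using layer minima and maxima supplied by the caller; and the distance parameter read as a fraction of the longest possible distance in that normalized space.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]  # one environmental value per layer, e.g. [temp, rain]


class BioclimSketch:
    """Envelope model. Hypothetical names; not the openModeller API."""

    def __init__(self, cutoff: float) -> None:
        self.cutoff = cutoff  # 'c' in the interval [m - c*s, m + c*s]

    def build_model(self, samples: List[Vector]) -> None:
        columns = list(zip(*samples))  # one tuple of values per variable
        self.mean = [statistics.mean(col) for col in columns]
        # Assumption: population standard deviation (the slides do not say).
        self.std = [statistics.pstdev(col) for col in columns]
        self.lower = [min(col) for col in columns]
        self.upper = [max(col) for col in columns]

    def classify(self, env: Vector) -> str:
        in_limits = all(lo <= v <= hi
                        for v, lo, hi in zip(env, self.lower, self.upper))
        in_envelope = all(abs(v - m) <= self.cutoff * s
                          for v, m, s in zip(env, self.mean, self.std))
        # Assumed mapping of the three classes onto envelope and limits.
        if not in_limits:
            return "unsuitable"
        return "suitable" if in_envelope else "marginal"


class DistanceToAverageSketch:
    """Linear decay with distance from the mean presence point."""

    def __init__(self, max_distance: float,
                 layer_min: Vector, layer_max: Vector) -> None:
        self.max_distance = max_distance
        self.layer_min = list(layer_min)
        # Assumes every layer has max > min, so no range is zero.
        self.layer_range = [hi - lo for lo, hi in zip(layer_min, layer_max)]

    def _normalize(self, env: Vector) -> List[float]:
        return [(v - lo) / rng
                for v, lo, rng in zip(env, self.layer_min, self.layer_range)]

    def build_model(self, samples: List[Vector]) -> None:
        normalized = [self._normalize(s) for s in samples]
        self.average = [statistics.mean(col) for col in zip(*normalized)]
        # Assumption: the parameter is a fraction of sqrt(n), the longest
        # distance inside the unit hypercube of n normalized variables.
        self.threshold = self.max_distance * math.sqrt(len(self.average))

    def probability(self, env: Vector) -> float:
        dist = math.dist(self._normalize(env), self.average)
        return max(0.0, 1.0 - dist / self.threshold)


if __name__ == "__main__":
    bioclim = BioclimSketch(cutoff=0.674)
    bioclim.build_model([[20, 115], [22, 100], [24, 130]])
    print(bioclim.classify([22, 115]))

    distance = DistanceToAverageSketch(0.3, layer_min=[0, 0], layer_max=[40, 200])
    distance.build_model([[20, 115], [22, 100]])
    print(distance.probability([30, 90]))
```

Both classes receive the same kind of input (vectors of environmental values at occurrence points) and answer queries for a single vector, which mirrors the building and map-generation steps described in the slides. A minimum-distance variant would differ only in measuring the distance to the closest presence point instead of the mean point.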