THE EUROPEAN GRID OF SOLAR OBSERVATIONS

Robert D. Bentley¹, Dept. Space and Climate Physics, University College London
Anthony Finkelstein, Dept. Computer Science, University College London
C. David Pike, Dept. Space & Technology, Rutherford Appleton Laboratory
Valentina Z. Zharkova, Dept. Cybernetics, University of Bradford

¹ For more information, e-mail: bentley@egso.org

ABSTRACT

The European Grid of Solar Observations (EGSO) is a Grid testbed funded by the European Commission under the Information Society Technologies (IST) thematic priority of the Fifth Framework Programme (FP5). EGSO will provide the tools and infrastructure needed to create a data grid that will form the fabric of a virtual solar observatory. EGSO started in March 2002 and will last for 36 months. The project involves eleven groups from ten institutions located in five countries in Europe and the US and is led by University College London; a total of four groups are from the UK. The EGSO Consortium is in discussion with other groups interested in creating a virtual observatory with the aim of finding a solution that is universally acceptable.

1. INTRODUCTION

The task of identifying solar data sets of interest, then locating and retrieving them, remains a continuing difficulty. The data are heterogeneous and widely distributed, without any means to tie them together, and there is no systematic way to identify observations associated with a particular feature or type of event. Also, the rapidly increasing volume and complexity of solar data necessitate a sea change in the way the data are handled.

EGSO, the European Grid of Solar Observations, is designed to confront these issues. It will allow a user to identify solar observations covering a given time interval and pointing, or a type of feature; it will locate the selected observations and then return them after any necessary processing. To achieve its objectives, EGSO is developing new forms of catalogues: unified observing catalogues derived from existing catalogues, and feature and event catalogues. It will provide the tools to search these, and will federate data archives to simplify the recovery of the data.

We want EGSO to succeed and every effort will be made to ensure that it is attractive for scientists to use and not too complex or onerous for data providers to support. If EGSO is generally acceptable, we hope that this will encourage participation and that this will ensure its long-term viability.

The problems that EGSO addresses are not unique to solar physics. Other disciplines also have distributed data sets that are becoming too large to copy around, and a principal objective of EGSO is to develop tools that can also be used on other projects.

2. OVERVIEW

2.1 The Generic Problem

In solar physics, observations are used to construct a picture of the plasma in multi-dimensional parameter space, including space, time, temperature and density. The observations are made at different wavelengths originating from different levels in the solar atmosphere, and the combined information allows the user to build up an understanding of the changes in structures, motion of material, sites of energy release, etc.

Observations from the ground and from space are both important. Satellite-based observations are made at wavelengths that do not penetrate the Earth’s atmosphere, including UV, EUV and X-rays. Ground-based observations, mainly in optical and radio wavelengths, complement those from space.

Satellites are usually operated under the umbrella of large organizations and the instruments they carry are often built by international collaborations; as a consequence, the data are handled in a more systematic and open manner. Data are stored in archives, often at mission level, with copies at one or more sites. The files have various formats, including FITS, and range from single images to extended intervals (an hour or orbit).
The ground-based observatories involved are both large and small, and are located throughout the world, scattered over many time zones. Since each observatory only observes for a fraction of the day and is often affected by weather, good coverage often means dealing with a number of observatories. The data are usually available as FITS files of single images. Often there is a single copy of the data, managed by the observatory.

2.2 Data Analysis

When analyzing solar observations, the user undertakes the following three steps, each of which is beset with problems:

• Identify suitable observations

Many studies relate to the state or evolution of features. They involve time intervals from a few minutes to many hours, and areas from a fraction of the solar disk to the whole disk. Frequently, they make use of serendipitous rather than planned observations: an instrument observes the Sun, and events or features of interest are identified post facto. To gain any understanding, it is necessary to use as many different wavelengths as possible.

When undertaking a study, the researcher will have identified a particular event, or the occurrence of a type of feature, and will then have the task of identifying the observations they need to investigate the phenomenon. Catalogues are key to this.

Solar observing catalogues differ in quality, contents and format, and in their availability and accessibility. For several space-based instruments, the observing catalogues are distributed within the SolarSoftWare DataBase tree, making it possible to search them with SolarSoft [1]. However, the size of the catalogues deters most sites from holding a complete set. Also, some catalogues have dependencies on ancillary data or consist of multiple interrelated files, making them difficult to access except with specialized software.

• Retrieve the data

At some sites users can identify suitable observations from observing catalogues distributed with SolarSoft, but these are the exceptions.
Also, the tools to easily conduct such searches for multiple instruments are almost non-existent. As a consequence, the user often has the problem of both identifying and retrieving the data at the same time.

The data are heterogeneous and widely scattered, with differing means of access. Being unable to identify suitable observations a priori means the user has to access many sites in their effort to gather data. Although they often need only a subset of each data file, it is necessary to retrieve several quite large files containing extended intervals, sometimes through quite crude interfaces. This can be quite a painful experience.

• Process the data

Once the data have been retrieved, they have to be processed. This usually involves the extraction and calibration of a subset of the data. Here, solar physics has a great advantage over many disciplines. SolarSoft provides a common set of analysis tools that are distributed globally; it also establishes the environment in which to use them. Calibration data are usually distributed with the software in the SolarSoft tree.

3. PROJECT DETAILS

3.1 Project Objectives

In the EGSO contract (IST-2001-32409), the project declared several objectives including:

• Develop the middleware to federate solar data archives across Europe, and beyond
• Create the tools to select, process and retrieve distributed and heterogeneous solar data
• Provide the mechanism to produce standardized observing catalogues for solar observations
• Provide the tools to create a solar feature catalogue
• Make all tools and middleware created by the project open source

In essence, EGSO will create the fabric of a virtual solar observatory and will provide an entry point into solar data for other disciplines, including space weather, climate physics, and astrophysics.

3.2 Project Phasing & Status

The work in EGSO is divided into four phases: I. Project definition; consult the community; explore and experiment with technologies II.
Architectural design; prepare system integration and validation plan III. Implementation of the design; development of middleware and catalogues IV. Product commissioning and delivery

EGSO is currently (in August 2003) in the early stages of Phase III. A detailed set of system requirements was drawn up during 2002. These were prioritised and used as guidance for the EGSO Architecture, which was delivered to the Commission in early 2003. The architecture was refined over the following months and implementation started in the summer of 2003.

3.3 Who is involved in EGSO

The EGSO consortium comprises groups that have considerable experience in handling solar observations and provide access to a representative subset of currently available data. It also includes groups with the expertise in information technologies (IT) that will be needed to develop the project. Details of the consortium are given in Table 1; the Swiss and US partners depend on their own funding.

The two US partners in EGSO are also members of the US Virtual Solar Observatory (US-VSO), which is funded by NASA. Other US groups have recently become associated with EGSO. These include other members of the US-VSO, Stanford University and Montana State University (MSU), and Lockheed-Martin, lead of the Collaborative Sun-Earth Connector (CoSEC), funded by NASA under Living with a Star. EGSO, US-VSO and CoSEC held a joint meeting in October 2002 and are now trying to collaborate as closely as possible.

Other groups involved in EGSO include the European Space Agency, and a branch of Astrium (which is acting as an observer on the project).

Table 1. EGSO Project Consortium Members and Associate Members

Consortium Members                                      Country
University College London (PI Group)                    UK
    Dept. of Space and Climate Physics
    Dept. of Computer Science **
Rutherford Appleton Laboratory                          UK
    Dept. of Space and Technology (**)
University of Bradford                                  UK
    Dept. of Cybernetics **
Institut d’Astrophysique Spatiale **                    France
Observatoire de Paris-Meudon                            France
Istituto Nazionale di Astrofisica                       Italy
    (Obs. of Turin, Naples & Trieste)
Politecnico di Torino                                   Italy
    Dept. Automation & Informatics **
University of Applied Science                           Switzerland
    Dept. of Computer Science **
Solar Data Analysis Center, NASA-GSFC                   USA
National Solar Observatory                              USA

Associate Members                                       Country
Astrium plc. (UK/French/German company)                 UK
Stanford University                                     USA
Montana State University                                USA
Lockheed-Martin                                         USA
European Space Agency (SOHO Project)                    Netherlands

Note: Groups marked “**” have IT expertise

4. DESIGN AND IMPLEMENTATION

Four workpackages will create the key components of EGSO. Together these cover the three steps described in Section 2.2. In the EGSO implementation, as far as possible data will be extracted before being returned to the user; this departs from the current process of extracting after the data are retrieved and greatly simplifies the user’s software installation.

The EGSO system requirements prepared in 2002 drew on a number of sources:

• A user survey conducted in collaboration with SpaceGrid, an ESA-sponsored Grid study project
• Use Cases solicited from the solar community
• Consultation with the user community
• Brainstorming, etc. resulting in the EGSO Concepts Document

The EGSO architecture is designed to meet these requirements and be as flexible and extensible as possible. It is divided into three roles: Provider, Broker and Consumer. The Provider role involves interactions with data and other providers; it includes software that might reside on data centre systems. The Consumer role includes the interaction with the users of EGSO, and provides access through the User Interface and any necessary workflow capabilities. The Broker acts as a switching centre for requests. It maintains registries that allow it to keep track of available resources, including data and metadata, and can make decisions on how best to satisfy a request.

Below, we highlight a few features of the EGSO system.

4.1 Feature Recognition Tools

Experience gained by the Observatory of Paris-Meudon (using Hα data) is being combined with the expertise in feature recognition of the Cybernetics group at Bradford University to extend existing techniques and develop a set of tools to detect solar features such as filaments, sunspots, active regions, etc. Once developed and fully evaluated, these techniques will be applied to a selected set of synoptic data to build a valuable new catalogue, the Solar Feature Catalogue. The tools will also be available for the user to apply to individual images.

4.2 Catalogue Preparation & Access

Catalogues are key to locating the data, and their importance will grow as the rapidly increasing volumes of data prohibit the unnecessary copying of data sets. Existing observing catalogues are heterogeneous and this makes them difficult to search. EGSO will produce standardized solar catalogues to simplify the searches of existing data, and ensure that future data will be more accessible. The idea of standardized cataloguing was first proposed as the Whole Sun Catalogue [2].

The proposed Unified Observing Catalogues (UOCs) will be self-describing, quantized into fragments by instrument and time interval, and will have dependencies on ancillary data removed and errors corrected. The catalogues will be designed so that they do not have to be held in a centralized location: since the data are distributed, it is only rational that the catalogues should be treated in the same manner. If necessary, the UOC can be created as it is needed. A side effect of this design will be that it will be very easy to add new data sets, and to update catalogues of existing data sets as new observations are made.

Two additional catalogues, the Solar Feature Catalogue (SFC) and the Solar Event Catalogue (SEC), are intended to provide a new entry point into solar data.
They will allow the user to search for events, features and phenomena, rather than just date, time, location and wavelength. Existing lists of events or features will form the Event Catalogue. To extend these, and produce a more systematic approach to solar features, the Feature Catalogue will be produced using the feature recognition software. A search of the Feature and Event Catalogues will yield a list of dates, times and locations that then link into a search of the observing catalogues.

The Solar Event Catalogue is being implemented as a stand-alone server. This will hide the complex nature of the event data, which must be gathered from a large number of sources and are very heterogeneous. The Server will permit complex searches over a number of different types of list and will return the answer in a standardized format.

A similar server is planned for the Solar Feature Catalogue. This will have much in common with the SEC Server, but will have additional capabilities to rotate image data to the same epoch to allow the comparison of features on demand.

It will be possible to access both the SFC and SEC Servers from outside of EGSO. They will be usable by VSO and CoSEC, as well as by projects like AstroGrid.

4.3 Tools to Select the Data

The entry point into EGSO for many users will be through a graphical user interface (GUI). This will allow the user to define criteria for selecting data that are based either on date and time, pointing, and wavelength, or on features, events and phenomena. To assist the user, synoptic images and other data will be used to provide the context for high-resolution observations. Once an initial search of the UOC provides a list of available observations, the user will be able to refine the selection with the aid of quick-look images and movies.

An alternative entry point into EGSO will be provided for other communities that are interested in solar data.
These include climate physics, astrophysics, solar-terrestrial physics and space weather. This entry point will also provide access to EGSO from applications such as IDL (e.g. from within SolarSoft).

4.4 Data Provider Federation

The metadata catalogues are used to relate the heterogeneous solar data. They allow the user to identify what observations are available; it is then necessary to retrieve them. The observations could be anywhere around the world, in large or small data centres, and might even be held in multiple locations; also, the data could be in the public domain or proprietary.

We recognise that the resources available vary from centre to centre. Some smaller data centres do not have the resources for full federation, but would still like to be involved. Mechanisms will be provided to affiliate very small data sources to larger centres; requests would be serviced by the larger centres, which would then interact with the smaller sites through some form of trusted host arrangement.

The user does not need to know this; the system should take care of it all, selecting data sources and granting user access as appropriate.

5. THE EGSO DEMONSTRATOR

The first demonstrator of EGSO will be available shortly. The initial implementation will include only a subset of available solar data: enough to prove the concept, and allow us to test standalone components.

Data from consortium members will be used in the first instance. These provide what is needed to test EGSO: heterogeneous data (both space- and ground-based) scattered over a number of sites, with some duplication, and with a variety of data formats and catalogue capabilities.

Emphasis will be placed on the user interface so that we can establish the optimal way to design this, and provide maximum search capabilities.

6. SUMMARY

The European Grid of Solar Observations will provide the tools necessary for a virtual observatory, but is essentially a Grid testbed.
EGSO will bring about a sea change in the way solar data are accessed. In collaboration with other groups, we are striving to ensure that the project will find global acceptance and lead to the creation of a worldwide virtual solar observatory.

Further details about the EGSO project can be found under the URL http://www.egso.org

7. REFERENCES

1. Freeland, S.L. and Handy, B.N., SolarSoft, Solar Physics, Vol. 182, 497, 1998.
2. Sanchez Duarte, L., Fleck, B. and Bentley, R., The Whole Sun Catalogue, Proceedings of 1st Advances in Solar Physics Euroconference, ASP Conf. Ser. Vol. 119, 382, 1997.
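
The search chain described in Section 4.2 (a Feature or Event Catalogue search yields dates and times that then link into a search of observing catalogues quantized into fragments by instrument and time interval) can be illustrated with a short sketch. This is not EGSO code: the class names, the function names, the one-fragment-per-instrument-per-day granularity and the instrument labels are all assumptions made purely for illustration, and pointing/location matching is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FeatureEntry:
    """Hypothetical Feature/Event Catalogue row: what was seen, and when."""
    kind: str        # e.g. "filament" or "sunspot"
    time: datetime


@dataclass(frozen=True)
class Observation:
    """Hypothetical observing-catalogue row for one instrument."""
    instrument: str
    start: datetime
    end: datetime


def search_features(features, kind):
    """Step 1: search the feature catalogue by type of feature."""
    return [f for f in features if f.kind == kind]


def build_fragments(observations):
    """Quantize a catalogue into fragments keyed by (instrument, day).

    The daily granularity is an assumption; the paper only states that
    fragments are defined by instrument and time interval.
    """
    fragments = {}
    for obs in observations:
        day = obs.start.date()
        while day <= obs.end.date():
            # An observation spanning midnight is listed in every day it touches.
            fragments.setdefault((obs.instrument, day), []).append(obs)
            day += timedelta(days=1)
    return fragments


def observations_for(fragments, instruments, entry, window):
    """Step 2: link a feature entry into the observing catalogue.

    Only the fragments overlapping [entry.time - window, entry.time + window]
    are consulted, so fragments could live at different sites.
    """
    lo, hi = entry.time - window, entry.time + window
    hits = []
    day = lo.date()
    while day <= hi.date():
        for inst in instruments:
            for obs in fragments.get((inst, day), []):
                overlaps = obs.start <= hi and obs.end >= lo
                if overlaps and obs not in hits:
                    hits.append(obs)
        day += timedelta(days=1)
    return hits
```

Because each fragment is addressed only by instrument and time interval, adding a new data set in this sketch amounts to adding new keys to the mapping, which mirrors the ease of extension claimed for the UOC design.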