Task Force on Seasonal Prediction - data handling strategy
Working Draft 1.0 - 09 Dec 2005
The TFSP meeting in Trieste (August 2005) requested that a working group should
create a detailed proposal for data handling for the experimentation to be carried out
for TFSP. The proposal is intended to follow the outline strategy discussed and
provisionally agreed at the Trieste meeting. This document aims to develop the
specifics of the proposed strategy.
A. Trieste strategy
After much discussion of the relative merits of a centralized versus distributed system
of sharing data, it was proposed to try a hybrid solution. That is, standards will be set,
and producing (or distributing) centres will be able to serve their own data so as to
meet the specified standards. Alternatively, if a producing centre would prefer not to
be responsible for serving its own data, it can pass the data to another centre which is
willing to serve it. It is envisaged that there will be several centres which will be
willing and able to serve other people’s data, at least from within a specified region.
The data is envisaged as being served in CF compliant netCDF, probably with an
OPeNDAP (ie DODS) interface. Other data formats and data serving options are
possible as additional extras, but the netCDF service is a mandatory minimum.
Issues that the working group need to consider are:
* The metadata content needed
* How the metadata will be specified in the netCDF
* Agreement on OPeNDAP as the initial web interface standard, and any issues
arising from this.
* The data volumes to be expected, the extent to which they are feasible, and whether
certain parts of the data might need to be made optional.
* Identifying sufficient capacity to serve the expected datasets
* Any recommendations on procedures to ensure correctness of served data
* How the strategy relates to other data strategies and projects
B. Working group proposals
0. Introduction
The following proposal takes account of established practice at operational centres,
usual practice in the research community, and the protocols established at PCMDI for
handling the IPCC data. It also takes account of detailed work undertaken in Europe
on how to make ENSEMBLES data available in CF compliant netCDF. Additionally,
we bear in mind what might be needed to (partially) harmonize the metadata with the
structures being developed by the operational side of WMO for use eg in TIGGE.
Proposal: The TFSP data should be CF compliant netCDF data, with specified
metadata content.
To complete the proposal, we need to specify the required metadata, and also give
rules and guidance on how the metadata is to be encoded in netCDF, and how files
ought to be structured for data exchange. These issues are dealt with in the following
sections.
1. Metadata content
In the first instance, we assume that data to be exchanged are raw model output. If
calibrated forecast products, anomalies, climatologies, verification scores etc are to be
exchanged, then further metadata will be required. Note that although we will discuss
below a particular representation of the metadata in netCDF, it is the metadata
themselves which are the most fundamental part of this proposal. The representation
of the metadata may change in the future, due either to new versions of netCDF and
CF, or possibly even new data formats altogether, but the metadata should be
relatively stable. The metadata discussed here are those needed to define a single
model integration.
Requirement: the metadata must be machine readable, must properly distinguish
different datasets in a way that enables the data to be archived, and must provide
metadata useful for data searching. Metadata should also be useable for automatic plot
labelling.
It is helpful to distinguish which metadata define the data, and which simply provide
additional information. The latter could be used in database searches and for labelling
purposes, but would not form part of any archive structure. These additional variables
are listed below as comments. Some of the metadata are names in the form of strings.
We may want to create different (linked) versions of these, for example one short
fixed version (suitable for long term archival purposes) and one slightly longer more
descriptive version.
Many of the metadata are logically independent, in the sense that specifying one does
not fix the value of another. However, metadata can be linked. For example, we might
provide both a long and a short name for an institution, or we might describe certain
characteristics of a given experiment identifier. Such logical connections are noted
below, since in some representations of the data (notably netCDF), they may affect
how the metadata can or should be coded.
Defining metadata:
i. originating_centre: eg Met Office - centre with scientific responsibility for
integrations (STRING, max length=6 and/or 16) (definition)
[It has been suggested that this should be coordinated with the work by WMO to
define unique identifiers for producing centres, but initial discussions have not been
promising. Perhaps we should have the definition being a unique, time invariant short
string (length 6), and a separate metadata item such as centre_name, which would
include a nice English language name which could be used for labelling etc, and
which might change from time to time as institutes re-brand themselves.]
ii. experiment_identifier (STRING, max length=6 or 16). The originating centre is
fundamentally responsible for assigning unique experiment identifiers for the
different datasets it makes available, and should (ideally) provide documentation of
each experiment. It is possible for common experiment identifiers to be agreed
between different centres, if they are carrying out a common experiment. But there is
no a priori guarantee that identical identifiers from different centres refer to
scientifically equivalent experiments. (definition)
iii. forecast_system_version_number (assigned by originating centre; scientific
details of the models used etc should be provided via a web link, INTEGER)
(definition)
iv. forecast_method_number (default =1) (This distinguishes forecasts made with
the same underlying model/forecasting system, but where variations have been
introduced such that the different integrations have different properties, most
importantly different climate drift. An example is the members of a perturbed
parameter ensemble forecast. INTEGER) (definition)
v. ensemble_member_number (Different integrations made with the same model
and forecasting system, which form a homogeneous and statistically indistinguishable
ensemble. INTEGER) (definition)
Additional metadata:
i. original_distributor: eg ECMWF - centre with responsibility for operational or
research distribution of data, ie the centre who first made the data publicly available,
and to whom queries of data integrity should be sent. (STRING, max length=16)
(comment)
ii. production_status: operational, research, or a user defined <project_id>.
“research” should be used for general research at a centre; project_ids should be used
for specified international research projects. (STRING, max length=16) (comment,
logically associated with experiment identifier)
iii. model_identifier (no default) (STRING, max length=16) (comment, logically
associated with forecast_system_version number)
iv. sst_specification (STRING, “coupled” or “observed” or “predicted” or “persisted
anomaly” or “persisted absolute”, logically associated with experiment identifier)
v. real_time: “true” or “false”, according to whether the forecast was made in real
time. Not an attribute of the experiment or the system_version, but of the individual
forecast.
vi. archive_date: “YYYYMMDD” or “unknown”. When the data were produced,
archived or published. The aim is to provide an approximate timestamp, to easily
distinguish between recent experiments and much older ones. Also, in the case that
data need to be corrected in a globally distributed data system, the archive_date could
be used to distinguish between the older, original data and the newer, corrected data.
An attribute of the individual model integration.
An appropriate definition of “real time” will need to be given. A first proposal is “a
seasonal forecast issued less than one calendar month after the nominal start date; or a
short to medium range weather forecast issued less than 24 hours after the nominal
start date”.
A single experiment from a single centre might include multiple models. Note also
that origin/expver/system/method form a natural ‘tuplet’ which defines a particular
homogeneous forecast, whose ensemble size is then spanned by
ensemble_member_number. A ‘multi-model’ forecast consists of a collection of
‘tuplets’. Which elements of the tuplet vary between different members of a
multi-model ensemble does not really matter for the processing of the forecast data.
Data from each tuplet are treated as statistically separate; different ensemble members
of a given tuplet are processed together.
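To make the grouping logic concrete, the sketch below (Python; the record structure
and metadata values are illustrative assumptions, not part of the proposal) groups
individual integrations by their defining ‘tuplet’:

    from collections import defaultdict

    # Each record carries the defining metadata for one integration.
    records = [
        {"origin": "ECMWF", "expver": "1", "system": 2, "method": 1, "member": 0},
        {"origin": "ECMWF", "expver": "1", "system": 2, "method": 1, "member": 1},
        {"origin": "UKMO",  "expver": "1", "system": 3, "method": 1, "member": 0},
    ]

    # origin/expver/system/method is the defining 'tuplet'; members sharing a
    # tuplet form one homogeneous ensemble and are processed together.
    ensembles = defaultdict(list)
    for rec in records:
        key = (rec["origin"], rec["expver"], rec["system"], rec["method"])
        ensembles[key].append(rec["member"])

    for tuplet, members in sorted(ensembles.items()):
        print(tuplet, "->", sorted(members))  # each tuplet treated separately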
Although not needed for distribution and archive purposes, it is suggested that
‘comment’ metadata should be mandatory, since this will give a homogeneous dataset
and aid future searching of the data.
The above metadata offer flexibility in describing different experiments, and are
intended to allow fairly straightforward mapping from existing metadata practice in
the global seasonal forecasting community. We strongly request feedback from
producers of seasonal forecasts as to whether the above metadata are adequate.
2. Representation of metadata in CF compliant netCDF
CF compliant netCDF provides a language that can be used for describing the data
content of a file. It does not provide a natural language for describing data
independently of the file in which it is embedded. Further, it does not (as yet) provide
a standard logical structure for describing the data in a given file. For example, a set
of six fields, with specified attributes, could be described with those attributes in
several structurally different ways with CF compliant netCDF. In order to produce
files from different groups which are homogeneous and consistent (and therefore
amenable to straightforward common processing by software) it is necessary to give
very detailed instructions on how the data should be written - the requirements for
IPCC data are an example of this.
There is an argument that the CF convention should be tightened and/or extended to
simplify this process. ECMWF and the ENSEMBLES project are considering
proposing an extension to the CF convention which would remove these ambiguities
for seasonal forecast data. How such a proposal might look is discussed below.
Whether such a proposal will succeed and become part of the CF convention is not
yet known, but comments on the ideas are invited.
CF compliant netCDF mandates or recommends the following global attributes,
which are designed to document the overall nature of the data:
Conventions “CF-1.0”
Title
Institution
Source
History
References
Comment
The above fields are often filled in as lengthy, human-readable strings, sometimes
with multiple pieces of information under one heading. TFSP recommends that
these fields are filled following existing best practice, in a way that clarifies to the
human reader the source and nature of the data. These “human readable” metadata are
intended purely for human consumption, and are not useful for categorizing the data,
since they are unstructured and will be filled in in different ways by different groups.
Example:
Title: Meteo-France seasonal forecast data
Institution: “Model run by Meteo-France. Data processed by ECMWF. Data
distributed by ECMWF.”
Source: “Data generated by Arpege model, run by Meteo-France at ECMWF.”
History:
References: “http://www.ecmwf.int/products/forecasts/seasonal/documentation.html”
Comment: “Part of EUROSIP multi-model forecast system. Use of data subject to
EUROSIP data policy - see web link for details”
Ideally the above would contain more specific information, such as version numbers,
system numbers, resolution etc. However, since data are normally generated
automatically by computer programs, it is hard to include much detail in free-flowing
text of the above sort without the risk that it becomes inaccurate when details
change. Better to be vague than to be wrong.
TFSP recommends that a web link is given which gives access to a full description
of the data, the meaning of experiment identifiers etc, and details of data policy if
required. The use of a web link is much more appropriate than trying to include large
amounts of detail in the netCDF file itself, and also allows relevant information to be
kept up to date. The web link should be given in the global attributes of the file.
Since we are recommending a specific schema or layout for data, it may help if
conformance to this is indicated by a global attribute, particularly in the case that the
meaning of the file structure is not tightly definable by CF compliant standard names.
Thus we propose the global attribute
:schema = "TFSP-1.0"
This also allows a version control on the data layout specification. The string could be
WCRP instead of TFSP, if the JSC would like to adopt our standard for wider use.
We now describe how the machine-readable metadata should be encoded in CF
compliant netCDF. TFSP recommends that any strings used remain as standardized
as possible, ie changes to case, spacing and abbreviations should be avoided. Over a
long period of time, institute names etc are likely to change (eg past changes of NMC
to NCEP), and it will be necessary to provide appropriate documentation of this to aid
data searching.
Outline of proposed CF-compliant netCDF data layout:
dimensions:
    latitude=180, longitude=360, level=10,
    time=184, initial_time=20,
    forecast_number=5, ensemble_member_number=10,
    string_max=16;
variables:
    float latitude(latitude);
    float longitude(longitude);
    float level(level);
    double time(time);
        time:units="days";
        time:standard_name="time";
        time:long_name="time elapsed since the start of the forecast";
    double initial_time(initial_time);
        initial_time:units="days since 1900-01-01 00:00:00.0";
        initial_time:standard_name="forecast_reference_time";
    int forecast_number(forecast_number);
    char originating_centre(forecast_number,string_max);             (A)
    char experiment_identifier(forecast_number,string_max);          (A)
    int forecast_system_version_number(forecast_number);             (A)
    int forecast_method_number(forecast_number);                     (A)
    char production_status(forecast_number,string_max);              (B)
    char model_identifier(forecast_number,string_max);               (B)
    char original_distributor(forecast_number,string_max);           (B)
    char sst_specification(forecast_number,string_max);              (B)
    int ensemble_member_number(ensemble_member_number);
    float field(forecast_number,initial_time,ensemble_member_number,
                time,level,latitude,longitude);
    char real_time(forecast_number,initial_time,
                   ensemble_member_number,1);                        (C) (T or F)
    char archive_date(forecast_number,initial_time,
                      ensemble_member_number,string_max);            (C)
Here we have chosen to code all of the metadata in the form of variables rather than
either global attributes (which are a file-based concept, and would restrict which data
could be served in a single file) or attributes of variables (which only works if the
attribute has a single value for all the relevant data in the file). The choice to use
variables fits with the philosophy of CF, and allows more flexibility when used with
appropriate applications, but can make datasets a little more awkward to use with
those applications which do not like multi-dimensional datasets.
Note our use of the standard names “time” and “forecast_reference_time” and
associated time units to define the two time axes of a multi-dimensional dataset. We
believe this is an appropriate way to code forecast data in CF compliant netCDF, even
though our ‘time’ units do not match the specification used for the IPCC data.
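For example (a minimal sketch; the file name is hypothetical and the Python netCDF4
library is just one possible reader), the validity date of a forecast step can be
recovered by combining the two time axes:

    from datetime import timedelta
    from netCDF4 import Dataset, num2date

    nc = Dataset("tfsp_example.nc")  # hypothetical file following the layout above
    start = num2date(nc["initial_time"][0], nc["initial_time"].units)
    # 'time' holds days elapsed since the forecast_reference_time
    valid = start + timedelta(days=float(nc["time"][10]))
    print("start:", start, "validity:", valid)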
The “forecast_number” dimension is here given its own forecast_number variable, to
make explicit the fact that it is the defining variable for the data within a given file,
and that the other similarly-dimensioned variables (originating_centre through to
sst_specification) are auxiliary variables within the meaning of the CF convention.
The forecast_number is essentially a dummy variable that simply indexes the
forecast-defining ‘tuplets’ within the file. Possibly it could be omitted from the
netCDF file.
The “ensemble_member_number” is the other independent dimension within the
netCDF file. If a multi-model ensemble is coded in a single file, and the ensemble
sizes vary between the models, then the netCDF file will be larger in size than is
strictly necessary to code the data, because it will reserve space for the same ensemble
size for all the forecast_number models. This is considered a tolerable state of affairs.
If instead of dimensioning the data with forecast_number, we used multiple
dimensions consisting of each of the defining metadata, then we could easily end up
with very large sparse files when coding multi-model data. The defining metadata
‘tuplets’ are not natural hypercubes.
The auxiliary variables labelled (A) are the defining metadata, while those labelled
(B) contain comment information. There is an issue as to how to code the comment
metadata which is valid for each forecast integration separately (ie whether it was
made in real time, and the date-stamp of the data). Here we simply supply it in the
form of appropriately dimensioned variables real_time and archive_date. There is
nothing in the netCDF file to say that these are metadata describing the field variable.
However, it is just possible that these variables could be considered ancillary
variables to the field, within the meaning of the CF convention, despite the difference
of dimension. In this case, we could add the ‘ancillary_variables’ attribute to the field
variable, to make the link explicit.
Note that the above proposal is CF compliant, in that it does not introduce any new
standard_name attributes for variables. However, if we want to standardize the usage
that we propose here, such that application software can unambiguously interpret data
files that follow our proposal, then it would be desirable to ask for the CF convention
to be extended to cover our usage. Such a request might be made separately for the
simple concept of “ensemble_member_number” (necessary for any sort of ensemble
forecast to be represented, and presumably not controversial as a concept) and the
more complex forecast_number “tuplet” needed to represent multi-model forecast
data. If CF approval were to be given, the above layout of variables would be
unchanged, but they would each be given standard_name attributes to allow
unambiguous processing of the data. In the absence of CF approval, we could ask that
equivalent “TFSP_standard_name” attributes be set in the netCDF file instead.
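As a concrete illustration of the proposed layout, the following sketch creates such a
file using the Python netCDF4 library (an assumption: any netCDF API would do, and
the values written are placeholders, not a reference implementation):

    import numpy as np
    from netCDF4 import Dataset, stringtochar

    nc = Dataset("tfsp_example.nc", "w")
    nc.Conventions = "CF-1.0"
    nc.schema = "TFSP-1.0"

    # Dimension sizes are illustrative only
    for name, size in [("latitude", 180), ("longitude", 360), ("level", 10),
                       ("time", 184), ("initial_time", 20),
                       ("forecast_number", 5), ("ensemble_member_number", 10),
                       ("string_max", 16)]:
        nc.createDimension(name, size)

    time = nc.createVariable("time", "f8", ("time",))
    time.units = "days"
    time.standard_name = "time"
    time.long_name = "time elapsed since the start of the forecast"

    itime = nc.createVariable("initial_time", "f8", ("initial_time",))
    itime.units = "days since 1900-01-01 00:00:00.0"
    itime.standard_name = "forecast_reference_time"

    # Defining metadata (A) as auxiliary variables on the forecast_number axis
    origin = nc.createVariable("originating_centre", "S1",
                               ("forecast_number", "string_max"))
    origin[0] = stringtochar(np.array(["ECMWF"], dtype="S16"))[0]
    # ... remaining (A), (B) and (C) variables are created the same way ...

    field = nc.createVariable("field", "f4",
                              ("forecast_number", "initial_time",
                               "ensemble_member_number", "time",
                               "level", "latitude", "longitude"))
    nc.close()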
Examples of forecast defining “tuplets” and associated comment metadata:
originating_centre: COLA
experiment_identifier: expt_id (as assigned by COLA)
forecast_system_version_number: 5 (as assigned by COLA)
method_number: 1
production_status: research
model_identifier: CFS
original_distributor: COLA
sst_specification: coupled
originating_centre: IRI
experiment_identifier: 1 (recommended convention for operational systems; numbers
>1 for testing, non-numbers for research etc)
forecast_system_version_number: 1, 2 and 3 (for CCM3.2, ECHAM3.6, MRF9, as
would be documented by IRI on their website)
method_number: 1
production_status: operational
model_identifier: CCM3.2, ECHAM3.6, MRF9
original_distributor: IRI
sst_specification: predicted
For operational centres, each forecast model system is assigned a unique ‘version’
number (for example, in the order in which the models were introduced) and new
models and/or model versions get a new system version number. (Should we
recommend that an older forecast system will always have a smaller version number
than a newer one?) In research mode, however, typically the experiment_identifier
will be used to distinguish experiments with different forecast systems. If a research
user wants to use the system_version_number to distinguish between experiments
with different models and/or major new versions of a model, that is fine, but it is not
mandatory.
originating_centre: Met Office
experiment_identifier: 1
forecast_system_version_number: 3
method_number: 1
production_status: operational
model_identifier: HADCM3
original_distributor: ECMWF
sst_specification: coupled
For operational centres, it is recommended that a new system number should be given
whenever changes are sufficient to result in a new set of back integrations. For
example, switching to a new source of SST data in a real-time forecast system would
not trigger a new system number if the original back integrations continue to be used.
Even changes such as this should be documented on the centre’s web page, though.
originating_centre: ECMWF
experiment_identifier: common_ENSEMBLES_expt_id_1
forecast_system_version_number: 1 or 2
method_number: 1
production_status: ENSEMBLES
model_identifier: IFS/HOPE or IFS/OPA
original_distributor: ECMWF
sst_specification: coupled
originating_centre: Met Office
experiment_identifier: common_ENSEMBLES_expt_id_1
forecast_system_version_number: 1 or 2
method_number: 1-9 (for a 9 member ensemble with perturbed parameters)
production_status: ENSEMBLES
model_identifier: HADCM3 or HADGEM
original_distributor: ECMWF
sst_specification: coupled
For coordinated experiments such as ENSEMBLES, common experiment identifiers
might be agreed. Since the ‘production_status’ is not part of the defining metadata in
this schema, a small amount of care is needed to ensure that the experiment_identifier
does not end up overwriting other operational or research data.
3. Recommended file structure for data exchange
A convention on how seasonal forecast data should be described within a netCDF file
is still insufficient to describe the format in which data should be made available for
exchange. This is because large datasets can be placed into netCDF files in many
ways. In principle, well written software should be able to extract the information
from any set of conforming netCDF files, and place it in a standard archive file format
of the receiving centre’s choosing. However, it will make life easier for us all if we
agree a recommended file structure for data exchange. The files would be suitable for
archive without further processing (although if someone wants to store the data
differently, of course they are free to do so). If the data are served via OPeNDAP, then
the file structure is dictated in large part by the request. In this case, it is sufficient to
ensure that files of the recommended structure can be served by the OPeNDAP server.
Proposal: The seasonal forecast data should be provided in files, each of which
should contain data from a single forecast (ie a single originating centre, a single
experiment identifier, a single forecast system version, and a single method), a single
initial date, a single ensemble member, and a single field.
To simplify and quicken the handling of files, the file names should encode the
defining metadata in the following way:
ORIGIN_EXPVER_SYSVER_METHOD.YYYYMMDD.NNNN.field.nc
where ORIGIN, EXPVER, SYSVER and METHOD are 6 character strings,
YYYYMMDD is the initial date, NNNN is the ensemble number and ‘field’
represents the physical variable archived in the file. [details TBD].
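A sketch of how such names might be generated (Python; since the details are still
TBD, the truncation, separator and zero-padding rules here are assumptions):

    def tfsp_filename(origin, expver, sysver, method, yyyymmdd, member, field):
        # ORIGIN, EXPVER, SYSVER and METHOD limited to 6 characters each
        parts = [str(p)[:6] for p in (origin, expver, sysver, method)]
        return "{}_{}_{}_{}.{}.{:04d}.{}.nc".format(*parts, yyyymmdd,
                                                    member, field)

    print(tfsp_filename("ecmwf", "1", 2, 1, "20051101", 3, "t2m"))
    # -> ecmwf_1_2_1.20051101.0003.t2m.nc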
4. Additional recommendations
The experience of PCMDI in handling the latest IPCC data is worth considering. As
well as specifying the metadata content, a realization of the metadata in CF compliant
netCDF and rules on how the data were to be put into files, they also provided some
additional specifications. (See
http://www-pcmdi.llnl.gov/ipcc/IPCC_output_requirements.htm for their full
specification.) Do we want to follow any of these, or make our own equivalent rules?
- Files no more than 2Gb in size.
- Data must be gridded using a product of two Cartesian axes.
- Atmosphere data must be on standard pressure levels (exception for cloud)
- Ocean fields must be on depth levels, recommended to use standard depths
- Output fields to be single precision floating point
- Should variable names in the netCDF files bear the same relationship to the CF
standard names as mandated in the IPCC tables?
9
- Are the IPCC requirements on sign conventions and order of array dimensions
already covered by the CF standard?
- Double treatment of missing_value and fill_value, to help old software (or do we
just stick with the up-to-date CF way of doing things?)
- (Recommended) original_name attribute
- (Recommended) variable-specific history attribute
- (Recommended) original_units, long_name and comment attributes
Coordinate variables:
- Must be specified in double precision
- time unit to be “days since [basetime]”, where [basetime] is user supplied
Global attributes:
- institution: both abbreviation and full name and location
- source: specific instructions in building a long string
- project_id: “IPCC fourth assessment”
- realization: (an integer, specifying which ensemble member)
- experiment_id: one of a set of specified strings, corresponding to the project
- (Recommended) contact: name and contact info eg email or telephone number
5. OPeNDAP
The details of OPeNDAP implementation may need to be discussed. It is desirable to
provide aggregation server functionality, so that a data request can be met by
extracting data from multiple files - for example, all the ensemble members from a
given forecast start date, or a multi-model dataset from a single start date. To do this,
the aggregation server will have to be configured to handle the multi-dimensional data
structure proposed above, and serve data in this form.
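For instance (a hedged sketch: the server URL and aggregation name are hypothetical,
and it assumes a netCDF4 library built with OPeNDAP/DAP support), a client could
subset such an aggregated dataset remotely:

    from netCDF4 import Dataset

    # Hypothetical OPeNDAP aggregation serving the multi-dimensional layout
    url = "http://data.example.int/dods/tfsp/seasonal_aggregation"
    nc = Dataset(url)

    # Only the requested slice crosses the network: here, all ensemble members
    # of the first forecast and start date, at one time step and level
    subset = nc["field"][0, 0, :, 0, 0, :, :]
    print(subset.shape)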
In general, it is desirable to archive the data on their native grids. However, it is also
desirable that the data can be served on specified regular grids, to allow easy
intercomparison and multi-model calculations to be made. For typical atmosphere
model grids, it may be feasible to obtain sub-area extraction and regridding to
user-specified lat/long grids from existing software. For ocean grids, this is likely to
be more difficult. We may need to recommend that ocean data are supplied on regular
lat/long grids.
There is perhaps an issue of data security and integrity associated with distributed
data systems. We will need to ensure that one centre is not able to accidentally
“overwrite” data from another centre, by releasing data with the wrong metadata.
How do we ensure that something purporting to be an ECMWF forecast is not in fact
a forgery?
6. Data volumes
To be estimated. The data volumes for the full output specified in Trieste are quite
large. Although there is undoubted benefit in high frequency data for at least some
model runs (to allow study of high frequency processes) there is unlikely to be much
demand for full temporal resolution data for all ensemble members and all start dates
for all experiments.
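For a rough illustration, using the (arbitrary) dimension sizes of the example layout in
section 2 (a 1x1 degree grid, 10 levels, 184 time steps, 10 ensemble members and 20
start dates), a single-precision field occupies about

    360 x 180 x 10 x 184 x 4 bytes ≈ 0.48 GB per member per start date,

ie roughly 95 GB per variable for the full experiment, before any thinning of levels,
variables or temporal resolution.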
7. Available capacity
To be assessed. It is important to make progress on this, to ensure that as a minimum
all data producers in TFSP have somewhere they can send their data to.
8. Quality control procedures
Every data serving centre should have data quality control procedures in place. If data
is acquired from elsewhere, ideally there should be some form of sanity checking /
human visual inspection to check that the data look OK (ie not zeros or garbage, at
least).
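A minimal sketch of such a check (Python; the file name and the checks themselves
are illustrative assumptions, not a proposed standard):

    import numpy as np
    from netCDF4 import Dataset

    nc = Dataset("incoming_file.nc")  # hypothetical incoming file
    data = nc["field"][:]             # may be a masked array

    if np.ma.is_masked(data) and data.mask.all():
        print("WARNING: field is entirely missing")
    else:
        print("min/mean/max:", float(data.min()), float(data.mean()),
              float(data.max()))
        if float(data.std()) == 0.0:
            print("WARNING: field is constant (eg all zeros)")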
There should be an email address or other problem-reporting mechanism, so that users
of the data can report any problems. Possibly this could be provided in association
with the original_distributor metadata.
There should be some mechanism (as a minimum a web-viewable page of ‘incidents’)
for reporting to users any known problems or failures.
9. Relationship to other data strategies
a. Operational Met. services, THORPEX/TIGGE and the WMO Information
System.
Operational Met. services exchange model data in GRIB, and the advent of GRIB
edition 2 increases the ability of the GRIB standard to handle data in ways that TFSP
may require. Nonetheless, there is a strong consensus within CLIVAR that netCDF is
a much preferred format when it comes to ease of use and acceptance by the research
community. Since it is this research community that will be creating and in particular
analysing the data from TFSP, it is clear that the initial data exchange should be in
netCDF.
The advent of TIGGE (THORPEX Interactive Grand Global Ensemble), an
experimental real-time multi-model medium range forecasting system being created
by the major NWP centres around the world, has given operational centres a need to
exchange multi-model forecast data. They have responded to this by setting up a data
committee, and will implement an initial system which exchanges data in GRIB2, and
has three global centres which will provide a parallel archive of the data (NCAR,
ECMWF and CMA). Data exchange will be either via ftp or via more advanced
software such as IDD/LDM.
Although there are important differences between the TIGGE data exchange and that
envisioned for TFSP (real-time, large-volume exchange vs delayed-mode transfer;
GRIB2 vs netCDF; operational vs research community), there is the opportunity to
coordinate certain aspects in a way that facilitates both present-day interoperability
and possible future convergence of operational and research data systems. A key part
of the design of both systems is the required metadata content. If an explicit
isomorphism could be made between the TFSP proposal and the developing
operational application of GRIB2 to multi-model forecasting, this would
substantially aid automatic translation between the formats used, and make it easy to
use common underlying data systems for both projects. On the TFSP side, such a
solution would maintain the advantages of netCDF for the research community
(familiarity in the community; human readability, CF compliance), while introducing
some advantages of WMO operational codes (in particular, unambiguous processing
and archiving). In fact, a complete isomorphism does not appear to be easily achieved
in the short term. The defining metadata needed by TFSP go beyond those proposed by
TIGGE, and TIGGE does not appear to envisage the useful ‘comment’ metadata
which we discuss here. There are also issues with finding a flexible but unambiguous
method of identifying research originating centres. Nonetheless, the proposal outlined
here already attempts to harmonize some of the metadata language, and since both
TFSP and TIGGE envisage future evolution of their data systems, there is certainly
scope for cooperation. TFSP and other elements of WCRP will strive to collaborate
with TIGGE and the operational WMO community.
The XIV WMO Congress in 2003 mandated the creation of a new WMO Information
System, which will provide “a single coordinated global infrastructure for the
collection and sharing of information in support of all WMO and related international
programmes”. The FWIS (as it is generally known) is still largely a concept, although
work on prototype systems has started in Europe. The aim appears to be for a system
that can handle various data formats (including netCDF) and use various networks,
including the internet. The FWIS is no help to us at the moment, but at some point in
the future may provide powerful tools for exchanging and archiving data for WCRP
projects. Our initial strategy is to use existing tools (OPeNDAP, possibly ftp) to
transfer and serve the model data, because of the need to have something working
immediately.
b. The academic and research community (WMP, WGCM, PCMDI, …)
The academic and research community are almost universally at home with using
netCDF data, hence the strong requirement to make data available in netCDF. Beyond
the CF standard, there is not a strong tradition of experiment-defining metadata. The
additional dimensionality of the data can cause awkwardness with some netCDF
analysis applications. Nonetheless, it is expected that the proposed metadata will not
cause any major problems. Regardless of the data format, software will need to be
further developed to fully analyze multi-model ensemble forecasts.
WMP (WCRP Modelling Panel), at its first meeting in October 2005, commissioned a
white paper on data management issues within COPES (of which TFSP is a part).
This has been produced, and provides a suggested outline for the data systems that
will be needed by 2015. As it happens, their vision is very compatible with our
proposal here. In particular, they envisage distributed data systems (with provision for
groups who don’t want to serve their own data), and stress the importance of metadata
standards. They suggest that WMP should consider the adoption of standards and
conventions that establish requirements for coordinated experiments, which is pretty
much what we are providing here for TFSP.
WGCM are also devoting effort to data issues, and have proposed that a new CF
oversight panel should be set up under their auspices. TFSP needs to ensure that our
efforts are coordinated with those of WGCM and WMP, to build a proper pan-WCRP
approach to data handling.
PCMDI have much experience of handling large multi-institutional datasets for
internationally coordinated experiments. In particular, at the request of WGCM, they
have created an organised system for the handling of data from experiments for the
IPCC fourth assessment report. Although we envisage a shared rather than centralized
data service, and although the specifics of the metadata requirements for TFSP are
somewhat different from those for IPCC, what we have outlined can be viewed as
following in the footsteps of what PCMDI provided for the fourth assessment report.
Points for discussion
We need to decide on maximum string lengths for the metadata. Should they be
imposed at all? (It makes file handling much easier if file names have a fixed format).
Should they be relatively short (eg 6 chars, nicer file names) or relatively long
(allowing data producers more freedom in choosing names)? Do we allow strings
containing blanks, and how do we then handle the file names?
Is the proposed ‘tuplet’ for defining the forecasts adequate and appropriate? In the
ECMWF internal archive something equivalent to ‘production_status’ is part of the
defining metadata, and is not just descriptive. This appears to be unnecessary, but it
would give more flexibility in the assignment of experiment identifiers. Is the strategy
of minimizing the number of ‘tuplet’ components a good one?
We need to have good provision for those who will not serve their own data, and
identification of willing ‘hosts’ is an early priority.
References:
NCAR CF pages:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/index.html
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/conformance-req.html
BADC pages on CF, with discussion and examples:
http://badc.nerc.ac.uk/help/formats/netcdf/index_cf.html
http://badc.nerc.ac.uk/help/formats/netcdf/cf_examples.html
White paper on future of CF:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF2_Whitepaper_PublicDraft01.pdf
PCMDI pages on IPCC data handling:
http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php
http://www-pcmdi.llnl.gov/ipcc/IPCC_output_requirements.htm
WMP report on data issues within WCRP and COPES:
http://copes.ipsl.jussieu.fr/Organization/COPESStructure/Reports/WMP1/Report_Kinter_TaylorReport.pdf