REPORT OF THE ECOINFORMATICS
TERMINOLOGY MEETING
DRAFT 1
10 May 2004
Sponsored by the United Nations Environment Programme
UN Environment House
Geneva, Switzerland
15-16 April 2004
Revised 13 May 2004
Executive Summary
The purpose of the meeting was to bring together the major providers of terminologies to
discuss how new technologies are being applied and how these valuable resources can be
integrated using the Web as a platform for sharing. The United Nations Environment
Programme (UNEP), the United States Environmental Protection Agency (USEPA), the
European Environment Agency (EEA), and the United States Geological Survey (USGS)
are part of the Ecoinformatics Initiative core group. With the Ecoinformatics Technology
Working Group, UNEP has taken the lead on terminology.
The meeting was attended by ____ organizations. Four types of organizations attended:
governments, intergovernmental organizations (especially UN agencies), scientific
institutions, and corporations and vendors. There were terminology developers (experts
on content and structure), IT professionals interested in putting terminologies on the
Web, and those interested in multilinguality.
During the first day, the participants presented brief descriptions of terminology
resources and their interest in providing these resources via the Web. The challenges and
opportunities related to multilinguality and different user groups were discussed during
general sessions.
The participants expressed a great interest in standards, best practices and general tools
such as web services and UDDI registries that could further the applicability of
knowledge organization systems on the Web. An example of how two knowledge
organization systems could be virtually integrated was presented during the meeting. The
SWAD-E work on the SKOS Core 1.0 was also presented. The group agreed to
investigate several terminology projects related to web services, to create a listserv to
promote better sharing of information among those in attendance, and ….
WELCOME AND INTRODUCTION
Mr. Frits Schlingemann, Director of the Regional Office for Europe, welcomed the
group. He briefly described the work of the 200 people who work in this building,
including the four convention secretariats.
The purpose of the meeting was to bring together the major providers of terminologies to
discuss how new technologies are being applied and how these valuable resources can be
integrated using the Web as a platform for sharing. The United Nations Environment
Programme (UNEP), the United States Environmental Protection Agency (USEPA), the
European Environment Agency (EEA), and the United States Geological Survey (USGS)
are part of the Ecoinformatics core group. UNEP has taken the lead on the terminology
side. It is important not to work in isolation; collaboration is needed with agreed-upon
roles. The goal of the meeting was to identify specific projects with which to move
forward.
This was the first UNEP meeting on terminology since the InfoTerra meeting in Madrid
in the early 1990s. Four types of organizations attended -- governments,
intergovernmental organizations (especially UN agencies), scientific institutions, and
corporations and vendors. There were thesaurus developers (experts on content and
structure), IT professionals interested in putting terminologies on the Web, and those
interested in multilinguality.
The first day of the meeting was an opportunity for information exchange. The second
day was more forward-looking and strategic. During that time the group looked at
overlapping interests and formulated specific projects.
Introduction to Ecoinformatics (Bruce Bargemeyer, USEPA and Stefan Jensen,
EEA)
Ecoinformatics is the application of information science and information technology to
the environment. Other informatics fields, especially bioinformatics and healthcare informatics, are
well established, but ecoinformatics is relatively new.
The Ecoinformatics Initiative leverages the resources of organizations with major health
and environmental programs in their missions, including the EEA, USEPA and several
US government agencies, plus international organizations such as the UNEP, the JRC,
and GBIF. The cooperation draws together and demonstrates principles, techniques,
standards and technologies that can form an ecoinformatics marketplace. This is set in
the larger cooperative framework of several priority areas including children’s health and
environmental indicators.
The Ecoinformatics Terminology Working Group intends to cooperate in deploying
information systems internationally, share the costs and benefits of the development and
publicize the results. Major projects include sharing experiences and results; fostering an
ecoinformatics marketplace; cooperating on emerging technologies, and developing key
elements for interoperability including terminology. Technologies that might be relevant
include computer grids, metadata registries, the semantic web, XML, terminology
systems, and ontology agents (from the JRC and the 5th framework).
Suddenly the Internet cares about semantics and meaning. Terminology is of great
interest, and new technologies depend on semantics. Data standards are really driven
by terminology, since they depend on access to vocabulary, terminology and standard
metadata. How can we point to these resources in new and meaningful ways, especially
on the Web?
Terminology is a major area of cooperation. EEA and EPA are utilizing terminologies
and various members of the Ecoinformatics Initiative have developed terminology
resources. The Cooperative is looking to the broader group for all kinds of resources.
SPIRE is a US National Science Foundation semantic prototype in research
ecoinformatics based on semantic web technologies in which the USGS and USEPA are
involved.
The Ecoinformatics Group has several educational, promotional and technology transfer
opportunities planned, including a session at the EPA Science Forum in Washington DC,
sessions at the EnviroInfo2004, and involvement in an Ecoinformatics “Birds of a
Feather” discussion at Digital Government 2004. The Ecoinformatics Initiative web page
was developed and is maintained by EEA; it provides a single location for the documents
and news about the group. Through this meeting, the group hopes to further its
objectives, find additional areas for cooperation and gain new participants and projects
where possible.
Types and Varieties of Terminology Resources (Gail Hodge, USGS/NBII)
There is a wide variety of terminology resources, or knowledge organization systems
(KOS). The term knowledge organization system encompasses all types of schemes for
organizing information, performing collection management, promoting discovery and
retrieval, and promoting knowledge sharing and management [Hodge, 2000]. (A
taxonomy of KOSs has been developed by Hill and Hodge [2000].) KOSs range from
term lists such as authority files and glossaries to those that express complex term
relationships like ontologies, semantic networks and topic maps. KOSs can be
characterized by subject, by structure (the number of terms, the breadth or number of
top terms, the depth of the hierarchies, and the types of relationships expressed), and by
function (database indexing, cataloging or metadata creation, shelving, etc.). These
characteristics may be useful in describing KOSs so that they can be utilized more
automatically on the Web. The full spectrum of KOSs should be considered during the
conference discussions.
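As a rough illustration of how the characteristics above might be recorded so that software can select a KOS automatically, the following sketch describes each KOS by subject, structure and function. The KosDescription class, its field names and the sample figures are assumptions made for illustration only; they were not presented at the meeting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KosDescription:
    """Hypothetical machine-readable description of one KOS."""
    name: str
    kos_type: str                  # e.g. "glossary", "thesaurus", "ontology"
    subjects: List[str]            # subject coverage
    term_count: int                # structure: number of terms
    top_terms: int                 # structure: breadth
    max_depth: int                 # structure: depth of the hierarchies
    relationship_types: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)


def find_kos(registry, subject, function):
    """Return the names of KOSs that cover a subject and support a function."""
    return [
        k.name for k in registry
        if subject in k.subjects and function in k.functions
    ]


# Sample entries with made-up figures, for illustration only.
registry = [
    KosDescription("Sample glossary", "glossary", ["water"], 300, 300, 1,
                   [], ["metadata creation"]),
    KosDescription("Sample thesaurus", "thesaurus", ["water", "soil"], 5000,
                   12, 6, ["BT", "NT", "RT"], ["database indexing"]),
]

print(find_kos(registry, "water", "database indexing"))
```

A registry of such records would let an application choose a KOS by what it covers and what it is used for, rather than by name.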
Introduction to Web Services (Tim Lynch, USDA)
The term “Web services” is given to the networking technologies that allow web servers
or web sites to communicate with one another, computer to computer. Web services are
supported by a host of technologies including XML, SOAP, and RSS. More can be
found at the W3C site at www.w3.org.
Web services are used when purchasing something online. During the purchasing
process, the web site (via the submit button) processes your credit card and then if the
proper information is returned it processes the order. The activity behind the scenes is
actually quite complex, perhaps involving a bid process if there are multiple vendors as
with some discount sites. Other examples exist in government, education and scientific
research. With a web service, one organization can develop an application in its favorite
language and technology, but still communicate almost seamlessly with another.
The key attributes of a web service are that it is reliable, secure, scalable, and works across
platforms. These attributes also describe the infrastructure that environmental
organizations need in order to communicate with one another.
A terminology web service would involve two web sites communicating about
terminologies. For example, one site provides a multilingual vocabulary and one
provides environmental information in only one language. How can the second site offer
multilingual searching? Historically, this has been done by installing the thesaurus at the
second web site. However, if the thesaurus is updated frequently, the downloaded copy
must be updated and synchronized each time. With web services this problem
is eliminated, because the second site queries the current thesaurus at its source.
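The scenario can be sketched in a few lines. In this minimal sketch the remote thesaurus service is replaced by a local class so that the example runs without a network; in practice the call would travel over a web service protocol such as SOAP. The class names, method names and sample terms are all assumptions made for illustration.

```python
# Stand-in for the site that maintains the multilingual thesaurus.
# In a real deployment this would be a remote web service; here it is a
# local class so the sketch runs without a network. All names and terms
# below are illustrative assumptions.
class ThesaurusService:
    def __init__(self):
        # concept id -> {language code: term}
        self._concepts = {
            "c1": {"en": "water", "fr": "eau", "de": "Wasser"},
            "c2": {"en": "soil", "fr": "sol", "de": "Boden"},
        }

    def translate(self, term, target_lang):
        """Return the term for the same concept in target_lang, or None."""
        for labels in self._concepts.values():
            if term.lower() in (t.lower() for t in labels.values()):
                return labels.get(target_lang)
        return None

    def add_concept(self, concept_id, labels):
        """The provider updates its thesaurus; consumers re-install nothing."""
        self._concepts[concept_id] = labels


# Stand-in for the second site, whose documents are in English only.
class EnglishOnlySite:
    def __init__(self, thesaurus, documents):
        self._thesaurus = thesaurus   # a reference to the service, not a copy
        self._documents = documents

    def search(self, query):
        """Accept a query in any supported language and search in English."""
        english = self._thesaurus.translate(query, "en") or query
        return [d for d in self._documents if english.lower() in d.lower()]


service = ThesaurusService()
site = EnglishOnlySite(service, ["Water quality report", "Soil survey",
                                 "Air emissions inventory"])

print(site.search("eau"))      # French query against English documents
service.add_concept("c3", {"en": "air", "fr": "air", "de": "Luft"})
print(site.search("Luft"))     # the provider's update is visible immediately
```

Because the searching site holds only a reference to the service, a concept added by the provider is usable at once, with no download or synchronization step.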
OVERVIEW OF EXISTING ENVIRONMENTAL
THESAURUS/TERMINOLOGY INITIATIVES
Each terminology developer presented a brief description of the resource and plans for
the future. More details are provided in Appendix D.
Environmental Applications Reference Thesaurus - EARTh (Paolo Plini, CNR)
Thesauri, which were primarily developed in library environments, may not satisfy all the
requirements of the new information context. CNR’s vision is to develop a new
thesaurus model for the environmental domain that combines a stable logical and
conceptual base with flexibility to support different applications. It represents a semantic
mapping of the environmental domain and different cultural dimensions to ensure
applicability for users with different levels of expertise and portability to different
technology applications.
The EARTh architecture is based on categories with a faceted structure. The
classification includes entities, attributes, dynamic aspects (such as processes, conditions,
activities), and dimensions (space and time). Node labels help in the location of terms in
the system. Thematic vocabularies will be developed for applications by combining the
classification scheme and the structure, allowing users to create their own terminology
systems. The expansion of relationship types has already begun with a set of sub-relationships.
CNR is on the verge of providing the data from the thesaurus as a web service to support
organizations that would like to include thesaurus facilities but which have no resources
to produce their own thesaurus environments.
UDK Thesaurus and UMTHES (Wolf-Dieter Batschi, UBA Germany)
The UDK and UMTHES are the same thesaurus. UMTHES is used in several databases
(ULIDAT, UFORDAT, the German Environmental Information Network (GEIN), etc.),
in environmental catalogues, and in new applications such as text analysis software. It
contains over 8,900 preferred terms, non-descriptors and geographic terms in German
and English with a poly-hierarchical structure. Definitions, scope notes and a
classification scheme are provided. In addition to the standard broader terms (BTs),
narrower terms (NTs), related terms (RTs), and synonyms, additional structures have
been added to support topic maps.
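For readers less familiar with thesaurus structure, the sketch below shows one way the standard relationships and a poly-hierarchy (a term with more than one broader term) could be represented. It is a generic illustration, not the UMTHES data model; the Thesaurus class and the sample terms are assumptions.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal thesaurus with BT/NT, RT and synonym relationships."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its BTs
        self.narrower = defaultdict(set)   # term -> its NTs
        self.related = defaultdict(set)    # term -> its RTs
        self.use = {}                      # non-descriptor -> preferred term

    def add_bt(self, term, broader_term):
        # BT and NT are inverses, so both directions are recorded together.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_rt(self, a, b):
        # RT is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, non_descriptor, preferred):
        self.use[non_descriptor] = preferred

    def ancestors(self, term):
        """All broader terms reachable from term, following every BT path."""
        term = self.use.get(term, term)
        seen, stack = set(), [term]
        while stack:
            for bt in self.broader[stack.pop()]:
                if bt not in seen:
                    seen.add(bt)
                    stack.append(bt)
        return seen


t = Thesaurus()
t.add_bt("acid rain", "air pollution")    # first broader term
t.add_bt("acid rain", "precipitation")    # second one: a poly-hierarchy
t.add_bt("air pollution", "pollution")
t.add_rt("acid rain", "forest damage")
t.add_synonym("acid precipitation", "acid rain")

print(sorted(t.ancestors("acid precipitation")))
```

The lookup starts from a non-descriptor, resolves it to the preferred term, and then collects broader terms along both hierarchies.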
TED - Thesaurus on Emergencies and Disasters (Rudolf Legat, UBA Austria)
TED is an integrated Internet portal funded by the Austrian Federal Ministry for
Education, Science and Culture for emergency and civil protection agencies, candidate
countries, local and regional authorities and UNEP.net. The objective is to bridge the gap
between different sources and users of the international and national content selected for
the information system.
The motivation for developing this thesaurus is the lack of terminological knowledge in
the field of disaster information. Disasters go beyond geographic and political
boundaries. TED terms are based on existing thesauri and terminologies in a variety
of related areas. While English is a key language, it isn’t sufficient in emergency
situations, and the enlargement of the EU makes multilinguality even more important.
(The source for the German translation was GEMET.) There is an open invitation for
countries and organizations to contribute translations.
National Biological Information Infrastructure (NBII) (Lisa Zolly)
The NBII is a virtual network of information providers. Currently there are 12 nodes that
provide access to data, information or tools for natural resource managers. The nodes
have expertise in a particular biological discipline or geographic region of the United
States.
The NBII has been active in thesaurus development since its beginning when it worked
with the California Resources Evaluation System. However, when a broader thesaurus
was needed, the NBII partnered with Cambridge Scientific Abstracts to develop the
Biocomplexity Thesaurus. CSA integrated five thesauri and added further terms
required for biodiversity, creating a thesaurus of 10,000 terms with standard relationships.
Coverage includes fisheries and aquatic sciences, life sciences, social sciences and
ecotourism. There are proposals to expand the thesaurus to include fire ecology, fire
management, forestry and forest management. The thesaurus is proprietary, but it can be
used for certain non-profit uses. For example, the SPIRE semantic web project will use
the Biocomplexity Thesaurus.
The Biocomplexity Thesaurus is used for any resource provided through the NBII
metadata clearinghouse and for open GIS and web resources. In the next few weeks,
hierarchical and alphabetical views of the thesaurus will be published in addition to the
current browse. Web services are being developed to provide thesaurus lookup tools on
the desktop and to support the development of mini-thesauri on the fly. The hope is to provide users
with more precise searching and more relationships between terms.
CAB Thesaurus (James Brooks, CAB International)
CAB is an international, intergovernmental not-for-profit organization of 40 member
governments. The CABI thesaurus for the Applied Life Sciences was originally
developed for agriculture but has expanded to cover aquaculture, food science, water
resources management and invasive species. It contains about 47,000 descriptors and
12,000 entry terms in English, Spanish, Portuguese and Dutch.
Systematic revision of certain areas is underway, including the addition of common
names for bacteria and viruses, soil types and geographic instances. In addition, a high
level ontology will be added. Other developments include greater granularity of
relationship types (partitive, generic and instantive, hierarchical, etc.). A contextual role
may also be used.
CABI is building a new thesaurus management system to support XML output. They are
also looking at the SKOS RDF-S format. CABI is participating with multiple
stakeholders, including the FAO Agricultural Ontology Service and the Global Forestry
Information System. Research projects include text mining with several British projects
and organizations, and some commercial partnerships with publishers and other content
partners.
In terms of the work of this group, economic and management issues are important
considerations. Do we have the right data models, and what infrastructure is required?
Should it be demand-led or user-driven? Mr. Brooks does not think that the work of the
British or the international standards organizations will be broad enough for what we
need for interoperability.
Integrated Taxonomic Information System – ITIS (Janet Gomon, Smithsonian
Institution, USA)
ITIS is a partnership of US federal agencies. It provides an authoritative scientific
opinion for the names of organisms and their placement covering four languages
(English, Spanish, French and Portuguese). ITIS can serve as an authority file for
environmental databases that involve species. It is structured both as a thesaurus and a
taxonomy, with parent-child links and synonyms linked to the scientific names.
Published references for current usage, author and date of the original publication of the
name, jurisdiction (native or introduced), and some data quality indicators are included.
The name records come from three sources: the original NODC file which preceded
ITIS, new names from taxonomic experts who provide lists or taxonomic treatments of
species, and users of ITIS who send names for inclusion. Smithsonian staff enter the
data working with taxonomic experts on issues of placement and current usage. ITIS has
agreements with GBIF to provide names to the Catalog of Life Project. XML output is
available, but there is currently no support for web services.
SNS – The Semantic Network Service (Maria Ruether, UBA-Germany & Thomas
Bandholtz, SchlumbergerSema)
The SNS combines a shared vocabulary of 39,000 environmental terms in German and English
with a gazetteer of nearly 50,000 geographic names in German, including their intersections
with administrative names like national parks and catchment areas. SNS has also developed a
chronology of approximately 600 historic and contemporary events that have impacted
the environment, which are classified by terms such as conferences, marine disasters, etc.
In order to integrate this work into a topic map, SNS developed a semantic model. The
original structure of the components is retained with a topic type descriptor and
relationships between the terms described as associative relations. To develop the
semantic network SNS established relationships between the events, locations and
thesaurus terms to describe what happened. SNS offers three services – findTopics,
autoClassify and getPSI. The Developer Toolkit and documentation are available from the
SNS web site.
Environmental Multilingual Thesaurus of the Spanish Environment Ministry
(Carmen Casal Fornos & Arantza López de Sosoaga, MMA)
Since 1981-1982 the Ministry has been developing the glossary and multilingual thesaurus of
territorial planning, urban development and the environment. A total revision of the
terms is underway along with translations into other official Spanish languages. The
main characteristic is that it is a real documentation tool based on actual usage in
indexing and retrieval in libraries and document centers. There are several main subject
categories (microthesauri). Hierarchical and alphabetical lists are available for each
language.
The Institute of Environment Portugal is developing an environmental thesaurus which
will be turned over to the national library for integration with a larger thesaurus. The goal
is to harmonise across all sectors.
Terminology Co-ordination and Harmonisation in the EU in the light of the IATE
Project (Dieter Rummel, EU)
There are several translation services of the EU, including those for the Parliament,
Council and Commission. Many of these resources are online either on the Web or
internal to the organization. Other organizations have glossaries in Word or on card files
without any systematic access. The drawback to this situation is that there is no single
point of access where a user can retrieve up-to-date information for all institutions. There
is limited cooperation among the language services, which means users must look in
multiple places, both locally and internationally. It is hard to cooperate and reduce
duplicative effort.
The Centre was set up in the 1980s when there was limited access to these resources. A
feasibility study conducted in 1999 recommended incorporation of existing terminology
databases into a single new institutional database with interactive data entry by linguists,
integration of terminology into office automation tools, and development of a
collaborative structure for data management.
The InterAgency (soon to be renamed Interactive) Terminology Exchange (IATE) will be
operational later this year. The current version has 1.4 million concepts and 7 million
terms. Problems include detecting and dealing with perfect duplicate entries and
near-perfect duplicates, and harmonizing the use of standard values for certain fields such as
spelling variants.
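To make the duplicate problem concrete, the sketch below separates perfect duplicates from near-perfect ones. It is not the method used by IATE, which was not described at the meeting. It assumes that two terms are perfect duplicates when they are identical after simple normalization, and near-perfect duplicates when a string-similarity ratio passes a threshold; the threshold value and the function names are arbitrary choices for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(term):
    """Lower-case, strip accents and collapse whitespace and hyphens."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().replace("-", " ").split())


def classify_pair(a, b, threshold=0.9):
    """Return 'duplicate', 'near-duplicate' or 'distinct' for two terms."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "duplicate"
    # ratio() is 2 * matches / total length, between 0.0 and 1.0.
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "near-duplicate"
    return "distinct"


print(classify_pair("Ground-water", "ground  water"))
print(classify_pair("harmonisation", "harmonization"))
print(classify_pair("water", "soil"))
```

Pairs flagged as near-duplicates would still need review by a terminologist, since a spelling variant and a genuinely different term can score alike.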
Validating the quality in such a distributed system is also an issue. The system keeps
information about “the author” of the term, his expertise, language, etc. Based on these
factors it will allow users to validate data. Multiple validation stages may exist. They
have tried to keep this validation process open so that a variety of organizations can
participate. A special e-mail and internal messaging system are used to communicate
within the groups. The workflow is based on Oracle triggers.
One of the other goals of IATE is to improve the inter-institutional management of the
terminologies. Institutional working groups on user testing and data management have
been established.
Multilingual Thesaurus for the GeoSciences (Jan Jellema, TNO-NITG, Netherlands)
Geology and the environment are becoming more interrelated and in the future the
organizations working in these disciplines will need to cooperate more in terminology as
well. Geology is made up of numerous disciplines, just like the environment, which
means that homographs must be resolved. The tools required to provide robust
terminology use and management are the same in the geosciences as in the environment
even though the content is different.
The current GeoSciences Thesaurus was developed in 1972. It involves about 12
institutes worldwide and is expanding. The thesaurus contains approximately 10,000
terms and 6000 descriptors in 11 languages. The hydrology and building materials
sections have definitions. The first trials of displaying Cyrillic and Chinese characters on
the web have gone well.
There is a desire to improve the thesaurus to allow for repurposing and to give it wider
importance. Plans include RDF and XML implementations; more discipline-oriented
user interfaces; distributed entry by members of the community, with institutes
performing validation; and integration with Internet searching and knowledge
management.
GEMET and EEA Glossary (Stefan Jensen, EEA)
EEA began development of GEMET (General Multilingual Environment Thesaurus) in
1996 through a consortium of partners. When the Topic Centre closed in 2001, GEMET
was put on hold, but there were various institutions still working on it. It is available in
90 languages, with broad coverage including administration, social sciences and
legislation. GEMET also has aspects of a dictionary, since there are definitions for
approximately 80% of the terms.
GEMET has been distributed as a standalone PC tool with a browser to more than 500
users. However, it is difficult to determine what people are doing with GEMET. A
questionnaire was distributed in 2001, but the return rate was only about 10%.
The EEA Glossary is available in 24 European languages. It has about 1100 terms that
occur on the EEA web site. About 50% of the terms in the glossary overlap with
GEMET. A multilingual web site will be available by the end of the month with content
in all the languages.
USEPA Terminology Reference Services (Larry Fitzwater, US Environmental
Protection Agency)
Terminology is key to the successful use and reuse of EPA data and other resources.
EPA did not have standard ways of naming elements and there were multiple definitions
across EPA, the federal government and the state partners, making it difficult to share and
use data, particularly from legacy systems.
The Terminology Reference System (TRS) collects terms, definitions, and the context,
along with a mapping to the legislation that created the definitions. It is based in part on
GEMET. The USEPA is interested in a glossary, dictionary or lexical system rather than
a structured set of terms.
Having created this system, EPA realized that it didn’t have a good set of terms for
cataloging, categorizing or sorting. There were three main terminology sources: a
three-level hierarchy used by the EPA Libraries for cataloging, a set of terms to catalog web
pages, and another to categorize datasets. Unfortunately, none of these resources had
definitions. USEPA has embarked on a project with the states to create data standards in
order to exchange information across geography and platforms. The state group has
agreed to develop a standard scheme for the EPA for different materials and across states.
In September EPA and the states will go through a political process to finalize the work.
By adding this multi-level system, the TRS will need to become a more structured
system.
The Substance Registry System has over 73,000 chemicals that are regulated or tracked
with CAS numbers, weights, structures and appropriate legislation. Other registries
include Business Objects and Facilities.
EPA is building a web service through which a CAS number can be submitted to EPA
resources. These services are not yet available on the Web because of
technological issues.
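A client of such a service would probably want to check a CAS number before submitting it. The sketch below applies the published CAS check-digit rule: the last digit equals the weighted sum of the other digits, counted from the right, modulo 10. The function name is an assumption, and nothing here reflects the interface of the EPA service, which was not described.

```python
import re


def is_valid_cas(cas):
    """Check the format and check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas("7732-18-5"))   # water
print(is_valid_cas("7732-18-4"))   # same number with a wrong check digit
```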
UN Initiatives
Food and Agriculture Organization – Agrovoc Thesaurus (Margherita Sini, FAO,
Rome)
FAO has several terminology resources including Agrovoc which covers agriculture
including forestry, fisheries, nutrition, etc. in nine languages; FAOTERM which covers
agriculture, biology, forestry, fisheries, economics, etc. in six languages; and
AGRIS/CARIS, which is a categorization scheme of 130 categories at two levels in five
languages. The latter has been mapped to Agrovoc.
There are currently three ontology projects being developed to give more meaning and
semantics to their web-based systems: the Fishery Ontology (with CNR), Food Safety,
and FNA (a bibliographic metadata ontology). The goal is to make ASFA, OneFish,
Agrovoc and other terminologies interchangeable in order to have a global system and to
allow searching across the four systems. A web site is available to demonstrate a search
by synonyms, query expansion and term disambiguation. Multilingual support is
included. A portal will be launched online soon to give semantic probabilities to the user.
The Agriculture Ontology System (AOS) Project is aimed at creating a federation of
information providers in the agricultural community with defined responsibilities and
roles to develop a common methodology and interoperable KOS standards. FAO has
collaborated on several terminology-related projects including a paper in the Journal of
Digital Information (JODI), the SEMKOS project as part of the EU 6th Framework, and
the organization of seminars and workshops in this area.
IAEA - The INIS/ETDE Joint Thesaurus at the International Nuclear Information
System (Yves Turgeon, IAEA, Vienna)
INIS was created in 1969 as a cooperative system with 100 member states and 19
international organizations interested in the peaceful uses of nuclear energy. Its main
product is a bibliographic database. The environment is an underlying theme in many of
the subject areas; perhaps 14-20% of the material is related to the environment.
Both the system’s categories and the INIS/ETDE Joint Thesaurus are maintained in
collaboration with the Energy Technology Data Exchange. Standard thesaurus
relationships are used. INIS Dictionaries are available in English, French, German,
Russian, and Chinese. Arabic is under development.
Current developments include computer-assisted indexing and synonyms that work in the
background during searches. The INIS Taxonomy is being revised based on a
hierarchical category scheme because of the requirements for computer-assisted indexing.
The first version is expected at the end of the year.
UNEP - ENVOC Thesaurus (Gerard Cunningham, UNEP, Nairobi)
The UNEP Thesaurus dates back to a glossary created in 1977. The Infoterra database
was developed and then renamed to ENVOC in 1997. The target audiences are
librarians, documentalists, database developers, and environmental information
practitioners, primarily in developing countries. It is now available in 23 languages.
ENVOC is a three-tiered hierarchy with the base terms grouped under a main category
and subcategories. Content coverage is very general with highly aggregated terms.
UNEP is planning to revise ENVOC this year by eliminating the numerical codes, adding
definitions and solving the problems created by a categorized list of terms.
UNEP is looking toward a global thesaurus but with a series of microthesauri. One
would be a microthesaurus of macro, highly generalized terms. Others would be thesauri
of micro terms for the specialist in particular areas. (The latter may be important to the
use of terminology by the general public in developed countries where a regular
thesaurus may be too complicated.) Good language coverage is required because it tends
to break down the barriers to information sharing.
UNESCO Thesaurus (Liane Barsony, UNESCO, Paris)
The UNESCO Thesaurus is very multidisciplinary. It has seven large domains or microthesauri. Geographic groups are included in multiple groupings and this structure has
been adopted by the World Tourism Organization Thesaurus. The alphabetical index is
most often used by both indexers and end users. Each top term corresponds to a
UNESCO program, service or section in the administrative structure and the work
program. While it covers more than environmental subjects, there are over 1000 terms in
the environment including those related to culture and educational environments.
The UNESCO Thesaurus is used by the UNESCO library to catalog its holdings and in
collective cataloging projects with field offices, by outside organizations such as the UK
Records Office, and in several integrated library management systems. The Albert
Intelligent Search Engine, developed by UNESCO, incorporates thesaurus-based
searching of descriptors and their synonyms. It is used to profile users to define subject
interests, and the UNESCO Knowledge Portal will incorporate the thesaurus.
JIAMCATT (Ali Benhadid, UNOG, Geneva)
This is a network of members and national institutions for the sharing of terminology
resources. JIAMCATT is a simple structure with an annual session and a Secretariat that
manages the web site. Each partner contributes what it can, giving the partners
responsibility to exchange expertise in the area of computer assisted technology and
exchange of vocabularies. Working groups exchange views and developments among
themselves and with software developers.
The JIAMCATT community is larger than just linguistic services. Through the web site
(which has a public part and a members-only part) access is provided to a common terminology repository
which consists of a one-stop multi-site search developed by the EC. Right now they are
not maintaining it, so the links are broken. A file server is provided which archives the
partners’ terminologies. All resources are in a common place to search, though they are
investigating more distributed methods for the future. Other resources include a white
board for discussions, links to other linguistic resources, and a directory of experts.
JIAMCATT is also a coordination system. It proposes improvements and looks for
inconsistencies. Terminology resources for databases are exchanged and a mechanism is
provided for relaying information to decision makers at an annual meeting of the UN
where language areas meet to discuss conference language and publishing services.
WHO – CEHA Inter-Water Thesaurus and other WHO Sources for Health and
Environment Terminology (Mazen Malkawi, WHO/CEHA, Jordan)
The Regional Centre for Environmental Health Activities in Amman, Jordan has the
mandate to support the national program. The Centre supports the Eastern Mediterranean
Region. CEHANET is the regional environmental health information network which has
been in place since 1988 with the mandate to improve access to reliable relevant
environmental health information in the region. The approach is to develop information
systems and tools, produce regional and national information, develop human resources
and establish environmental health information centres.
The Water Thesaurus is not well known even by people in the organization. It is based
on the IRC Inter-water Thesaurus, with water supply and sanitation terms added from the
Netherlands. Materials are available in both Arabic and English. Regional and national
databases are published as well as documents. The thesaurus is primarily a printed
version because the UNESCO ISI package is being used. A mini-ISI electronic version is
available. There are several subject categories including water supply, water quality,
water use, etc. Equivalence, hierarchical and related relationships are included, along
with scope notes.
Thesaurus maintenance is of utmost importance but it is being done manually. There are
focal points in the countries who use a form for proposing changes. Other support tools
include a Subject Analysis Handbook, a procedure manual including ISO standards, and
training courses. Approximately 100 trainers and 300 users in 23 countries have been
trained. CEHA is looking for support and collaborators to finalize its environmental
health specialized dictionary which will be in Arabic. A global classification scheme for
the environment that is similar to MeSH would be very helpful.
USE CASES AND USER NEEDS
Towards a Web Service-Based Infrastructure: The Use of Environmental Thesauri
within European Commission Initiatives (Paul Smits, EU/JRC, Italy)
The DG Joint Research Centre (JRC) is part of the EC with the mission of providing
customer-driven scientific and technical support for the conception, development,
implementation and monitoring of EU policies. JRC functions as a reference centre for
the EU. It is close to the policy making process with seven institutes in five countries.
The Infrastructure for Spatial Information in Europe (INSPIRE) has the goal of making
harmonized spatial data available for policy formulation and integrating spatial data
services based on a distributed network of databases linked by common standards and
protocols. INSPIRE complements other key information directives including those on
habitats, noise, and information policies such as copyright and data base protection.
INSPIRE is about interoperability – how users access local, national or European level
data resources.
INSPIRE is in the process of being proposed to the EC with possible implementation by
2008. The pre-implementation phase is now underway. INSPIRE participates in a
number of standards activities including the OpenGIS Consortium. However, standards
alone are not enough; guidelines are required for unambiguous implementation of a
standard. JRC is taking the lead in this area.
Two use cases were presented. The first case involves registering a resource. A current
tool allows for selection of keywords from a thesaurus but the terms are from NASA.
Another possibility is the UDDI Registry process where you can select from a controlled
list to describe your web service. The use of UDDI registries should be exploited by the
environmental community. Use Case 2 involves the INSPIRE EU portal, which is based
on the metadata catalog. They would like to add thesauri to help guide the search.
Thesauri, gazetteers, and other KOSs are central to JRC’s architecture. They want to
ensure that efforts are not duplicated. Perhaps the ecoinformatics group could agree on a
multilingual topic map for the environment with no copyright restrictions. The
Ecoinformatics Initiative may want to contact the UDDI Consortium to make the topic
maps available for use within the UDDI web services registry. The group should also
agree on a protocol for web services, possibly based on SOAP, WSDL and UDDI. Other
possible activities include linking up with software and services providers such as the UN
GeoNetwork Initiative, encouraging producers of metadata tools to use thesauri rather
than free text input, and feeding requirements, including multilingual considerations, into
standards bodies in order to support the development of tools in standard ways. These
actions would build an infrastructure that could be used by everyone.
Discussion of Multilingual Considerations (Led by Gerard Cunningham, UNEP)
Based on a GEMET example, with definition sources and multiple languages, the group
discussed the importance of multilinguality. One concept was a multilingual terminology
reservoir which would be a glossary. A web service could be built around a public site
that would provide access to such a glossary. These could be simple flat files with base
terms and definitions in English.
The InfoTerra partners could build a very simple system for distributing flat files of
whatever terminology is available and then engage national partners (particularly
governments and specialty institutes) to return their language equivalents of the terms
with definitions. Translating and vetting authorities would need to be established to
ensure quality. Multilingual resources such as GEMET can help to meet the needs of
developing countries because they want to see their language against other UN
languages. Environmental conventions tend to reflect a global environmental
terminology, which is also important.
It is important to put the impact of cultural differences on terminology as an open
discussion point. A glossary could be viewed as a distinct product that would have
multiple definitions to help identify specific cases of these cultural issues.
A two-track approach may be needed. There may be a larger thesaurus effort along with
a less structured effort that would provide multilingual resources for those languages that
don’t have the wherewithal to do a large thesaurus. If a more granular, structured
approach is necessary, there are a number of research issues. The Unified Medical
Language System at the US National Library of Medicine produced a semantic network
to which distinct KOSs could be mapped, but this is an extremely expensive project.
The Use of GEMET for the Swiss Environmental Catalogue (Jean-Philippe Richard,
GRID Geneva)
Envirocat is a partnership between SAFL, the Swiss agency for environment, forests and
landscape, and UNEP. There are two centralized metadata catalogues, Swiss and Alpine.
GEMET was chosen as the thesaurus in order to preserve the work already invested in
metadata collection and to facilitate the import of metadata entries to which it
had already been applied. The availability of terms in German, French, Italian and
English was also significant.
Only a subset of GEMET is used in order to improve system performance. The size of
the C-CDS terminology database is only 24,000 terms because of the restrictions to the
four languages. A simplified database model has been employed which uses the BTs and
NTs to create a relationship table. The data model retains the hierarchies and the
attribution to EEA themes contained in GEMET.
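The relationship table described above can be pictured with a minimal Python sketch. It is not the Envirocat implementation: the record layout, the function names and the sample terms are all assumptions made for illustration. It only shows how BT and NT entries collapse into a single table of (broader, narrower) pairs.

```python
# Hypothetical thesaurus records: each term lists its broader terms (BT)
# and narrower terms (NT). The sample terms are illustrative only.
records = {
    "environment": {"BT": [], "NT": ["water", "soil"]},
    "water": {"BT": ["environment"], "NT": ["groundwater"]},
    "soil": {"BT": ["environment"], "NT": []},
    "groundwater": {"BT": ["water"], "NT": []},
}


def build_relationship_table(records):
    """Return a sorted list of unique (broader, narrower) pairs."""
    pairs = set()
    for term, rel in records.items():
        for broader in rel["BT"]:
            pairs.add((broader, term))
        for narrower in rel["NT"]:
            pairs.add((term, narrower))
    return sorted(pairs)


def narrower_terms(table, term):
    """Look up the direct narrower terms of `term` in the table."""
    return [n for b, n in table if b == term]


table = build_relationship_table(records)
```

Because every BT entry is the mirror image of an NT entry, one table is enough to rebuild the hierarchy in either direction.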
When indexing, GEMET terms are mandatory and the free text descriptors have been
eliminated. The indexer can choose a single language or multiple languages. GEMET is
also used in searching, including by topic.
Several needs based on these uses were identified: microthesauri to solve the problem of
GEMET’s granularity, adding or deleting terms to support local user needs, and
implementing a tool to analyze how people are searching and how authors are using the
thesaurus. It would also be beneficial to have a smaller thematic hierarchy. GEMET
themes might be linked to theme lists from the Conventions or those developed for local
uses.
Could a thesaurus core be developed and maintained through an international group to
which extensions for local needs could be added? Is it problematic to agree on a high
level framework or to agree at the granular level – where can we find a middle ground?
Do we have to agree or do we just have to show all the possibilities?
Applying GEMET – UNEP’s Perspective (Sean Khan, UNEP)
UNEP’s thesaurus applications include indexing and querying web-enabled database
systems and quick profiling of expertise and institutions. Multilingual searches have
offset translation costs. It is a way to get “a good enough” result, providing a good filter
for what should really be translated by humans.
Challenges include the multiple branch problem. For example, “biodiversity” appears in
two places, one under science and the other under biosphere. This is an issue when
querying or displaying. A typical user will pick a single term. These broad top terms
often cause problems because users pick the wrong tree.
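The multiple branch problem can be made concrete with a small Python sketch. The data structure and function name are assumptions for illustration and the hierarchy is reduced to the single example from the text. The sketch lists every path from a top term down to a chosen term, which is the information a query or display tool would need to show the user.

```python
# Hypothetical broader-term links; "biodiversity" sits under two branches,
# mirroring the example in the text.
broader = {
    "biodiversity": ["science", "biosphere"],
    "science": [],
    "biosphere": [],
}


def paths_to_top(term, broader):
    """Return every path from a top term down to `term`.

    Assumes the hierarchy contains no cycles.
    """
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append(path + [term])
    return paths
```

A term with more than one path is a candidate for showing all of its trees rather than forcing the user to pick one.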
UNEP.Net is a framework consisting of two focused and distinct utilities: a discovery
mechanism and support tools that allow users to use and share this information. It
includes a simple user interface with a cartographic view, user defined themes, and filters
by data type and source. Different schemas are allowed; UNEP.Net will have to tackle the
use of different thesauri particularly across different organizations.
Typical users are secretaries who may not know the context of what they are trying to
catalog. Several tools have been developed. One tool provides a list of all the related
terms, synonyms, BTs and NTs from which the user can select the term to be cataloged.
This has improved the indexing process. Users can also switch languages on the fly.
Discussion of Needs of Different Client Groups
The user needs are outlined in a report by the Austrian Federal Ministry of the
Environment [1996]. The concept of a megathesaurus was discussed. Is it possible to
devise a numbering or mapping scheme that would allow terms in one thesaurus to be
mapped to those in another? This would allow searching of a resource that has been
indexed using one thesaurus by using terms in another thesaurus. While this concept
works on a small set, the question is whether it will scale. A high level ontology and
some level of agreed upon facets might be an alternative to complete mappings. The
possibility of linking specific microthesauri back to appropriate places in the trees of a
more general thesaurus was also discussed. If the microthesaurus is not present then the
user would have to do a follow-on search using the term from the more general thesaurus.
The question arose as to how the user or the system would know to use the fallback term.
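One way to picture the numbering scheme and the fallback question is the following Python sketch. It is only an illustration under stated assumptions: the shared concept numbers, the two tiny thesauri and the `translate` function are all hypothetical. A term is mapped through a shared number, and when the target thesaurus has no entry, the broader term is tried and the caller is told that a fallback was used.

```python
# Hypothetical concept numbers shared by two thesauri (all data illustrative).
thesaurus_a = {"groundwater": 101, "aquifer": 102}
thesaurus_b = {101: "ground water"}        # thesaurus B has no entry for 102
broader_in_a = {"aquifer": "groundwater"}  # BT links inside thesaurus A


def translate(term, source, target, broader):
    """Map `term` to the target thesaurus, falling back to broader terms.

    Returns (target_term, used_fallback), or (None, True) if nothing maps.
    Assumes the broader-term links contain no cycles.
    """
    used_fallback = False
    while term is not None:
        concept_id = source.get(term)
        if concept_id in target:
            return target[concept_id], used_fallback
        term = broader.get(term)
        used_fallback = True
    return None, True
```

Returning the `used_fallback` flag is one possible answer to the question raised above: the system, rather than the user, records that a more general term was substituted.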
The definition of interoperability was discussed. Do we mean interoperability between
the KOS contents or between databases?
RELEVANT TECHNOLOGIES FOR INTEROPERABLE TERMINOLOGIES
Demonstration of an Ecoinformatics Web Service (Tim Lynch)
The Agriculture Network Information Center (AgNIC) is a collaborative venture among
40 institutions in the US led by the National Agricultural Library of the USDA with the
goal of providing a single point of access to agricultural information. The site has a
number of features including a search feature which is where the use of web services is
currently focused. Instead of incorporating the thesaurus directly into the web site, they
encouraged NAL to do it as a web service which provides more flexibility and makes the
thesaurus more generally useful.
The Ag Thesaurus is queried by the web service from the search box. The information in
the database is categorized using terms from the thesaurus; the user does not have to
know the preferred terms because the web service call to the thesaurus will pull them all
together.
Mr. Lynch demonstrated the linkage that is possible between two terminologies using
web services. He searched a German term in GEMET, identified the English translation,
and then searched the AgThesaurus and the AgNIC resources. In order to do this he
cloned the software from the AgNIC web service and loaded a portion of the GEMET
database in English, Spanish and German. It required that GEMET be set up as a web
service and then about 10 lines of code were added.
Web service protocols for thesauri are needed. Work has been done in this area by the
Alexandria Digital Library. However, problems were found when requesting a hierarchy
of terms with some environments such as ColdFusion. The AgNIC protocol was adapted
from the ADL to work with a variety of web service technologies. The protocol supports
requests for BTs, NTs, a number of levels, and for a known term or term-based pattern
matching. The relevant issue is how to structure the query. ISO 5963, the specification
for searching a thesaurus, is an approach that could be used.
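The kinds of request listed above (BTs, NTs, a number of levels, a known term or a pattern) can be sketched in Python as a single query function. This is not the AgNIC or ADL protocol: the function name, its parameters and the sample data are assumptions, and a real service would wrap such a function in SOAP or a similar transport.

```python
import fnmatch

# Hypothetical narrower-term links (illustrative data).
NARROWER = {
    "water": ["groundwater", "surface water"],
    "groundwater": ["aquifer"],
    "surface water": [],
    "aquifer": [],
}


def broader_of(term):
    """Return the direct broader terms of `term`."""
    return sorted(b for b, ns in NARROWER.items() if term in ns)


def query(term=None, pattern=None, relation="NT", levels=1):
    """Answer one request: known term or pattern, BT or NT, `levels` deep."""
    if pattern is not None:
        # Shell-style wildcard matching against all known terms.
        starts = sorted(fnmatch.filter(NARROWER, pattern))
    else:
        starts = [term]
    step = (lambda t: NARROWER.get(t, [])) if relation == "NT" else broader_of
    result = {}
    for start in starts:
        found, frontier = [], [start]
        for _ in range(levels):
            frontier = [n for t in frontier for n in step(t)]
            found.extend(frontier)
        result[start] = found
    return result
```

For example, `query(term="water", levels=2)` walks two levels down from "water", while `query(pattern="ground*")` first resolves the pattern and then returns the narrower terms of each match.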
Issues of branding have not been addressed. AgNIC will be mirroring the Ag Thesaurus.
It is asking users of their services to register with them, and they credit services that they
are consuming on the AgNIC site. Questions of performance and reliability of services
or resources that you do not own also exist. There are several other web services used on
the AgNIC site including a common calendar feature, the featured site feed from an
LDAP-like database, and the browsing by topic.
Suggestions on Sharing Environmental Terminologies (Stefan Jensen, EEA)
Mr. Jensen proposed setting up GEMET maintenance as a WIKI to address practical
problems of maintaining GEMET in a centralized fashion. In looking for technologies to
support sharing and maintenance of a resource, Mr. Jensen came across WIKI which is
used in open resources like Wikipedia and Wiktionary. Zope-Openwiki would allow
such an open structure to be built. It would make GEMET an open web-source for others
to take from and add to. This is especially important for a multilingual resource when it
is no longer feasible to do the expansion through specific physical groups.
GEMET could be used as a core terminology and users could add to it with some
moderation to assure quality. Copyright would not be retained but a GNU public license
could be used. A disclaimer would need to be considered. Issues include validation and
authoritative sourcing. A persistent version of GEMET may be needed for some
applications.
It was suggested that GEMET be implemented as a WIKI on an experimental basis, as
long as the original GEMET is maintained in a stable way and the differences between
the two versions are made clear. Mondeca offered to host such a WIKI environment to try
out this concept since Mondeca has already converted GEMET.
Thesaurus Software Tools - THESmain, THESshow and SuperThes (Hermann
Stallbaumer, TBHS, Austria)
SuperThes software is the latest version of THESmain which dates back to 1995. This
software product is used by ENVOC, UDK-Thesaurus, GEMET, AWG, etc. Several
problems have surfaced. GEMET now has 19 languages and there has always been a
problem incorporating new languages, including issues related to character sets.
Thesaurus constructors demand flexible thesaurus structures, the freedom to create
thesauri from scratch or to import them from various places, convenient data exchange
with standard office applications using state-of-the-art technologies like drag and drop,
report generation, and bulk extraction using standard file formats such as XML.
SuperThes is being implemented as a web service. ODBC and JDBC connections are
available which are open source. SuperThes currently uses an XML DTD. XML schema
are required for web services, but there are automatic tools that can be used to convert
DTDs to schema. The relationship types have been expanded.
Integrating Ontology and Thesaurus in a Semantic Web Framework Use Case:
Multimedia Dictionary of Sustainable Development (Bernard Vatant, Mondeca)
Mondeca is a small French company that has been developing KM solutions based on
concept-like semantic nets and topic maps. Mondeca tries to stay compatible with the
Semantic Web. Topic maps are not considered part of the SW framework according to
the W3C, but this is changing.
In this environment some “things” you are interested in are reflected in terminologies
(subjects) while others such as people are generally managed as part of taxonomies. The
third component is the information resource (document and data). Based on these
objects, there are several large questions. How do you make these three domains come
together? How do we agree we are speaking about the same thing either within the same
KOS or across KOSs? How do we access distributed information about something?
Even if we agree about the “thing”, how do we agree on classes and properties of that
thing in an ontology? How formal must the agreement be to disagree – is the agreement
for humans or computer? How do we relate “things” to each other? How does metadata
definition relate to the individual instances that are cataloged?
The Semantic Web tries to address these issues using web technology. Everything is
represented as a URI and everything is a resource (RDF). The URIs identify resources.
OWL provides a standard way to declare and commit to an ontology based on RDF-XML
and RDF semantics. RDF allows you to link resources together via semantic predicates.
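The idea of linking resources via predicates can be illustrated with a small Python sketch that stands in for an RDF graph. It uses plain tuples rather than an RDF library, and every URI shown is made up for the example.

```python
# A tiny in-memory stand-in for an RDF graph: a set of
# (subject, predicate, object) triples. All URIs are made up.
EX = "http://example.org/terms/"
BROADER = "http://example.org/schema#broader"
LABEL = "http://example.org/schema#label"

triples = {
    (EX + "groundwater", LABEL, "groundwater"),
    (EX + "groundwater", BROADER, EX + "water"),
    (EX + "water", LABEL, "water"),
}


def objects(graph, subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)
```

Each resource is identified by its URI, and the predicate names the kind of link, so the same pattern covers labels, hierarchy and any other relationship.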
Topic Maps bring features that are not native to RDF+OWL. These include multilingual
management using name and scope mechanisms and subject indicators which clarify the
subject a URI actually identifies. However there is still debate about how a subject
indicator defines identity. The Topic-Role Association structure of topic maps allows
multiple relationships to be identified and the natural expression of complex relationships
that occur in knowledge bases to be described. However, topic maps do not have built in
formal semantics which RDF and OWL have.
A thesaurus is a sort of ontology but with weak formal definitions of the relationships
which are really needed to do a well-formed taxonomy. Formal semantics are needed if
you want to have computer action against the ontology. A question that is often asked is
how legacy thesauri can be ported to the semantic web. How can implicit or explicit
thesaurus semantics be expressed? What languages are best to express this? There is
ongoing work on these issues within SWAD-Europe and the Semantic Web Best
Practices and Deployment Working Group of the W3C.
Thesauri should be integrated into the SW framework; they should not be thrown out or lost
in the RDF soup. In an Ontology Driven Thesaurus, the current thesaurus can be
presented and constraints can be added (this works especially if you have thesaurus terms
classified). A prototype project by Planete-Ecologie Directory of Web Resources uses
GEMET. Mondeca has two years of experience in dealing with thesauri in this manner
and they now have a full scale project called Dictionnaire Multimédia du Développement
Durable (in French) to convert multiple thesauri.
Semantic Web Advanced Development for Europe Project: SWAD-E (Alistair
Miles, Rutherford Appleton Lab., UK)
The Web is currently information for humans, while the Semantic Web is the
development of a Web that has data so that computers can understand meaning. The goal
is to improve searching and to have the Web do more of what humans must do now. The
enabling technologies are RDF and OWL.
SWAD-E is an EU project set up to support the W3C Semantic Web activities through
R&D, demonstrations, application development and the identification of guidelines.
SWAD-E is interested in thesauri because they represent large bodies of well-engineered
knowledge that the semantic web can bootstrap from. SWAD-E is designing RDF
schemas to express thesaurus data, developing guidelines for migration and use of
thesauri, developing supporting technologies, and demonstrating the value of using
thesauri in this context.
RDF allows “chunks” of a thesaurus to be made available on the Web. A link can be
created while changing and evolving the connected data structure. The move to XML is
a step in the right direction but XML is not equal to interoperability; there are no two
thesauri on the web in the same XML format. Because of this SWAD-E has developed
the SKOS-Core 1.0, an RDF schema for thesauri.
There are several challenges related to the SKOS schema. There is a need to preserve
unique features of individual thesauri while supporting interoperability. The second
challenge is interoperating while migrating all the different types of KOSs. Therefore, the
underlying SKOS meta-model is concept oriented rather than term oriented. URIs are
assigned to concepts, and a hierarchy of extensible semantic relations is established.
Semantic mappings are created to identify the relationship between the concept in one
schema and the concept in another.
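A concept-oriented model of this kind can be sketched in Python as follows. The class, its field names and the relation name "exact" are placeholders chosen for illustration and are not the SKOS Core vocabulary; the URIs are made up. The point is that the URI identifies the concept, while labels in each language and mappings to other schemes hang off it.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept identified by a URI, with labels attached per language."""
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language -> preferred label
    broader: list = field(default_factory=list)      # URIs of broader concepts
    mappings: list = field(default_factory=list)     # (relation, other concept URI)


def label(concept, lang, default="en"):
    """Return the label in `lang`, falling back to the default language."""
    return concept.pref_labels.get(lang, concept.pref_labels.get(default))


def mapped(concept, relation):
    """Return URIs in other schemes linked by the given mapping relation."""
    return [uri for rel, uri in concept.mappings if rel == relation]


# Hypothetical concepts from two different schemes.
water_a = Concept("http://example.org/a/water", {"en": "water", "de": "Wasser"})
water_b = Concept("http://example.org/b/c42", {"en": "water resources"})
water_a.mappings.append(("exact", water_b.uri))
```

Because the labels are properties of the concept rather than its identity, adding a language or renaming a term does not break links from other schemes.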
Using GEMET as an example, a few extensions would be needed to handle the nonstandard features such as groups, super-groups and themes. These would be additional
concepts that would have a certain type of relationship.
SWAD-E is planning to use a combination of SKOS Core and OWL to represent this
hybrid structure which is half way between a thesaurus and an ontology. Later, the
relationships can be extended. An SKOS Mapping supports the interchange of thesauri.
Using the SKOS APIs an interface could be built to terminology web services for
searching, querying, and browsing. A web service API could be built for the
environmental terminology community, so that we can talk to one another. A reference
implementation of such an API is under development. Issues such as versioning and
evolution of the thesaurus need to be discussed.
SWAD-E would be interested in working with the environmental terminology
community to transform thesauri for the Semantic Web. The SKOS Core 1.0 Guidelines
for Migration are under development. An XML exchange format is needed. A web
service that gets a description of an individual terminology resource is also important.
NEXT STEPS, PROPOSALS AND RESEARCH AGENDA
Geosciences Proposal (Jan Jellema)
A proposal is being written for the EU to develop tools for geo-terminology
(GEOTERM). The proposal will define the organization needed to support such an effort
and the guidelines for the content. The project would provide an Internet presence for
terminology in the geosciences and relate this terminology to other thesauri. GEMET
might be the top level and the geo terms could be linked at a lower level.
Through an XML schema for each language, this framework would create and maintain
the thesaurus in a decentralized manner and allow for more automated updating.
National and special extensions would be needed in such a framework since geosciences
terminology is by its nature geographically based. A public interface could be developed
to provide popular terms while continuing to serve the expert’s needs. Groups of
international and national experts would oversee GEOTERM’s development.
The thesaurus would integrate more easily with other applications. An HTML page
could be made for each term in the terminology. General applications such as adding
keywords into a Word document would be more feasible. Web services for search,
translation and XML import and export would be needed.
Discussion of Future Cooperation on Environmental Terminology
Based on the presentations by the various participants, it became clear that there are a
variety of KOSs that are applicable to the environment. Something is needed that will
bring all these resources together. One could view this as “web service choreography”.
The group decided that it is necessary to start with a vision, to develop a list of
components and a roadmap to achieve them, and then to set up a testbed for performance
purposes. The first issue is that of one-to-one web services.
The group also acknowledged the need for visibility in other communities and a need for
funding. A group name, ECOTERM, was suggested. Several pilot projects were
discussed. The possibility of a follow-up meeting next year was of interest to those in
attendance.
SUMMARIZATION OF RECOMMENDATIONS AND NEXT STEPS
The following is a draft work plan for future cooperation on Environmental Knowledge
Organization Systems. The action items are grouped by administrative, technical,
marketing and communications and other.
Note: KOSs include thesauri, classification schemes, authority files or any other formal
terminology system. [put a more formal definition from the CLIR report]
ADMINISTRATIVE
•Agreed the name of this group and project is Ecoterm
•Need statement of common vision (Gerry to draft)
•UNEP to finalize and distribute meeting report (Gerry Cunningham)
•Plan an annual Ecoterm meeting possibly in conjunction with a UNEP.Net meeting
(Gerry Cunningham)
TECHNICAL
•Use cases are needed and are requested from anyone who will submit them
(they should include specific scenarios, needed KOSs and user audiences, if possible)
–E.g., KOSs for the European Spatial Data Infrastructure; use of existing KOSs in E-government-wide taxonomy development; retrieving definitions from multiple KOSs
•Develop example implementations of a thesaurus web service
•Define high level services that are needed
•SKOS reference implementation (Tim Lynch)
•GEneral Multilingual Environmental Thesaurus (GEMET) as a web service
(Sean Khan and Stefan Jensen)
•EEA will liaise with SWAD-E project to get the full GEMET available in
their SKOS format
•EEA will follow up with Tim Lynch (USDA, Cornell) during 2nd week
of May to jointly work on the prototype web-service done around GEMET
in ZOPE. This can either include the full coverage of GEMET in the
USDA web-service or the finishing of the work through EEA consultants.
•Consider exchange between web-services based on the USDA prototype
through SOAP or XML/RPC or both
•EEA will elaborate on the hosting of this web-service on one of the
EIONET servers and according to that decision provide the domain part of
the URI for all GEMET terms
•Review of options related to technologies
•“Standard” formats for KOSs on the web
•Identification of APIs and requirements
•Investigate the feasibility of registering KOSs in a UDDI (Universal Description,
Discovery and Integration) Registry (Thomas Bandholtz)
•Consider the feasibility of publishing RDF encodings of KOSs on the Web
•Review the taxonomy and definitions of relationship types with SKOS (Gail Hodge,
Bernard Vatant, Margherita Sini and Alistair Miles)
•Determine what information should come back for a resource that would help the user to
evaluate multiple resources -- e.g., branding, quality, context/scope parameter, etc.
•Develop a list of standards and other resources related to KOSs, (e.g., the Alexandria
Digital Library Guidelines for Web Services) (Bruce Bargmeyer, Paul Smits and Gail
Hodge)
MARKETING AND COMMUNICATIONS
•Find a mechanism for sharing information within the group
•Update the terminology page on the Ecoinformatics site including the rename of
the project to Ecoterm
•Consider use of the bulletin board on the Ecoinformatics page for
communications within this group (Stefan Jensen)
•Put up the presentations from the meeting (Gerry Cunningham)
•Promote our KOSs with groups involved in new technologies such as the semantic web
•Form liaison with W3C Semantic Web Activity - Best Practices and Deployment
Working Group (Bruce Bargmeyer, Bernard Vatant and Alistair Miles)
• Promote our terminologies with the Earth Observation Summit (GEO) (Paul
Smits)
•Send existing questionnaire to stakeholder groups who weren't at the meeting and ask
those who were invited but could not come to submit questionnaire responses (Gerry
Cunningham and Paolo Plini)
•Distribute Draft Content Standard for KOS Description (Hill & Hodge) to the
Ecoterm group for comments and modify as needed (Gail Hodge)
•Consider a more extensive questionnaire based on the revised Content Standard (Gail
Hodge)
•Enter the questionnaire information into a database (terminology registry) (Sean Khan
and Gail Hodge)
•Put URLs and descriptions for those resources that are web accessible on the Ecoterm
web page (Gail Hodge and Stefan Jensen)
•Extend ENVOC (UNEP's Environmental Vocabulary) and GEMET to additional
languages (Gerry Cunningham and Stefan Jensen)
•Set up an e-mail reflector with an archive including everyone from the meeting
(Gerry Cunningham or Stefan Jensen) (The reflector should put the name of the group
at the beginning of the subject line, followed by a very short but understandable thread
name and subject)
•Increase the visibility of environmental KOSs by investigating ways to make
individual terms accessible directly via a Google-like search
OTHER
•Geosciences Proposal to the EU on sharing -- sharing of tools
•Explore the possibility of making GEMET publicly available through WIKI
technologies. This may provide a maintenance and look-up environment. (Stefan Jensen
and Bernard Vatant)
REFERENCES
Federal Ministry for the Environment (Austria) (1996). User Experiences with
Environment Thesauri in CDS. First International Workshop: Catalogue of Data
Sources (CDS) and Thesaurus. Publication Series Umweltdatenkatalog, Volume
8. Vienna, Austria, April 1996.
Hill, Linda & Gail Hodge. Taxonomy of Knowledge Organization Sources/Systems.
NKOS. July 2000. (http://nkos.slis.kent.edu/KOS_taxonomy.htm )
Hodge, Gail. “Systems of Knowledge Organization for Digital Libraries: Beyond
Traditional Authority Files” CLIR Pub91. April 2000.
(www.clir.org/pubs/abstract/pub91abst.html )
Appendix A
Environmental Thesaurus/Terminology Workshop
United Nations Environment Programme (UNEP)
International Environment House (UNEP Regional Office for Europe)
9 – 15, chemin des Anémones
CH-1219 Châtelaine, Geneva, Switzerland
14-15 April 2004
Agenda
Purpose: Bring together the major providers of environmental terminologies to
discuss the status of their terminologies, how they are applying new
technologies, and how these resources can be “integrated” using new
technologies.
Objectives:
Bring together major providers of environmental thesauri and provide a forum for
discussion about the current use and future of their resources
Present the concept of web services and demonstrate a prototype of several
simple services
Facilitate discussion among the participants to determine whether this approach
can be useful and if other groups are interested in implementing these services,
developing new ones, etc.
Collect information from participants that can lead to an enhanced terminology
web page through the collection of key information about each terminology
Present key components of terminology management including the terminology
types and the concept of different behaviors that influence how and what
services are provided
Identify a research agenda and possible funding opportunities to move the
process forward
Create an outcome relevant to one or more of the topical focus areas identified
as being of interest to the EPA and EEA – children’s health, biodiversity,
chemicals, etc. so that the results can be presented at the Ministerial Meeting in
May.
Wednesday, 14 April 2004
Workshop Introduction
Session 1 – Chair: Gerard Cunningham, UNEP
0900h
Welcome and Opening Remarks
(Frits Schlingemann, Director, UNEP/ROE)
0915h
Participant Introductions
0945h
Ecoinformatics partnership and objectives of the workshop
(Bruce Bargmeyer/Larry Fitzwater, USEPA & Stefan Jensen, EEA)
1000h
Types and varieties of terminology resources
(Gail Hodge, USGS/NBII)
1010h
Introduction of web services concept (Tim Lynch, USDA)
1030h
Coffee break
Overview of Existing Systems and Use Scenarios
1100h
Overview of existing environmental thesaurus/terminology
initiatives (Participants to prepare one-page overviews using a questionnaire)
1100h
Environmental Applications Reference Thesaurus – EARTh
(Paolo Plini, CNR, Italy)
1110h
UDK Thesaurus and UMTHES (Wolf-Dieter Batschi, UBA Germany)
1120h
TED - Thesaurus on Emergencies and Disasters
(Rudolf Legat, UBA Austria)
1130h
National Biological Information Infrastructure – NBII
(Lisa Zolly & Vivian Hutchison, USGS)
1140h
CAB Thesaurus (James Brooks, CABI)
1150h
Integrated Taxonomic Information System – ITIS
(Janet Gomon, Smithsonian Institution, USA)
1200h
SNS – The Semantic Network Service
(Maria Ruether, UBA-Germany & Thomas Bandholtz, Consultant)
1210h
Environmental Multilingual Thesaurus of the Spanish Environment
Ministry (Carmen Casal Fornos & Arantza López de Sosoaga, MMA)
1220h
Lunch
Continuation of the presentations
Session 2 – Chair: Gail Hodge, USGS
1320h
Terminology Co-ordination and Harmonisation in the EU in the light
of the IATE Project (Dieter Rummel, EU)
1330h
Multilingual Thesaurus for the GeoSciences
(Jan Jellema, TNO-NITG, Netherlands)
1340h
GEMET and EEA Glossary (Stefan Jensen, EEA)
1350h
USEPA TRS (Larry Fitzwater, USEPA)
1400h
UN initiatives:
FAO – Agrovoc thesaurus (Margherita Sini, FAO, Rome)
IAEA - The INIS/ETDE Joint Thesaurus at the International
Nuclear Information System (Yves Turgeon, IAEA, Vienna)
UNEP - ENVOC Thesaurus (Gerard Cunningham, UNEP, Nairobi)
UNESCO Thesaurus (Liane Barsony, UNESCO, Paris)
UNITED NATIONS - JIAMCATT (Ali Benhadid, UNOG, Geneva)
WHO – CEHA Inter-Water Thesaurus and other WHO Sources for
Health and Environment Terminology (Mazen Malkawi,
WHO/CEHA, Jordan)
1500h
Coffee break
Use Cases and User Needs
1530h
Towards a web service-based infrastructure: the use of
environmental thesauri within European Commission initiatives
(Paul Smits, EU/JRC, Italy)
1600h
Multilingual considerations – discussion
1615h
The use of GEMET for Swiss environmental catalogue
(Jean-Philippe Richard, GRID-Geneva)
1630h
Applying GEMET – UNEP’s perspective
(Sean Khan, UNEP)
1645h
Needs of different client groups - discussion
1800h
Adjourn
1900h
Reception (La Terrassa restaurant)
Thursday, 15 April 2004
Session 3 – Chair: Stefan Jensen, EEA
Relevant Technologies for Interoperable Terminologies
0900h
Demonstration of Ecoinformatics Prototype (Tim Lynch, USDA)
0945h
Thesaurus Software Tools - THESmain, THESshow and
SuperThes (Hermann Stallbaumer, TBHS, Austria)
1015h
Coffee break
Presentation of Relevant Technologies contd.
1045h
Integrating Ontology and Thesaurus in a Semantic Web Framework
Use case: Multimedia Dictionary of Sustainable Development
(Bernard Vatant, Mondeca)
1115h
Semantic Web Advanced Development for Europe
project: SWAD-E (Alistair Miles, Rutherford Appleton Lab., UK)
Technology options
1145h
Some suggestions on sharing environmental terminologies
(Stefan Jensen, EEA)
1215h
Lunch
Next Steps, Proposals and Research Agenda contd.
Session 4 – Chair: Bruce Bargmeyer & Larry Fitzwater, USEPA
1315h
Geosciences Proposal (Jan Jellema, TNO-NITG, Netherlands)
1345h
Future cooperation on environmental terminology:
(discussion session aimed at reaching agreement on respective
organizational roles on future terminology collaboration)
1500h
Coffee break
1530h
Future cooperation on environmental terminology (continuation)
Closing session
1630h
Summarization of recommendations and next steps
Concluding remarks
Closure of the workshop (UNEP representative)
1730h
Adjourn
Appendix B
List of Participants
Appendix C
Terminology Survey
Appendix D
Survey Responses
Appendix E
Draft Agreement