Controlled Vocabulary Workshop March 26-27, 2011

CONTROLLED VOCABULARY WORKSHOP
MARCH 26-27, 2011
OBJECTIVES
• Finalize VOCAB “Terms of Reference”
• Define use cases for the keyword database and its development
• Develop procedures for capturing and managing keyword taxonomies
• DONE: Identify suitable existing database structures or software for managing the controlled vocabulary and adopt or modify them to meet the use cases

AGENDA
Saturday March 26
7-8 AM
Breakfast at the SEV
8-8:30 AM
Welcome, Review of Agenda, progress report
8:30-9 AM
Set up Use Case Working Groups
9-11 AM
Use Case Working Groups
11-Noon
Report back from working groups, VTC with other members for input
Noon – 1:30 PM
Lunch
1:30-2:30 PM
Review of Controlled Vocabulary Terms of Reference – List Management
2:30-3:30 PM
Work on using TemaTres software, including web services – work on taxonomies
3:30-5:00 PM
Work on some draft taxonomies
6 PM
Depart for Dinner in Socorro
AGENDA
Sunday March 27
7-8 AM
Breakfast
8-9 AM
Planning for workshop at SC with domain researchers
9-11 AM
Work on Use Cases with focus on implementation steps
11-Noon
Report on use cases, VTC with other members for input
Noon- 1:30 PM
Lunch
1:30-3 PM
Write up use case scenarios, work on draft taxonomies
3-3:30 PM
Wrap-up
4 PM
Depart for ABQ hotels and airport
STATUS
• TemaTres web-based thesaurus tool installed
• Taxonomies implemented
    • Habitats/Ecosystems
    • Substances
    • Processes
    • Organisms
• Terms classified (things, materials, activities/processes, properties, etc.)
• A few terms recommended for removal
STATUS

• 416 terms are part of the polytaxonomy
    • Includes some new higher-level terms
• 264 terms remain to be linked
• Synonyms are listed, but not yet added
• A production server has been established for the controlled vocabulary
    • Ability to create instances for individual sites
• Eda has done substantial work on import/export issues
USE CASE WORKING GROUPS

• Straw Man List
    • Vocabulary use for searching and browsing – Eda, Don, Corrina
    • Putting the vocabulary into LTER documents – Kristin, Margaret, John
    • List Management – decision processes
• Focus first on WHO DOES WHAT (not how)
    • May be a diagram/flow chart showing actors, actions and results
• Once the first step is accomplished then consider:
    • How it might be accomplished technically
    • What resources would be required
    • Who should be responsible for the implementation
WORKING GROUP NOTES
PUTTING WORDS IN DOCUMENTS
JP’S USE CASE
• Draft EML document
• Use Duane’s HIVE tool to suggest probable words
• Check off the ones you want
• Returns one of:
    • EML snippet to screen, for cut and paste into the document
    • Revised EML document with keywords added
    • XML document with keywords (in a keywordSet node, including the thesaurus) to be used with a web service client (allowing additions to relational databases etc.)
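
The "EML snippet" output could look roughly like the sketch below. This is an illustration only, not the HIVE tool or any existing LTER service: the function names, the example terms and the thesaurus label "LTER Controlled Vocabulary" are assumptions. It uses the EML keywordSet element with keyword and keywordThesaurus children.

```python
import xml.etree.ElementTree as ET


def build_keyword_set(keywords, thesaurus):
    """Build an EML <keywordSet> element from the checked-off terms."""
    keyword_set = ET.Element("keywordSet")
    for term in keywords:
        ET.SubElement(keyword_set, "keyword").text = term
    # The thesaurus name records which vocabulary the terms came from.
    ET.SubElement(keyword_set, "keywordThesaurus").text = thesaurus
    return keyword_set


def keyword_set_snippet(keywords, thesaurus):
    """Return the <keywordSet> as text, ready to paste into an EML document."""
    return ET.tostring(build_keyword_set(keywords, thesaurus), encoding="unicode")


# Hypothetical selection; real terms would come from the controlled vocabulary.
selected = ["primary production", "grasslands"]
snippet = keyword_set_snippet(selected, "LTER Controlled Vocabulary")
print(snippet)
```

The same element could be returned on screen for cut and paste, or spliced into a draft EML document by a web service client.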

KRISTIN’S USE CASE
• Populate Drupal web site with polytaxonomy
• Within the Drupal Metadata Editor, browse a drop-down list of levels, or search to find terms
• Select the term you want and it is automatically added to the backend database that is used by the module that creates EML
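
A minimal sketch of the browse and search steps, assuming the polytaxonomy is held as nested dictionaries. The terms, the structure and the function names are hypothetical and are not taken from Drupal or TemaTres.

```python
# Hypothetical fragment: each key is a term, each value holds its narrower terms.
TAXONOMY = {
    "organisms": {"plants": {"grasses": {}, "trees": {}}, "animals": {"birds": {}}},
    "processes": {"primary production": {}, "decomposition": {}},
}


def children(taxonomy, path):
    """Browse: list the terms one level below `path` (the drop-down contents)."""
    node = taxonomy
    for term in path:
        node = node[term]
    return sorted(node)


def search(taxonomy, text, prefix=()):
    """Search: return the full path of every term whose name contains `text`."""
    hits = []
    for term, narrower in taxonomy.items():
        path = prefix + (term,)
        if text.lower() in term.lower():
            hits.append(path)
        hits.extend(search(narrower, text, path))
    return hits


top_level = children(TAXONOMY, [])
under_organisms = children(TAXONOMY, ["organisms"])
grass_hits = search(TAXONOMY, "gras")
```

Returning the full path for each hit lets an editor show where a term sits in the hierarchy before it is selected.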

MARGARET’S USE CASE
• Browse or search keywords and check off desired terms
• As terms are checked off, an internal list is generated and archived at a particular URL
• Web service provides XML snippet that can go into EML

USING FOR NON-DATASETS
• E.g., publications, projects etc.
• May not have EML representations
• Browse or search to locate potential terms
• Return
    • Simple list for inclusion (cut and paste) into publications etc.
    • EML snippet as part of an XML document for use with a web service client to interface with desired systems
• Note: this could also use HIVE search tool instead of raw browse
BEST PRACTICES
• Need a best practices guide that addresses use of the controlled vocabulary
• Goal – ensure that LTER data is discoverable
• Examples:
    • Use the most specific terms you can
    • Specify how many or what categories of terms should be included where applicable, for example:
        • Specifying a desirable number of terms
        • E.g., at least one term from at least X of the LTER taxonomies
        • Should have at least one core area
RATING DOCUMENTS
• Run document through congruency checker
• It reports how many keywords and taxonomies are represented in an EML document
• Allows checking for conformance with best practices
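
A congruency checker along these lines might be sketched as follows. Everything here is an assumption made for illustration: the term-to-taxonomy lookup, the threshold (the slides leave it as "X") and the simplified sample document, which omits the EML root element and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from controlled term to the taxonomy it belongs to.
TERM_TAXONOMY = {
    "grasslands": "Habitats/Ecosystems",
    "nitrogen": "Substances",
    "primary production": "Processes",
    "birds": "Organisms",
}


def rate_document(eml_text, min_taxonomies=2):
    """Count keywords and represented taxonomies in an EML-like document."""
    root = ET.fromstring(eml_text)
    keywords = [
        k.text.strip().lower()
        for k in root.iter("keyword")
        if k.text and k.text.strip()
    ]
    controlled = [k for k in keywords if k in TERM_TAXONOMY]
    taxonomies = {TERM_TAXONOMY[k] for k in controlled}
    return {
        "keywords": len(keywords),
        "controlled_keywords": len(controlled),
        "taxonomies": sorted(taxonomies),
        # min_taxonomies stands in for the unspecified "X" of the best practice.
        "conforms": len(taxonomies) >= min_taxonomies,
    }


SAMPLE = """<dataset>
  <keywordSet>
    <keyword>Grasslands</keyword>
    <keyword>nitrogen</keyword>
    <keyword>plot 7</keyword>
  </keywordSet>
</dataset>"""

report = rate_document(SAMPLE)
```

Keeping the threshold as a parameter means the check can follow whatever number the best practices guide eventually settles on.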

WORKING GROUP - MANAGING VOCABULARY

• Principles
    • Want to hit “sweet spot” for number of keywords
        • Enough to make reasonable search and browsing possible
        • Not so specific that only data from a particular site or dataset would be returned from a search
    • Could be words used widely at a single site
    • Want to avoid words that are too esoteric
    • The list should be modified periodically to capture additional words as they become widely used in the network
    • Each site should be able to propose new preferred terms, in suitable forms that are widely used in datasets from the site. A proposal should include justification, including information on related terms used at other LTER sites and where the term might be placed into the taxonomies
    • Sites can also propose non-preferred terms linked to existing preferred terms
    • Sites should be able to maintain independent, site-specific controlled vocabularies
CRITERIA FOR ACCEPTING OR REJECTING
PROPOSED PREFERRED TERMS
• The proposed terms should be suitable for inclusion (e.g., not locations or specific taxonomic identifiers)
• Proposed terms should not be redundant with existing term(s) already in the vocabulary
• Terms and their proposed places in taxonomies should conform in form with NISO Z39.19-2005 and successor documents (e.g., sections 6.5.1, 8.3)

CRITERIA FOR ACCEPTANCE OF PROPOSED
NON-PREFERRED TERMS
• The proposed terms should be suitable for inclusion (e.g., not locations or specific taxonomic identifiers)
• The proposed terms must be sufficiently close synonyms to the preferred term to which they will be linked

CRITERIA FOR REMOVING OR ALTERING
PREFERRED TERMS
• Terms will never be altered, but they can be demoted to non-preferred status
• Terms can only be removed if they are not currently in use by datasets
• Removals or alterations of terms are expected to be rare
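
These rules can be expressed as a small data structure. The sketch below is an assumption-laden illustration, not the TemaTres data model: the class, its fields and the idea that a demoted term is linked to a remaining preferred term are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    preferred: set = field(default_factory=set)
    # Maps each non-preferred term to the preferred term it is linked to.
    non_preferred: dict = field(default_factory=dict)
    # Terms currently referenced by datasets.
    in_use: set = field(default_factory=set)

    def demote(self, term, replacement):
        """Demote a preferred term instead of altering it."""
        if term == replacement:
            raise ValueError("a term cannot replace itself")
        if term not in self.preferred or replacement not in self.preferred:
            raise ValueError("both terms must currently be preferred")
        self.preferred.remove(term)
        self.non_preferred[term] = replacement

    def remove(self, term):
        """Remove a term only if no dataset currently uses it."""
        if term in self.in_use:
            raise ValueError(f"{term!r} is still used by datasets")
        self.preferred.discard(term)
        self.non_preferred.pop(term, None)


# Hypothetical terms for illustration.
vocab = Vocabulary(preferred={"forest", "forests", "bogs"}, in_use={"forest"})
vocab.demote("forest", "forests")
vocab.remove("bogs")
```

Demotion keeps the old term resolvable for datasets that already use it, which is why removal is blocked while a term is in use.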

CHANGING LOCATION OF TERMS IN
TAXONOMIES OR THESAURI
• These have large subjective elements. Other resources should be frequently consulted when making changes
• Sites or individuals can propose and justify changes that will be evaluated relative to NISO Z39.19

PROCESS


• VOCAB committee may do research to identify terms that should be added based on use in site-specific vocabularies, use in datasets and other sources of information.
• VOCAB committee receives and evaluates proposed changes
    • Based on the criteria, make changes to the development version of the controlled vocabulary database
    • The Controlled Vocabulary committee may make immediate changes in the current official version to correct gross errors
    • New versions will be issued by VOCAB from time to time, and a request for endorsement will be forwarded to IMEXEC
SCIENCE COUNCIL WORKSHOP

• Objective
    • Engage SC members – sell on idea – develop some advocates
    • Process followed: Objectives - Rules for taxonomies
    • Get guidance on specific issues
• The Controlled Vocabulary
    • Need for related terms?
    • Are there things missing? – core areas?
    • Are there things that should be removed?
    • Are there things that are out of place?
    • Specific areas of concern
• Use Cases
    • Feedback on proposed uses
    • Priorities for getting implemented
• Tasks before workshop
    • Add definitions for all words to the taxonomy
        • Prioritize ones that are difficult
    • Get way to display entire vocab.
    • Improve diagram for content
    • Send SC members link to TemaTres – have them do test searches
AGENDA

• Introduction – 1 hour
    • Around the room introductions
    • Why we need controlled vocabulary
    • Steps taken so far
    • Background – procedures for creating controlled vocabularies
    • Meeting objectives
• How to use Controlled Vocabulary – 1 hour
    • Question for SC members
        • What are your experiences with finding LTER data?
        • What would most help you find data in the future?
    • Discussion of data discovery use cases

SC AGENDA

• Tour of Controlled Vocabulary – 1 hour
    • General Introduction
    • Breakout groups (pair SC member with IM) to look at areas of specific interest
• Feedback to entire group on things in the controlled vocabulary that need improvement – 1 hour
• Discussion of specific issues
    • Core areas as top level hierarchy
        • now integrated elsewhere
    • Management of the vocabulary – role of researchers
• Discussion of next steps
    • How do we engage larger LTER community?
        • How much, and what sort of engagement is needed