Vocabulary Framework v3.0, 5 April 2012

Vocabulary Framework for Course Data Programme

This version:
http://www.xcri.co.uk/KbLibrary/CourseDataProgramme_Vocabulary_Framework3.0.docx

Previous versions:
http://www.xcri.co.uk/KbLibrary/CourseDataProgramme_Vocabulary_Framework2.2.docx
http://www.xcri.co.uk/KbLibrary/CourseDataProgramme_Vocabulary_Framework1.2.docx

Author:
Alan Paull, APS Ltd, alan@alanpaull.co.uk

Editors:
Kirstie Coolin, CIePD, University of Nottingham, kirstie.coolin@nottingham.ac.uk
Sandra Winfield, CIePD, University of Nottingham, sandra.winfield@nottingham.ac.uk
Scott Wilson, JISC CETIS, scott.bradley.wilson@gmail.com

Thanks are also due to the members of the Data Definitions and Vocabularies Working Group.

Date: 5 April 2012

Abstract

1 This document provides guidance on the appropriate creation, publication and maintenance of vocabularies for XCRI-CAP 1.2 instances produced via the JISC-funded Course Data Programme [1]. Its purpose is to help publishers of XCRI-CAP vocabularies provide the information, maintenance, format and quality that their users need.

Status of this document

2 This document is version 3.0 (first formal release). Subsequent releases will be published under version control. Potential changes will be publicised via the Course Data Programme’s JISCMAIL list, coursedatastage1@jiscmail.co.uk, and the XCRI Forum at http://www.xcri.org/forum.

3 It should be used alongside the following facilities:
XCRI-CAP 1.2 specification [2], which defines the information model;
The Course Data Programme Data Definitions Document [3], which defines the information content;
XCRI-CAP 1.2 XML validator [4], which will report on validity of XCRI-CAP feeds;
XCRI-CAP 1.2 aggregator [5], which will enable the collection of XCRI-CAP 1.2 feeds.

[1] http://www.jisc.ac.uk/whatwedo/programmes/elearning/coursedata.aspx
[2] http://www.xcri.org/wiki/index.php/XCRI_CAP_1.2
[3] http://www.xcri.co.uk/KbLibrary/XCRI_CAP_Data_Definitions3.0.docx
[4] http://validator.xcri.co.uk/
[5] Feed Manager: http://coursedata.k-int.com/FeedManager/, Discovery page: http://coursedata.k-int.com/discover/

Table of Contents

Abstract
Status of this document
Table of Contents
Content of this document
    Relationship to standards
    Conventions and terminology
        Vocabulary definition
        Responsible Authority
Using vocabularies
The Course Data Programme Vocabulary Framework Recommendations
    Vocabulary formats
        Standardised formats
        Formats for new vocabularies: Use of VDEX
        Tokenised term vocabularies
        Output format
    Vocabulary publishing and maintenance
Appendix 1: Vocabularies
    Recommended Course Data Programme Vocabularies
        Attendance Mode
        Attendance Pattern
        Description Type
        Course type: CPD
        Languages
        Study Mode
        Subject: JACS
        Subject: LDCS
        Subject: SSAC

Content of this document

"Vocabularies are invaluable for accurate and consistent searching, querying and categorization. There is a tendency across all metadata developments now to use vocabularies wherever possible rather than relying on uncontrolled literal strings whose meanings are inconsistent and normally require individual human interpretation."
Vocabulary Mapping Framework Project

4 This document provides information about the management of vocabularies used in XCRI-CAP feeds for the Course Data Programme.
The Framework consists of a set of recommendations on the preferred formats for publishing vocabularies and on managing, exporting and maintaining them.

Relationship to standards

5 This framework and vocabularies associated with it are not normative parts of the XCRI-CAP standard. Where vocabularies relevant to the framework already exist, these are referenced by this document.

6 This framework is a set of recommendations from the JISC Course Data Programme. It is not a formal standard and does not set requirements on existing vocabularies. Where new vocabularies are produced as part of the programme it is RECOMMENDED that this framework is used.

7 This framework uses the following additional standard:
VDEX: IMS Vocabulary Definition Exchange [6]

Conventions and terminology

8 This section defines conventions and terms used in this document to present the technical material.

9 Where the Framework makes a recommendation, one of the key words RECOMMENDS, RECOMMENDATION or SHOULD is used. No vocabularies are mandatory.

Vocabulary definition

10 In this Framework, ‘vocabulary’ means the lists, glossaries, taxonomies, encoding schemes, classifications and thesauri used in information systems to help users search and analyse course marketing information. It refers to lists of values, which may be simple lists of words or numbers, or lists in which each value has an identifier and a definition that classifies something. Definitions may be implied via simple labelling or described explicitly in text. The term 'code list' is used in Information Standards Board (ISB) [7] standards, 'vocabulary encoding scheme' is used in the Dublin Core Abstract Model (DCAM) [8] and 'value list' is used in the IMS [9].

11 An item in a vocabulary is called a 'term'. Most of the vocabularies to be used in the Course Data Programme use 'key-value' pairs to indicate a term. In this context the 'key' is the machine-readable identifier for the term, and the 'value' is its human-readable name.
In some instances a vocabulary term will have additional properties, such as a formal definition, description or relationship with other terms. A term may have a URI as an identifier.

[6] http://www.imsglobal.org/vdex/vdexv1p0/imsvdex_bestv1p0.html
[7] http://dfe.gov.uk/escs-isb
[8] http://dublincore.org/documents/abstract-model/
[9] http://www.imsglobal.org/vdex/vdexv1p0/imsvdex_bestv1p0.html

Responsible Authority

12 A responsible authority is defined as the organisation that controls the vocabulary. It has the direct responsibility for managing the process of changing the vocabulary and gives it its legitimacy. In many instances the responsible authority will also manage the publication of the vocabulary, but this is not necessarily the case.

Using vocabularies

13 Organisations that manage and develop course information services for learners – universities, colleges, agencies, schools, advisory services, aggregation websites and others – use and share vocabularies in course marketing information on a day-to-day basis. These organisations need to know the provenance, quality and relevance of the vocabularies they use, in order to have confidence that their usage will not be compromised by withdrawal, by unilateral change or by using incorrect versions of the same vocabulary. These factors are doubly important for interoperability between systems, as all systems will depend on the continuity of common vocabularies for efficient data exchange.

14 Information systems can use vocabularies automatically by accessing standardised files or links on the internet. However, in many cases vocabularies used in course marketing information are not currently published on the internet, or are made available only to closed groups. Some vocabularies are accessed using copies of local files only. Each of these use cases represents current practice that is valid and has to be supported by this Framework.
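
As an illustration of the automatic use described in paragraph 14, the following minimal Python sketch reads a VDEX-style document into a key-to-value lookup. It rests on assumptions that should be checked against any real vocabulary file: the namespace and the element names (term, termIdentifier, caption, langstring) follow the IMS VDEX XML binding as understood here, and the embedded fragment is invented for illustration rather than copied from a published vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed namespace of the IMS VDEX 1.0 XML binding; verify against the file in use.
VDEX_NS = {"v": "http://www.imsglobal.org/xsd/imsvdex_v1p0"}

# Illustrative fragment only; not copied from a published vocabulary file.
SAMPLE_VDEX = """<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0">
  <vocabIdentifier>http://xcri.org/profiles/catalog/1.2/studyMode</vocabIdentifier>
  <term>
    <termIdentifier>FT</termIdentifier>
    <caption><langstring language="en">Full time</langstring></caption>
  </term>
  <term>
    <termIdentifier>PT</termIdentifier>
    <caption><langstring language="en">Part time</langstring></caption>
  </term>
</vdex>"""


def load_terms(vdex_xml: str, language: str = "en") -> Dict[str, str]:
    """Return {key: human-readable value} for each term in a VDEX document."""
    root = ET.fromstring(vdex_xml)
    terms: Dict[str, str] = {}
    for term in root.iter("{%s}term" % VDEX_NS["v"]):
        key = term.findtext("v:termIdentifier", default="", namespaces=VDEX_NS).strip()
        for label in term.iterfind("v:caption/v:langstring", VDEX_NS):
            # Keep only the caption in the requested language.
            if key and label.get("language") == language:
                terms[key] = (label.text or "").strip()
    return terms


study_modes = load_terms(SAMPLE_VDEX)
```

The same function could be applied to the text of a locally held copy of a vocabulary file, which covers the local-file use case as well as files fetched from the internet.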

15 The primary context of this Framework is the automatic use of machine-readable vocabularies within the domain of Higher Education course marketing information in the UK.

The Course Data Programme Vocabulary Framework Recommendations

Vocabulary formats

Standardised formats

16 It is RECOMMENDED that responsible authorities use a standardised format for existing vocabularies that are to be used in the Course Data Programme.

17 A standardised format SHOULD be published, maintained and quality controlled. It MAY be controlled by a national or international standards body (for example BSI [10], IMS [11], ISO [12]) or other responsible authority.

Rationale

18 The Framework does not prescribe vocabulary formats that must be used, nor does it prohibit the use of formats not mentioned in this document. It is recognised that many vocabularies in non-standard formats are already in use successfully, and that organisations will continue to use them. However, the use of standardised formats for vocabularies makes interoperability, data exchange, mapping and re-use of data easier, more coherent and more accurate.

Formats for new vocabularies: Use of VDEX

19 Where new vocabularies are created for the Course Data Programme, it is RECOMMENDED that the IMS Vocabulary Definition Exchange (VDEX) standard [13] be used.

Rationale

20 VDEX is a widely used and readily understood existing standard. It has four different XML bindings for different vocabulary types. It uses the same technology as the principal XCRI-CAP 1.2 binding.

Tokenised term vocabularies

21 The Framework RECOMMENDS that where key-value list vocabularies are used, consumers treat them as 'tokenised term' vocabularies [14] (see below), and RECOMMENDS that producers supply values from a preferred list of permitted terms. Appendix 1 in this Framework document gives some examples of recommended Course Data Programme vocabularies.

Rationale

22 VDEX refers to two distinct categories of vocabulary:
Those where the key is some sort of token or code and where this code effectively dereferences a human language descriptor, permitting the use of alternative human language descriptors for the same vocabulary term (see example below).
Those where the key is a specific human language term.

[10] http://www.bsigroup.co.uk/
[11] http://www.imsglobal.org/
[12] http://www.iso.org/iso/home.html
[13] http://www.imsglobal.org/vdex/
[14] Explained in more detail in http://www.imsglobal.org/vdex/vdexv1p0/imsvdex_bestv1p0.html

23 This Framework is agnostic about the two types of vocabulary. Some vocabularies will consist of simple lists of single keywords with no separate token; these vocabularies will generally be of the second type. Other vocabularies will contain terms with key/value pairs, in which some information systems may use different sets of values for the same term. For example one study mode implementation may use "PT/Part time", whereas another may use "PT/part-time". For data exchange purposes, it is important in this example that the code "PT" is used to generate the appropriate human-readable term or search term for the system importing the data. However, it is recognised that some systems will not readily transform data in this fashion and will use the human-readable term directly. For this reason XCRI-CAP 1.2 feeds should output the permitted terms in addition to the keys.

Output format

24 The Framework RECOMMENDS that XCRI-CAP feeds include machine-readable encoded keys in attributes and human-readable values in elements.

Examples

<studyMode identifier='FL'>Flexible</studyMode>
<dc:subject xsi:type='http://www.hesa.ac.uk:JACS3.v1.2' identifier='V100'>History</dc:subject>

Vocabulary publishing and maintenance

25 Each vocabulary needs a responsible authority to manage, publish and maintain it.
For statistical vocabularies the authority is often a statutory agency, such as HESA [15], or a government department. Providers are required to use them to make regular statutory reports to government and others. For UCAS [16] applications data, vocabularies are either international standards with their own governance arrangements or governed by UCAS. Many other vocabularies have more nebulous governance arrangements or none.

26 For the purposes of the Course Data Programme it is very important that vocabularies to be used in XCRI-CAP feeds have the following characteristics. They SHOULD be:
Discoverable. It should be easy to find out where the vocabulary is published on the internet.
Unique. Persistent and unique identifiers should be used, so that the vocabulary can be reliably referenced, or re-used, without having to search for it again. This characteristic applies to the location of the vocabulary, its version, the machine-readable identifiers and the human-readable identifiers.
Persistent. The vocabulary should be available on the internet for a duration of many years. For example, IMS [17] recommends that identifiers should be expected to work reliably for 10-15 years after they have been assigned.
Authoritative. The vocabulary should be published and maintained by a responsible authority, recognised as authoritative in its domain, for example HESA and UCAS in the HE domain.
Maintained. The vocabulary should be responsive to changes in its domain, so that it is kept up-to-date. The responsible authority should have a policy and process in respect of change control, maintenance and version control.
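
Looking back at the output format in paragraph 24 and the "PT" example in paragraph 23, the following minimal Python sketch shows one way a consuming system might treat a feed value as a tokenised term: it reads the key from the identifier attribute, prefers its own label for that key, and falls back to the supplied human-readable value when the key is unknown. The label table, the key 'XX' and its value are hypothetical, and XML namespaces are omitted from the fragment for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical consumer-side labels; a real system would load its own vocabulary.
LOCAL_LABELS = {"PT": "part-time", "FL": "Flexible"}

# Hypothetical feed fragment; namespaces omitted and the 'XX' term invented.
FEED_FRAGMENT = """<presentation>
  <studyMode identifier='PT'>Part time</studyMode>
  <studyMode identifier='XX'>Block release</studyMode>
</presentation>"""


def display_label(element, local_labels):
    """Prefer the consumer's label for the key; fall back to the supplied value."""
    key = element.get("identifier")
    supplied = (element.text or "").strip()
    return local_labels.get(key, supplied)


root = ET.fromstring(FEED_FRAGMENT)
labels = [display_label(e, LOCAL_LABELS) for e in root.iter("studyMode")]
```

The fallback branch is the reason feeds are asked to carry the permitted term as well as the key: a consumer that does not recognise a key can still show something meaningful.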

[15] http://www.hesa.ac.uk/
[16] http://www.ucas.com/
[17] http://www.imsglobal.org/vdex/vdexv1p0/imsvdex_bestv1p0.html

Appendix 1: Vocabularies

Recommended Course Data Programme Vocabularies

Attendance Mode

Vocabulary Name: Attendance Mode
Identifier: http://xcri.org/profiles/catalog/1.2/attendanceMode
Description: Generic key/value pair vocabulary for attendance mode; the type of location at which the student will undertake the learning opportunity, for example distance learning, campus-based, work-based, or online.
Used in: presentation >> attendanceMode
URL or other location: http://www.xcri.co.uk/vocabularies/attendanceMode2_1.xml (VDEX)
Responsible authority: BSI (via BS 8581)
Examples: CM, Campus
Comments: Drawn from the XCRI-CAP 1.2 specification

Attendance Pattern

Vocabulary Name: Attendance Pattern
Identifier: http://xcri.org/profiles/catalog/1.2/attendancePattern
Description: Generic key/value pair vocabulary for attendance pattern; the period in the day and/or frequency during which attendance at a venue is required (if any), for example evenings, daytime, weekends.
Used in: presentation >> attendancePattern
URL or other location: http://www.xcri.co.uk/vocabularies/attendancePattern2_1.xml (VDEX)
Responsible authority: BSI (via BS 8581)
Examples: DT, Daytime
Comments: Drawn from the XCRI-CAP 1.2 specification

Description Type

Vocabulary Name: Description Type
Identifier: http://xcri.org/profiles/catalog/1.2/descriptionType
Description: Enumeration values for xsi:type for description element
Used in: course >> description; presentation >> description
URL or other location:
    http://www.xcri.co.uk/vocabularies/descriptionType1_0.xml (VDEX)
    http://www.xcri.co.uk/bindings/xcri_cap_terms_1_2.xsd (W3C schema)
Responsible authority: For publishing: APS Ltd
Examples: careerOutcome; events
Comments: Introduced alongside the formal XCRI-CAP 1.2 specification as a recommendation.

Course type: CPD

Vocabulary Name: Course Type: CPD
Identifier: http://xcri.org/profiles/catalog/1.2/courseTypeCPD
Description: Generic key/value pair vocabulary for course type in the Continuing Professional Development and short courses domain
Used in: course >> type
URL or other location: http://www.xcri.co.uk/vocabularies/courseTypeCPD1_0.xml (VDEX)
Responsible authority: For publishing: APS Ltd
Examples: 1, Class/Group based; 15, Work based
Comments: This vocabulary was originally used for a service for the Higher York Lifelong Learning Network.

Languages

Vocabulary Name: Languages
Identifier: http://www.infoterm.info/standardization/iso_639_1_2002.php
Description: Enumeration values for languages
Used in: xml:lang attributes; languageOfAssessment; languageOfInstruction
URL or other location: http://www.loc.gov/standards/iso639-2/php/English_list.php
Responsible authority: International Organization for Standardization (ISO)
Examples: en; fr
Comments: There is a range of ISO standards for languages; this is the simplest version, the two-letter codes known as ISO 639-1.

Study Mode

Vocabulary Name: Study Mode
Identifier: http://xcri.org/profiles/catalog/1.2/studyMode
Description: Generic key/value pair vocabulary for study mode, a general expression of the overall amount of the student's time that is devoted to the learning opportunity, as defined by the provider.
Used in: presentation >> studyMode
URL or other location: http://www.xcri.co.uk/vocabularies/studyMode2_1.xml (VDEX)
Responsible authority: BSI (via BS 8581)
Examples: FT, Full time
Comments: Drawn from the XCRI-CAP 1.2 specification

Subject: JACS

Vocabulary Name: Joint Academic Coding System (JACS)
Identifier: JACS3, version 1.2
Description: Generic key/value pair vocabulary for subjects; primarily used for HE statistics
Used in: course >> subject; presentation >> subject; qualification >> subject
URL or other location:
    http://www.hesa.ac.uk/index.php/content/view/1805/296/
    SKOS version: http://jacs.dataincubator.org/.html
    VDEX version: http://www.xcri.co.uk/vocabularies/JACS3-v1_0.xml
Responsible authority: HESA, http://www.hesa.ac.uk and UCAS, http://www.ucas.com
    For publishing VDEX version: APS Ltd
Examples: A100, Pre-clinical medicine; V100, History by period

Subject: LDCS

Vocabulary Name: Learning Directory Classification System (LDCS)
Identifier: LDCS, v3
Description: Thesaurus of subjects of learning; "The nationally approved subject classification system for Learning Information Databases"
Used in: course >> subject; presentation >> subject; qualification >> subject
URL or other location: Available in two volumes:
    http://readingroom.lsc.gov.uk/lsc/National/caslearningproviders_standards_LDCS_v3_Vol1.pdf
    http://readingroom.lsc.gov.uk/lsc/National/caslearningproviders_standards_LDCS_v3_vol2.pdf
    http://www.xcri.co.uk/vocabularies/LDCS_v3.0.xml (VDEX)
Responsible authority: Skills Funding Agency
    For publishing VDEX version: APS Ltd

Subject: SSAC

Vocabulary Name: Sector Subject Area Classification System (SSAC)
Identifier: SSAC, 2001
Description: Subject classification system for regulated qualifications
Used in: course >> subject; presentation >> subject; qualification >> subject
URL or other location: http://www.ofqual.gov.uk/standards/142-statistics-articles/429-sector-subject-area-classification-system-ssac
Responsible authority: Ofqual
Comments: There is a more recent version, but its status is unclear.