INTERNATIONAL
ELECTROTECHNICAL
COMMISSION
Towards an integrated platform for standards data
ISO and IEC Marketing and Communication Forum
Geneva, February 22-23, 2012
What is structured content?
Definition:
Information that is written in reusable blocks or components
Examples in standards:
Normative references
Terms and definitions
Scope
Table of contents
Tables
Figures
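To illustrate the idea, a standard can be modelled as a record whose fields are the components listed above. Any one component can then be reused in another context. This is a hypothetical sketch: the class names, field names and the glossary helper are assumptions for illustration, not an IEC data model.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term: str
    definition: str


@dataclass
class StandardDocument:
    """Hypothetical model: one standard split into reusable components."""
    reference: str
    scope: str = ""
    normative_references: list[str] = field(default_factory=list)
    terms: list[Term] = field(default_factory=list)
    table_of_contents: list[str] = field(default_factory=list)
    tables: list[str] = field(default_factory=list)
    figures: list[str] = field(default_factory=list)


def glossary(docs):
    """Reuse the terms component of several standards as one glossary.

    Maps each term to a list of (standard reference, definition) pairs,
    so a term defined in more than one standard keeps every definition.
    """
    out = {}
    for doc in docs:
        for t in doc.terms:
            out.setdefault(t.term, []).append((doc.reference, t.definition))
    return out
```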
Benefits of using structured content
Reuse components in different contexts
Enrich standards metadata to facilitate search
Identify related information
Objectives of pilot project
Extract structured content using a simple, low-cost method
Provide an integrated framework for all standards-related information
Methodology
Transform PDF documents (including scanned image files) into text files
Parse the text files with a custom program based on regular expressions (string patterns)
Extract identified text segments
Load into database
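The parse, extract and load steps might look like the following minimal Python sketch. It assumes the PDF (or scanned image) has already been converted to plain text by some unspecified tool. It treats a line such as "2 Normative references" as a clause heading, which is a heuristic assumption, and loads the clauses into an SQLite table whose name and columns are hypothetical.

```python
import re
import sqlite3

# Heuristic (assumption): a clause heading is a line holding a 1-2 digit clause
# number followed by a capitalised title, e.g. "2 Normative references".
HEADING_RE = re.compile(r"^(\d{1,2})[ \t]+([A-Z][A-Za-z ,]*)$", re.MULTILINE)


def split_clauses(text):
    """Return {clause title: clause body} for every heading HEADING_RE finds."""
    headings = list(HEADING_RE.finditer(text))
    clauses = {}
    for i, m in enumerate(headings):
        # A clause runs until the next heading, or to the end of the text.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        clauses[m.group(2).strip()] = text[m.end():end].strip()
    return clauses


def load_clauses(conn, standard, clauses):
    """Store the extracted text segments in a hypothetical 'clause' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clause (standard TEXT, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO clause VALUES (?, ?, ?)",
        [(standard, title, body) for title, body in clauses.items()],
    )
    conn.commit()
```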
Example: Normative references
Locate the “Normative references” section in the document
Match lines that begin with a standards prefix (e.g. IEC, CISPR, ISO) followed by a numeric reference
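This rule can be sketched in Python under a few assumptions. The clause ends at the next numbered top-level heading. The accepted prefixes are IEC, CISPR, ISO and ISO/IEC. A reference is a number with optional "-part" suffixes and an optional ":year", which is dropped. The pattern and function name are illustrative, not the program used in the pilot.

```python
import re

# Assumed reference syntax: prefix at line start, whitespace, a number with
# optional "-part" suffixes, then an optional ":year" edition date.
REF_RE = re.compile(
    r"^((?:ISO/IEC|IEC|CISPR|ISO)[ \t]+\d+(?:-\d+)*)(?::(\d{4}))?",
    re.MULTILINE,
)
CLAUSE_START_RE = re.compile(r"^\d*[ \t]*Normative references[ \t]*$", re.MULTILINE)
NEXT_HEADING_RE = re.compile(r"^\d{1,2}[ \t]+[A-Z]", re.MULTILINE)


def normative_references(text):
    """Return the references listed in the 'Normative references' clause."""
    start = CLAUSE_START_RE.search(text)
    if start is None:
        return []
    rest = text[start.end():]
    # The clause ends at the next numbered heading, e.g. "3 Terms and definitions".
    end = NEXT_HEADING_RE.search(rest)
    clause = rest[: end.start()] if end else rest
    # Keep the undated reference; the edition year (group 2) is discarded here.
    return [m.group(1) for m in REF_RE.finditer(clause)]
```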
Normative references: inverse relationship
For a given standard: which other standards cite it as a normative reference?
Provides an indication of the relevance of a standard and its use by other TC/SCs
Promotes dialogue and cooperation between TC/SCs and encourages a systems approach
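Once each standard's normative references have been extracted, the inverse relationship is a simple inversion of that mapping. In this sketch the function names and the standard-to-committee lookup are hypothetical. Counting distinct citing committees is one possible way to express "use by other TC/SCs".

```python
from collections import defaultdict


def referenced_by(normative_refs):
    """Invert {standard: [standards it cites]} into {cited standard: [citing standards]}."""
    inverse = defaultdict(set)
    for standard, refs in normative_refs.items():
        for ref in refs:
            inverse[ref].add(standard)
    return {ref: sorted(citers) for ref, citers in inverse.items()}


def citing_committees(inverse, committee_of):
    """For each cited standard, the distinct committees (TC/SC) whose standards cite it.

    committee_of is a hypothetical {standard: "TC/SC"} lookup; citing standards
    missing from it are skipped.
    """
    return {
        ref: sorted({committee_of[s] for s in citers if s in committee_of})
        for ref, citers in inverse.items()
    }
```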
Other examples for automatic extraction
Centralized platform for standards data
Bring together all relevant information under a single integrated interface
Normative references
Standards data (cont.)
Referenced by
Standards data (cont.)
Terms and definitions
Standards data (cont.)
National adoptions
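One way to serve these views from a single interface is to keep each kind of data in its own table, keyed by the standard reference, and assemble them on request. The schema and function below are a hypothetical sketch using SQLite, not the platform's actual design. "Referenced by" is derived from the same table as "Normative references" by querying it in the opposite direction.

```python
import sqlite3

# Hypothetical schema: one table per kind of standards data, keyed by reference.
SCHEMA = """
CREATE TABLE normative_ref (standard TEXT, reference TEXT);
CREATE TABLE term (standard TEXT, term TEXT, definition TEXT);
CREATE TABLE national_adoption (standard TEXT, country TEXT, national_ref TEXT);
"""


def standard_profile(conn, standard):
    """Assemble everything stored about one standard into a single record."""

    def rows(sql):
        return conn.execute(sql, (standard,)).fetchall()

    return {
        "normative_references": [r for (r,) in rows(
            "SELECT reference FROM normative_ref WHERE standard = ? ORDER BY reference")],
        # Same table, queried in the opposite direction.
        "referenced_by": [s for (s,) in rows(
            "SELECT standard FROM normative_ref WHERE reference = ? ORDER BY standard")],
        "terms": rows(
            "SELECT term, definition FROM term WHERE standard = ? ORDER BY term"),
        "national_adoptions": rows(
            "SELECT country, national_ref FROM national_adoption "
            "WHERE standard = ? ORDER BY country"),
    }
```

Storing each normative reference only once keeps the two directions consistent: adding a row updates both the citing standard's references and the cited standard's "Referenced by" view.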
Conclusion
Simple, cost-effective methods exist to extract structured information from IEC standards (developing the extraction utility took about one week)
Providing standards data under a comprehensive, integrated framework will make it easier for end users to find and apply relevant standards