DSPL Developer Guide

DSPL stands for Dataset Publishing Language. It is a representation format for both the metadata (information
about the dataset, such as its name and provider, as well as the concepts it contains and displays) and actual data
of datasets. Datasets described in this format can be imported into the Google Public Data Explorer, a tool that
allows for rich, visual exploration of the data.
Note: To upload data to Google Public Data using the Public Data upload tool, you must have a Google
Account.
This document is intended for data owners who want their content to be available in the Public Data Explorer. It
goes beyond the Tutorial by diving deeper into the details of the DSPL schema and supported features. Only a
basic familiarity with XML is assumed, although knowledge of relational databases is also useful.
Although not a requirement, we suggest reading through the Tutorial, which is shorter and easier to digest,
before looking at this document.
Contents
1. Overview
   1. Process
2. XML Structure
   1. Overview
   2. Header and Imports
   3. Dataset Information
   4. Provider Information
   5. Concepts
   6. Slices
   7. Tables
   8. Topics
3. DSPL Data Files
   1. Concept Data Files
   2. Slice Data Files
4. Advanced Features
   1. Multi-Language Datasets
   2. Mappable Concepts
   3. Concept Relationships
5. Submitting Your Dataset
Overview
A DSPL dataset is a .zip file that contains an XML file and a set of CSV files. The CSV files are simple tables
containing the data of the dataset, while the XML file describes the metadata of the dataset. The latter includes
informational metadata like descriptions of measures, as well as structural metadata like references between
tables. This metadata lets non-expert users explore and visualize your data.
Process
In general, the process of creating a DSPL dataset is as follows (some steps may take place in parallel):
1. Create your DSPL XML file.
2. Identify any external data sources to use in your dataset.
3. Define your concepts, slices, and (optionally) topics. Iteratively update the content of your DSPL file.
4. Export your source data to .csv files.
5. Create a DSPL dataset.
6. Submit the dataset to Google.
XML Structure
Overview
The DSPL XML file defines the metadata of the dataset, including structural relationships between concepts,
slices, topics, and tables. Although it is possible to create this file by hand, data processing tools and scripts can
greatly streamline the process. See a sample DSPL file in a new window.
The file includes a number of sections, which are summarized in the table below. Following the table, we
describe each section in greater detail.
Section | Summary | More Info
Header and Imports | The parent for all of the other elements of the dataset. Includes the target namespace (i.e., identifier) for the dataset, along with the namespaces of any imported datasets. | Documentation
Dataset Information | The name, description, and URL of the dataset. | Documentation
Provider Information | The name, description, and URL of the dataset provider. | Documentation
Concepts | Definitions of "things" that appear in the dataset (e.g., countries, unemployment rate, gender, etc.). Each concept has a unique identifier, which can be referenced by slices and tables. | Documentation
Slices | Combinations of concepts for which there is statistical data in the dataset. Each slice contains dimensions and metrics. Slices reference concepts and also tables, which contain the actual data. Each slice has a unique identifier that can be referenced by the tables containing the actual data. | Documentation
Tables | Define the data for concepts and slices. Concept tables hold enumerations and slice tables hold statistical data. Tables are defined in the XML file, and point to .csv files containing the actual data. | Documentation
Topics | Categories for organizing dataset concepts. While not required, these can be very helpful for users navigating your data. | Documentation
Header and Imports
Declaring the Public Data namespace
A DSPL dataset begins with a top-level <dspl> element. This is used to enclose all dataset information and to
indicate any namespaces that will be used throughout the file. Here's an example:
<?xml version="1.0" encoding="UTF-8"?>
<dspl targetNamespace="http://www.example.com/mystats"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.google.com/dspl/2010" >
...
</dspl>
A namespace is a unique identifier that can be associated with an XML schema (a set of XML elements and
attributes). The targetNamespace provides a URI that identifies your dataset. This URI is not required to point
to an actual resource, but it's a good idea to have the URI resolve to a document describing your content or
dataset.
You are not required to provide a targetNamespace. If you don't, then one will be generated automatically for
you at import time.
The targetNamespace attribute is followed by a series of xmlns attributes specifying other XML schemas that
will be used in the file. Every DSPL file must include the Google Public Data schema, whose URI is
"http://schemas.google.com/dspl/2010", and use it as the default namespace. It should also include the standard
W3C XML Schema instance namespace, identified by "http://www.w3.org/2001/XMLSchema-instance". As described in the
next section, other namespaces can be added to include information from other datasets.
Importing other dataset namespaces
Datasets can reuse definitions and data from other datasets. Google, for instance, provides a number of basic
datasets that define concepts commonly appearing in user data. For example, most datasets need a concept to
represent years. Instead of defining a new concept, you can use the year concept from the
"http://www.google.com/publicdata/dataset/google/time" dataset. See the Canonical Concepts page for more
information.
To use an external dataset, add the <import> element to the DSPL file just after the namespace declaration, and
indicate the namespace you are importing, like this:
<import namespace="http://www.google.com/publicdata/dataset/google/time"/>
Then, add the imported namespace (in this case,
xmlns:time="http://www.google.com/publicdata/dataset/google/time") to the namespace declaration at the
top of your file, like this:
<?xml version="1.0" encoding="UTF-8"?>
<dspl targetNamespace="http://www.stats-bureau.com/mystats"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.google.com/dspl/2010"
xmlns:time="http://www.google.com/publicdata/dataset/google/time" >
<import namespace="http://www.google.com/publicdata/dataset/google/time"/>
Your DSPL file now can reference elements from the Google Public Data time dataset. Repeat this process for
every dataset you want to reference.
Referencing content in external datasets
Once you've imported another dataset, you need to be able to refer to concepts, slices, and data from that dataset.
To do this, you can use references of the format prefix:other_id, where prefix is the prefix used for the
namespace of the external dataset.
Here's an example of a reference to the year concept from the time dataset (described above):
<slices>
<slice id="country_slice">
<dimension concept="country"/>
<dimension concept="time:year"/>
<metric concept="population"/>
<table ref="country_slice_table"/>
</slice>
...
</slices>
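As a rough illustration of how prefix:other_id references resolve, the Python sketch below reads the xmlns
declarations from a DSPL file and maps a reference to the namespace of the dataset that defines it. The helper
names (namespace_prefixes, resolve_reference) are made up for this example and are not part of any DSPL tooling.
The sketch also assumes that an unprefixed reference belongs to the file's own targetNamespace.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed-down DSPL header based on the import example above.
EXAMPLE_DSPL = """<?xml version="1.0" encoding="UTF-8"?>
<dspl targetNamespace="http://www.stats-bureau.com/mystats"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://schemas.google.com/dspl/2010"
      xmlns:time="http://www.google.com/publicdata/dataset/google/time">
  <import namespace="http://www.google.com/publicdata/dataset/google/time"/>
</dspl>"""


def namespace_prefixes(dspl_xml):
    """Return {prefix: namespace URI} for every xmlns declaration ('' is the default)."""
    source = io.BytesIO(dspl_xml.encode("utf-8"))
    return {
        prefix: uri
        for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))
    }


def resolve_reference(ref, prefixes, target_namespace):
    """Split a reference such as 'time:year' into (dataset namespace, id)."""
    prefix, separator, local_id = ref.partition(":")
    if not separator:
        # Assumption: an unprefixed reference points into the current dataset.
        return target_namespace, ref
    if prefix not in prefixes:
        raise KeyError(f"no xmlns declaration for prefix {prefix!r}")
    return prefixes[prefix], local_id


PREFIXES = namespace_prefixes(EXAMPLE_DSPL)
TARGET = ET.fromstring(EXAMPLE_DSPL.encode("utf-8")).get("targetNamespace")
```

With these definitions, resolve_reference("time:year", PREFIXES, TARGET) yields the time dataset's namespace
together with the id year, while resolve_reference("country", PREFIXES, TARGET) stays in the local dataset.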
Dataset Information
The <info> element includes descriptive information about the dataset. An example and details on the relevant
XML elements are listed below.
Example
<info>
<name>
<value>Unemployment Rates</value>
</name>
<description>
<value>Worldwide unemployment rates by region</value>
</description>
<url>
<value>http://www.example.com/mystats/info.html</value>
</url>
</info>
Elements
Element | Required? | Description
<info> | Yes | Encloses all descriptive information about the dataset. Includes the child elements <name>, <description>, and <url>.
<name> | Yes | Child of <info>. Includes the child element <value>, which identifies the name of the dataset.
<description> | Optional | Child of <info>. Includes the child element <value>, which includes a text description of the dataset.
<url> | Yes | Child of <info>. A link to a URL with more information about the dataset.
Provider Information
The <provider> element lists information about the dataset provider. An example and details on the relevant
XML elements are listed below.
Example
<provider>
<name>
<value>Bureau of Statistics</value>
</name>
<url>
<value>http://www.example.com</value>
</url>
</provider>
Elements
Element | Required? | Description
<provider> | Yes | Encloses all descriptive information about the dataset provider. Includes the child elements <name> and <url>.
<name> | Optional | Child of <provider>. Includes the child element <value>, which identifies the name of the dataset provider.
<url> | Optional | Child of <provider>. A link to a URL with more information about the dataset provider.
Concepts
Description
Each dataset contains one or more concepts. A concept is a definition of a type of data that appears in a dataset.
A dataset with demographic population data, for example, could have the concepts country, state, population,
and year. The data values corresponding to a given concept are called instances of that concept. Concepts are
usually described in the dataset, but some concepts (such as time or year) may be described in external datasets.
Each concept can have one or more properties. A property is a characteristic of a concept instance that is stable
over time. For example, the country concept could have the properties name, population, and capital.
Concepts can also have one or more attributes. Attributes provide information at the level of the concept, not its
individual instances. For example, if we had a dataset with an unemployment rate concept, we could use an
attribute to designate that this concept is a percentage. Another example of a common use of attributes is to
provide unit information.
Example
Here's an example of a country concept with the unique id country, and the property name. The concept id can
be used to reference the concept from slices and tables.
<concept id="country" extends="geo:location">
<info>
<name><value>Country</value></name>
<description>
<value>My list of countries.</value>
</description>
</info>
<type ref="string"/>
<property id="name">
<info>
<name><value>Name</value></name>
<description>
<value>The official name of the country</value>
</description>
</info>
<type ref="string" />
</property>
<property concept="geo:continent" isParent="true"/>
<property id="capital" concept="geo:city" />
<table ref="countries_table" />
</concept>
Here's how this example works.
- This code describes the concept country, which has the id country and the properties name, continent,
  and capital.
- The concept extends geo:location, the canonical concept for locations. By extending geo:location,
  country inherits all the properties and attributes defined by the extended concept: properties name,
  description, url, latitude and longitude. It's okay for country to redefine some of these attributes and
  properties, as long as the definition is consistent with the one provided by the extended concept.
- The concept <info> element describes the key information about the concept. This is displayed on the
  dataset's landing page in the Public Data Explorer.
- The concept <type> element refers to the type of content. In this case it's string, but this could vary. The
  concept Population would have the type integer; the concept Eurovision winner could have the type
  boolean.
- A <property> element describes each property of the concept, including its unique ID (id), info and
  type. Properties may also reference concepts, to indicate that their values are valid instances of those
  concepts.
- The concept references a data table that points to the CSV file containing the actual data. The data table
  is referenced like this: <table ref="countries_table"/>.
- If your concept references a table, the associated data file must list all instances of the concept. You
  cannot, for example, create a table that lists only a few of the countries included in the dataset. (If there
  is a subset of countries you care about, you can create a separate concept to describe them. For example,
  mycountries.)
Elements
Element | Required? | Description
<concepts> | Yes | Top-level element. Encloses all <concept> elements.
<concept> | Yes | Identifies the concept. The value of the required attribute id must be unique to the concept within the dataset. If the concept references a concept data table, the value of id must match the column heading describing the concept in the data table. An extends attribute may be used to denote that this concept extends another concept. The value of extends must match the id of a concept defined in the same dataset, or be of the form prefix:concept_id, where concept_id is the id of a concept defined in the imported external dataset associated with prefix.
<info> | Optional | Encloses descriptive information about the concept.
<name> | Yes | Child of <info>. The name of the concept. The child element <value> contains the text, for example, Country.
<description> | Optional | Child of <info>. Includes the child element <value>, which includes a text description of the concept.
<url> | Optional | Child of <info>. Includes the child element <value>, which includes a URL for the concept.
<pluralName> | Optional | Child of <info>. The plural name for the concept. The child element <value> contains the text, for example, Countries.
<totalName> | Optional | Child of <info>. The name for the combination of all instances of the concept. The child element <value> contains the text; in the case of a country concept, for example, this might be World.
<type> | Optional | Identifies the type of content described by the concept. The required attribute ref has the following allowed values: string, float, integer, date, boolean. The type may be omitted if the concept extends another concept, in which case it is inherited from the extended concept.
<property> | Optional | A property of the concept, such as capital. The value of the required attribute id must be unique to the concept. An optional concept attribute may be used to indicate that values of this property are instances of a given concept. If concept is specified, then id may be omitted; its value is implicitly defined as the id of the referenced concept (e.g., <property concept="geo:country"/> is equivalent to <property id="country" concept="geo:country"/>). A property may contain a Boolean isParent attribute, to indicate that the relationship between an instance of the concept and the value of this property is hierarchical. A property may contain a Boolean isMapping attribute, to indicate that there is a 1-1 mapping between the instances of the concept and the values of the property. A property may specify a nested info and type, which are defined just as they are for a concept. type is required if the property does not specify a concept attribute, and must match the type of the referenced concept if it does.
<attribute> | Optional | An attribute of the concept. Attributes represent additional information about the concept (e.g., GDP is a percentage). The value of the required attribute id must be unique to the concept. An optional concept attribute may be used to indicate that values of this attribute are instances of a given concept. If concept is specified, then id may be omitted; its value is implicitly defined as the id of the referenced concept (e.g., <attribute concept="unit:unit"/> is equivalent to <attribute id="unit" concept="unit:unit"/>). An attribute may specify a nested info and type, which are defined just as they are for a concept. type is required if the attribute does not specify a concept attribute, and must match the type of the referenced concept if it does.
<table> | Optional | Identifies the data table containing data for the concept. The value of the required ref attribute must match the table ID specified in the related <table> element.
Slices
Description
A slice is a combination of concepts for which data exist. A slice contains two kinds of concept references:
dimensions and metrics. A dimension is a concept that is used to segment or filter your data. A metric, on the
other hand, describes the observed value or values associated with each data point.
Generally, dimensions are categorical whereas metrics are non-categorical, time-varying, numeric values. Some
prototypical examples of each are as follows:
- Dimensions: Country, state, county, region, year, month, sex, age category, industry segment
- Metrics: Population, GDP, unemployment rate, literacy, revenue, cost, price
Example
<slices>
<slice id="country_slice">
<dimension concept="country"/>
<dimension concept="time:year"/>
<metric concept="population"/>
<table ref="country_slice_table"/>
</slice>
...
</slices>
Here's how this example works.
- This slice represents population by country.
- It has the metric population, and the dimensions country and year. Each dimension is a concept
  already defined elsewhere. The concept country and the metric population exist in the same dataset as
  the current slice, and are referenced like this: concept="country"
- The concept year exists in the imported dataset time, identified by the prefix used before the concept
  name (year), like this: concept="time:year". (See above for information on importing datasets.)
- The slice references a data table that points to the CSV file containing the actual data. The data table is
  referenced like this: <table ref="country_slice_table"/>.
Note: In general, your dataset will be more flexible if you keep metrics to a minimum, and instead create
meaningful dimensions. For example, instead of creating the metrics Female Unemployment and Male
Unemployment, create the single metric Unemployment, and add the dimension Gender that has the instances
Female and Male.
Elements
Element | Required? | Description
<slices> | Yes | Top-level element. Encloses all <slice> elements.
<slice> | Optional | Identifies the slice. The value of the required attribute id must be unique to the slice.
<dimension> | Optional | Defines a dimension of the slice, by referencing a concept. The value of the required attribute concept must exactly match the unique id of the concept, and use a valid prefix if the concept belongs to an external imported dataset.
<metric> | Optional | Defines a metric of the slice, by referencing a concept. The value of the required attribute concept must exactly match the unique id of the concept, and use a valid prefix if the concept belongs to an external imported dataset.
<table> | Yes | Identifies the data table containing data for the slice. The value of the required ref attribute must match the table ID specified in the related <table> element.
<mapDimension> | Optional | Child of <table>. Contains the attributes concept and toColumn; the value of the first is a dimension in the slice, and the value of the second is the table column that holds that dimension.
<mapMetric> | Optional | Child of <table>. Contains the attributes concept and toColumn; the value of the first is a metric in the slice, and the value of the second is the table column that holds that metric.
Tables
Description
The tables section of the DSPL file identifies the data tables included in the dataset. These tables can be
referenced by concepts or by slices. Each <table> element specifies the columns of the tables and their types,
and points to a CSV file containing the table data.
Example
<tables>
<table id="country_slice_table">
<column id="country" type="string"/>
<column id="year" type="date" format="yyyy"/>
<column id="population" type="integer"/>
<data>
<file format="csv" encoding="utf-8">country_slice.csv</file>
</data>
</table>
...
</tables>
Here's how this sample works.
- This sample describes the table country_slice_table. The table has the columns country, year, and
  population.
- Each column in the table has a unique id, defined by the id attribute. This id must exactly match the
  appropriate column heading in the associated data file.
- The value of the optional type attribute defines the data type for each column.
- The <data> element describes the actual .csv file (country_slice.csv) containing the data for the table.
  The file format is always csv.
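Because column ids must match the column headings in the data file, it can help to check this before uploading.
Here's a minimal Python sketch of such a check, using only the standard library. The function names (column_ids,
header_problems) are hypothetical, and trimming whitespace around the headings is an assumption of this sketch,
not documented importer behavior.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The DSPL default namespace, bound to a local prefix for ElementTree lookups.
DSPL_NS = {"d": "http://schemas.google.com/dspl/2010"}

# The table definition from the example above, wrapped in a minimal <dspl> root.
TABLES_XML = """<dspl xmlns="http://schemas.google.com/dspl/2010">
  <tables>
    <table id="country_slice_table">
      <column id="country" type="string"/>
      <column id="year" type="date" format="yyyy"/>
      <column id="population" type="integer"/>
      <data>
        <file format="csv" encoding="utf-8">country_slice.csv</file>
      </data>
    </table>
  </tables>
</dspl>"""


def column_ids(dspl_xml, table_id):
    """Return the column ids declared for the table with the given id."""
    root = ET.fromstring(dspl_xml)
    for table in root.findall("d:tables/d:table", DSPL_NS):
        if table.get("id") == table_id:
            return [column.get("id") for column in table.findall("d:column", DSPL_NS)]
    raise KeyError(f"no table with id {table_id!r}")


def header_problems(csv_text, expected_ids):
    """Compare the first CSV line with the declared ids; column order is ignored."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    header = [heading.strip() for heading in header]  # assumption: trim whitespace
    missing = sorted(set(expected_ids) - set(header))
    unexpected = sorted(set(header) - set(expected_ids))
    return missing, unexpected
```

A data file passes when header_problems returns two empty lists; otherwise the lists name the declared columns
that are missing from the file and the headings that have no matching column definition.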
Elements
Element | Required? | Description
<tables> | Yes | Top-level element. Encloses all <table> elements.
<table> | Yes | Identifies the table. The value of the required attribute id must be unique to the table.
<column> | Optional | Child of <table>. Information about a column included in the table. Includes the following attributes: id (required): The id of the column. type (optional): The data type of the information in the specified column. Allowed values are: string, float, integer, date, or boolean.
<data> | Optional | Child of <table>. The data file referenced by the table. If the file name is in the form of a URL (e.g., http://...), then the file will be fetched via the appropriate protocol (HTTP, HTTPS, or FTP); otherwise, a file with this name must be bundled with the dataset. The value of the required attribute format is always csv. Although the encoding attribute is optional, your .csv files must be UTF-8 encoded.
Topics
Description
Topics classify concepts hierarchically, allowing users to navigate through your dataset more easily.
The <topics> element should appear right before the <concepts> element in your DSPL file. (The order of
elements is important, and you may not be able to upload your dataset if your elements appear in the wrong
order.) To use topics, reference them from the concept definition.
Example
Here's an example topic definition:
<topics>
<topic id="population_indicators">
<info>
<name>
<value>Population indicators</value>
</name>
</info>
</topic>
...
</topics>
...and here's an example reference to this topic from a concept:
<concept id="population">
<info>
<name>
<value>Population</value>
</name>
<description>
<value>Size of the resident population.</value>
</description>
</info>
<topic ref="population_indicators"/>
<type ref="integer"/>
</concept>
Topics can be nested, and a concept can reference more than one topic.
Element definition
Element | Required? | Description
<topics> | Yes | Top-level element. Encloses all <topic> elements.
<topic> | Yes | Identifies the topic. The value of the required attribute id must be unique to the dataset.
<info> | Optional | Child of <topic>. Encloses information about a topic.
<name> | Optional | Child of <info>. Its child element <value> specifies the name of the topic.
DSPL Data Files
In addition to the XML metadata file, a DSPL dataset can also include one or more data files in CSV format.
Each data file supports a table in the dataset, and is referenced from that table's <data>...</data>
section. Conceptually, these files and their associated tables are used to represent either concept definitions or
slice data. Each of these data file types is described in more detail below.
Note that, regardless of the purpose, all data files must be comma-delimited (CSV) UTF-8 text files. The files
must contain only plain text; no HTML. You can create the data files manually, but realistically you will need
to massage the data either in the tool containing the original data source (e.g., a spreadsheet), or in the exported
file itself.
Files can be bundled with the dataset or, if the name is in the form of a URL, fetched via HTTP, HTTPS, or
FTP from a remote source.
Concept Data Files
Concept data files contain relevant information for each concept. The concept definition uses the <table>
element to refer to this file.
Example
Here's an example of a table for the country concept defined above:
country, name
AD, Andorra
AF, Afghanistan
AI, Anguilla
AL, Albania
AO, Angola
AQ, Antarctica
AS, American Samoa
Here's how this example works:
- Unless mappings are specified, the first line of the data file (column headings) must exactly match the
  concept id and the appropriate property ids of the concept with which the data are associated. However,
  the order of the columns doesn't have to be the same in the data file and the concept table. In this case,
  the first column is associated with the concept country, and the second column is associated with the
  property name.
- The property columns are optional; if a property does not have a column in the table, then its value is
  assumed to be undefined for each row. The table above, for instance, omits columns for the latitude
  and longitude properties, so the countries will not be mappable.
- Each value for the concept's id field (in this case, country) must be unique and non-empty (an empty
  field is one with zero or only whitespace characters).
- Values for properties that reference other concepts must either be empty or be a valid value of the
  referenced concept.
- Enclosing values in double quotes is optional except when they contain commas, double quotes, or
  newline characters.
- Escape a literal double quote that appears in a value by preceding it with another double quote.
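The quoting rules above match the default behavior of Python's csv module, so one way to produce conforming
files is to let it do the quoting. Here's a small sketch; the to_csv helper and the second data row are invented
for illustration.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows using the csv module's default (minimal) quoting."""
    buffer = io.StringIO()
    # Defaults: a value is quoted only when it needs to be, and a literal
    # double quote inside a value is written as two double quotes.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


CSV_TEXT = to_csv([
    ["country", "name"],
    ["KR", "Korea, Republic of"],        # contains a comma, so it gets quoted
    ["XX", 'The "Example" Territory'],   # hypothetical row with embedded quotes
])
```

Reading CSV_TEXT back with csv.reader recovers the original values, including the embedded comma and double
quotes. Remember to write the result to disk as UTF-8.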
Slice Data Files
Slice data files contain relevant data for each slice. The slice definition uses the <table ref="..."> element to
refer to the <table> definition, which in turn identifies this file.
Example
Here's an example of a .csv file containing the data for the country_slice slice described above:
country, year, population
AF, 1960, 9616353
AF, 1961, 9799379
AF, 1962, 9989846
AF, 1963, 10188299
Here's how the example works:
- The metric field is population. The fields country and year are dimension fields.
- Each value of a dimension field must be non-empty. This includes time dimensions. Values for metric
  fields can be empty. An empty value is represented by no character.
- Each column heading that references a concept (for example, the first field of the example above
  references the concept country) must exactly match the concept's unique id in the concept definition.
- A unique combination of dimension values, e.g. AF, 2000, may occur only once.
- Rows in the same time series (i.e., rows that have the same combination of all dimension values except
  time) must be grouped together, though they need not be otherwise sorted.
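The row-level rules above can be checked mechanically before you upload. The following Python sketch flags empty
dimension values, repeated dimension combinations, and time series whose rows are not contiguous. The function
name slice_problems is hypothetical, and the real importer may report problems differently; leading spaces after
commas are skipped only so that files formatted like the example can be read.

```python
import csv
import io


def slice_problems(csv_text, dimensions, time_dimension):
    """Return human-readable problems found in a slice data file."""
    problems = []
    seen_keys = set()        # every dimension combination seen so far
    closed_series = set()    # time series whose block of rows has already ended
    current_series = None
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        key = tuple((row.get(name) or "").strip() for name in dimensions)
        if any(value == "" for value in key):
            problems.append(f"line {line_no}: empty dimension value")
            continue
        if key in seen_keys:
            problems.append(f"line {line_no}: duplicate dimension combination {key}")
        seen_keys.add(key)
        # A time series is identified by every dimension except the time one.
        series = tuple(v for name, v in zip(dimensions, key) if name != time_dimension)
        if series != current_series:
            if series in closed_series:
                problems.append(f"line {line_no}: time series {series} is not grouped together")
            if current_series is not None:
                closed_series.add(current_series)
            current_series = series
    return problems
```

For the population example, you would call slice_problems(text, ["country", "year"], "year"); an empty list
means none of the three rules was violated. Empty metric values are allowed and are therefore not reported.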
Advanced Features
Multi-Language Datasets
Translated XML Values
You can use the xml:lang attribute with every <value> element in your DSPL file. This attribute specifies the
language of the element's content, using the standard W3C language tags. Note that the use of this feature is
optional; if no xml:lang attribute is included, the content is assumed to be in English.
The following example shows snippets of a dataset that's in English, Bulgarian, Catalan, and Simplified Chinese:
<dspl ...>
<info>
<name>
<value xml:lang="en">World Bank, World Development Indicators</value>
<value xml:lang="bg">Световна банка, Индикатори за световно развитие</value>
<value xml:lang="ca">Banc Mundial, Indicadors del desenvolupament mundial</value>
<value xml:lang="zh-CN">国家/地区</value>
</name>
...
</info>
<concepts>
<concept id="country">
<info>
<name>
<value xml:lang="en">Country</value>
<value xml:lang="bg">Страна</value>
<value xml:lang="ca">País</value>
<value xml:lang="zh-CN">国家/地区</value>
</name>
...
</info>
...
</concept>
...
</concepts>
...
</dspl>
Translated Properties
In some cases, you may want to provide translations that go beyond concept-level metadata, applying in
addition (or instead) to individual concept instances. This is particularly useful when the values of a concept
property (e.g., name) vary by language.
To provide such values in multiple languages, create one column in the corresponding definition table for each
property/language combination. Then, link these columns to their associated properties and languages by adding
a set of <mapProperty xml:lang="..." ref="..." toColumn="..."> elements to the table reference tag for
the concept.
Here's an example that defines a country concept with names in English, Spanish, and French:
<concepts>
...
<concept id="country" extends="geo:location">
...
<property id="name">
<info>
<name>
<value>Name</value>
</name>
<description>
<value>The official name of the country</value>
</description>
</info>
<type ref="string" />
</property>
...
<table ref="countries_table">
<mapProperty xml:lang="en" ref="name" toColumn="name_en"/>
<mapProperty xml:lang="es" ref="name" toColumn="name_es"/>
<mapProperty xml:lang="fr" ref="name" toColumn="name_fr"/>
</table>
</concept>
...
</concepts>
...
<tables>
...
<table id="countries_table">
<column id="country" type="string"/>
<column id="name_en" type="string"/>
<column id="name_es" type="string"/>
<column id="name_fr" type="string"/>
...
</table>
</tables>
The CSV file for the countries_table would then have the following form:
country,name_en,name_es,name_fr,...
...
US,United States of America,Estados Unidos de América,États-Unis d'Amérique,...
...
Mappable Concepts
Many concepts (for instance: county, state, and city) have instances corresponding to geographic locations.
DSPL supports geocoding these instances so that they'll be visualizable in the Google Public Data animated
map chart.
If your concept is equivalent to World countries, US states, or US counties, then you can just link to the
corresponding Google canonical concept; no explicit geocoding is required. See the Canonical Concepts Guide
for more details.
If not, then you need to make your concept mappable. The first step is to make it extend from geo:location:
<concept id="..." extends="geo:location">
...
</concept>
Then, you must explicitly add latitude and longitude as properties:
<concept id="..." extends="geo:location">
...
<property id="latitude"/>
<property id="longitude"/>
</concept>
The values for these are then specified as columns in the corresponding concept definition data table.
Concept Relationships
Concepts are often related to other concepts in a structured way. For instance, a continent instance may include
multiple country instances, which, in turn, may contain multiple state or province instances. Encoding these
relationships in the dataset metadata allows for richer visualization features than would be otherwise possible,
e.g., showing a collapsible tree of locations to choose from.
In the sections below, we describe the concept relationships supported in the DSPL schema.
Hierarchies
Concept hierarchies are represented in DSPL through the use of an isParent="true" attribute in a
<property> tag of the child concept, which contains identifiers of instances from the parent concept.
As an example, Google's US County concept has the following form:
<concept id="us_county" extends="geo:location">
<info>
<name>
<value xml:lang="en">County</value>
</name>
...
</info>
...
<property id="state" concept="us_state" isParent="true"/>
...
<table ref="reference_us_counties"/>
</concept>
The supporting data table has a state column with the two-letter state code for each county. This type of
metadata allows the Public Data Explorer to show states and counties as a hierarchy, a feature that makes
exploration much easier for users.
Note that a concept can have many children but no more than one parent.
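To make the parent/child relationship concrete, here's a short Python sketch that groups child instances under
their parent using the parent column of a concept data file. The sample rows, the column names, and the
children_by_parent helper are illustrative assumptions, not the contents of Google's actual county table.

```python
import csv
import io
from collections import defaultdict

# Illustrative concept data: each county row names its (single) parent state.
COUNTIES_CSV = """us_county,name,state
06001,Alameda County,CA
06075,San Francisco County,CA
36061,New York County,NY
"""


def children_by_parent(csv_text, child_column, parent_column):
    """Group child instance ids under the parent id found in each row."""
    tree = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text), skipinitialspace=True):
        tree[row[parent_column]].append(row[child_column])
    return dict(tree)


TREE = children_by_parent(COUNTIES_CSV, "us_county", "state")
```

Because each row carries exactly one value in the parent column, every child ends up under at most one parent,
while a parent can collect any number of children.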
Mappings
Concept mappings (i.e., concepts that represent, fundamentally, the same thing) are represented through an
isMapping="true" attribute in a property tag of the mapped concept.
Specifying that one concept maps to another allows the former to inherit all of the properties and attributes of
the latter. Among other applications, this is useful for "linking" personal geographic concepts with those defined
in Google's canonical geo dataset:
<concept id="my_country" extends="geo:location">
<info>
<name>
<value xml:lang="en">Country</value>
</name>
...
</info>
...
<property id="google_country_code" concept="geo:country" isMapping="true"/>
<table ref="countries_concept"/>
</concept>
Extensions
Concept extensions are designated through an extends attribute in the corresponding concept definition.
Extensions are useful for indicating that a particular concept is a subclass of another, broader concept. The
extending concept inherits all of the attributes and properties of its parent, and can also add new ones.
As an example, Google's currency concept extends unit:
<concept id="unit">
...
</concept>
<concept id="currency" extends="unit">
<info>
<name>
<value xml:lang="en">Currency unit</value>
</name>
...
</info>
...
<table ref="currency_table"/>
</concept>
See the discussion of concept extensions in the tutorial for more explanation and examples.
Submitting Your Dataset
To submit your dataset to the Google Public Data Explorer, follow these instructions:
1. Create a directory.
2. Save the dataset DSPL file in the directory you created. Make sure to use the .xml extension.
3. Save any local .csv files in the same directory. Data files that are referenced via URLs can be omitted.
4. Zip the directory.
5. Upload your dataset to the Google Public Data Explorer.
Once your dataset is uploaded and validated, you can test it when signed into your Google account. It will not
be published until you've checked it and told us it's ready.
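If you prepare datasets regularly, the packaging steps can be scripted. Here's a minimal Python sketch that
zips the .xml and .csv files from a dataset directory. The function names and file names are hypothetical, and
two choices are assumptions of this sketch, not documented requirements: files are stored at the root of
the archive, and the directory is expected to hold exactly one .xml file.

```python
import tempfile
import zipfile
from pathlib import Path


def build_dataset_zip(dataset_dir, zip_path):
    """Zip the DSPL .xml file and local .csv files found in dataset_dir."""
    dataset_dir = Path(dataset_dir)
    files = sorted(
        path for path in dataset_dir.iterdir()
        if path.suffix.lower() in {".xml", ".csv"}
    )
    if sum(path.suffix.lower() == ".xml" for path in files) != 1:
        raise ValueError("expected exactly one .xml file in the dataset directory")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Assumption: store files at the archive root, without the folder name.
            archive.write(path, arcname=path.name)
    return [path.name for path in files]


def demo():
    """Build a throwaway dataset and return the names stored in its zip file."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset_dir = Path(tmp) / "mystats"
        dataset_dir.mkdir()
        (dataset_dir / "mystats.xml").write_text("<dspl/>", encoding="utf-8")
        (dataset_dir / "country_slice.csv").write_text(
            "country,year,population\n", encoding="utf-8")
        zip_path = Path(tmp) / "mystats.zip"
        build_dataset_zip(dataset_dir, zip_path)
        with zipfile.ZipFile(zip_path) as archive:
            return sorted(archive.namelist())
```

The resulting .zip file is what you upload in the final step. Data files referenced by URL are not in the
directory and are therefore left out of the archive.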