lecture - CARBOOCEAN

Using the WWW to obtain
environmental data
Benjamin Pfeil
Bjerknes Centre for Climate
Research / University of Bergen
Ways to get data...
...but also
• Data often shows a snapshot of the
environment at a specific time and place
• Sampling can be very expensive (on average
over 900,000 NOK for one data set in the
bio- and geosciences, including costs for
expeditions, laboratories, etc.)
• Data is therefore very valuable for future
scientific work and has to be archived and
made available
Why do we need data?
• Verification of research results
• Comparison of results
• Indication of trends
• Model input
• Remote sensing
• Etc.
Some facts about data in the
scientific community
• Scientific instruments and computer
simulations create large amounts of data
• Due to new measurements (and better
precision), data volumes are doubling
each year
• Scientific data has to be archived
according to "Good scientific practice in
research and scholarship" (European Science
Foundation 2000)
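The claim that data volumes double every year implies exponential growth. As a worked example (the symbols $V_0$ for today's data volume and $t$ for elapsed years are introduced here for illustration and are not from the slide):

```latex
V(t) = V_0 \cdot 2^{t}
\quad\Rightarrow\quad
\frac{V(10)}{V_0} = 2^{10} = 1024
```

So if the doubling rate held for a decade, data volumes would grow roughly a thousandfold.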
[Figure: Global increase in publications in empirical sciences, 1970 to 2010 (series: Publications, Data)]
Good scientific practice in research and scholarship
European Science Foundation (ESF), 2000
Data accumulation, handling, and storage
36. Data are produced at all stages in experimental research and in
scholarship. Data sets are an important resource, which enable later
verification of scientific interpretations and conclusions. They may also
be the starting point for further studies. It is vital, therefore, that all
primary and secondary data are stored in a secure and accessible
form.
37. Institutions may pay particular attention to documenting and
archiving original research and scholarship data. Several codes of
good practice recommend a minimum period of 10 years, longer in
the case of especially significant or sensitive data. National or
regional discipline-based archives should be considered where there
are practical or other problems in storing data at the institution where
the research was conducted.
Principles for dissemination of scientific data (International
Council for Science/CODATA)
4. Scientific advances rely on full and open access to data. Both
science and the public are well served by a system of scholarly research
and communication with minimal constraints on the availability of data for
further analysis. The tradition of full and open access to data has led to
breakthroughs in scientific understanding, as well as to later economic
and public policy benefits. The idea that an individual or organization can
control access to or claim ownership of the facts of nature is foreign to
science.
5. The interests of database owners must be balanced with
society’s need for open exchange of ideas. Given the substantial
investment in data collection and its importance to society, it is equally
important that data are used to the maximum extent possible. Data that
were collected for a variety of purposes may be useful to science. Legal
foundations and societal attitudes should foster a balance between
individual rights to data and the public good of shared data.
International Council for Science
(ICSU)
• Founded in 1931 to promote international
scientific activity in the different branches
of science and its application for the
benefit of humanity
• One of the oldest non-governmental
organizations
• More than 135 nations adhere to it
• ICSU established the World Data Center
system in the 1950s
Source: www.icsu.org
World Data Center system
Mission Statement of the World Data Center
System
• Data constitute the raw material of scientific
understanding. The World Data Center
system works to guarantee access to solar,
geophysical and related environmental data.
It serves the whole scientific community by
assembling, scrutinizing, organizing and
disseminating data and information
Network of ICSU WDCs
•Airglow
Mitaka, Japan
•Astronomy
Beijing, China
•Meteorology
Asheville NC, USA
Beijing, China
Obninsk, Russia
•Marine Geology and Geophysics
•Nuclear Radiation
Boulder CO, USA
Tokyo, Japan
Moscow, Russia
•Atmospheric Trace Gases
Oak Ridge TN, USA
•Aurora
Tokyo, Japan
•Cosmic Rays
Toyokawa, Japan
•Earth Tides
Brussels, Belgium
•Geology
Beijing, China
•Geomagnetism
Copenhagen, Denmark
Edinburgh, UK
Kyoto, Japan
Colaba, India
•Glaciology
Boulder CO, USA
Cambridge, UK
Lanzhou, China
•Human Interactions in the Environment
Palisades NY, USA
•Ionosphere
Tokyo, Japan
•Marine Environmental Sciences
Bremen, Germany
WDC Co-ordination Offices
Washington DC, USA
Beijing, China
•Oceanography
Obninsk, Russia
Silver Spring MD, USA
Tianjin, China
•Recent Crustal Movements
Ondrejov, Czech Republic
•Paleoclimatology
Boulder CO, USA
•Renewable Resources and Environment
Beijing, China
•Remotely Sensed Land Data
Sioux Falls SD, USA
•Rockets and Satellites
Obninsk, Russia
•Rotation of the Earth
Obninsk, Russia
Washington DC, USA
•Satellite Information
Greenbelt MD, USA
•Seismology
Denver CO, USA
Beijing, China
•Soils
Wageningen, The Netherlands
•Solar Activity
Meudon, France
•Solar Radio Emission
Nagano, Japan
•Solar Terrestrial Physics
Boulder CO, USA
Didcot Oxon, UK
Moscow, Russia
Haymarket, Australia
•Solid Earth Geophysics
Beijing, China
Boulder CO, USA
Moscow, Russia
•Space Science
Beijing, China
•Space Science Satellites
Kanagawa, Japan
•Sunspot Index
Brussels, Belgium
Where and how do you get data?
• OK, you have now (hopefully) been
listening for some time, but how do you
actually get access to environmental data?
• You have 15 to 20 minutes to find
environmental data using the internet
Good luck!
What are scientific data and how
can they be structured?
What are data ?
DataSet title: VogelsangE et al 2001/Age control of sediment core V23-81
Reference:
Broecker, WS et al (1988): Preliminary estimates for the radiocarbon age of deepwater …
Bond, GC et al (1993): Correlations between climate records …, Nature, 365: 143-147
Sarnthein, M; Winn, K; Jung, S J A; Duplessy, J C; Labeyrie, L D … (1994): Changes in east…
Project:
Glacial Atlantic Mapping and Prediction (GLAMAP2000)
Event:
V23-81 * LATITUDE: 54.2500 * LONGITUDE: -16.8300 * ELEVATION: -2393.0 *
DATETIME: 18 Oct 1966 00:00:00 * GEAR: Piston corer, unspec. * CAMPAIGN: V23
Parameter:
Age, dated - Age, dated [kyr] * … METHOD: AMS 14C dating. Broecker et al. 1988. …
Dated material - Age, dated material * PI: Sarnthein
Sed rate - Sedimentation rate [cm/kyrs] * PI: Sarnthein * METHOD: calculated
PI:
Sarnthein, Michael, e-mail: ms@gpi.uni-kiel.de
Data details: http://www.pangaea.de/Cores/Age/V23-81.pdf
Source: PANGAEA - DataSet ID: 59872
Data columns (excerpt; ":" marks rows not shown):
• Depth [m]: 0.015; 0.075; 0.075; 0.620; :; 3.310; 3.355; :
• Age, dated [kyr]: 1.67; 0.05; 0.10; 0.15; :; 21.21; 21.70; :
• Age, error [kyr]: 0.090; 0.090; 0.100; 0.150; :; :; :
• Age model [kyr]: (values not shown)
• Sed rate [cm/kyrs]: 1.32; 6.73; :; :; :; 3.6; :
• Dated material: G. inflata; G. bulloides; :; :; :; N. pachyderma sin.; N. pachyderma sin.
Metadata – describing your data
• who: Principal investigator(s) (PI), Project(s)
• what: Title, Identifier (DOI); Data types, Parameter [unit]; Quantities
• how: Methods; Reference(s)
• where: Spatial coverage -> geographical positions; Sampling event, Campaign, Location
• when: Temporal coverage
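One way to make the who/what/how/where/when checklist concrete is a small record type. This is a minimal sketch under stated assumptions: the class `MetadataRecord`, its field names, and the helper `missing_fields` are hypothetical and are not the PANGAEA schema; the example values are taken from the V23-81 data set shown earlier.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataRecord:
    """Hypothetical metadata record grouped by who / what / how / where / when."""

    # who
    principal_investigators: list[str]
    projects: list[str]
    # what
    title: str
    identifier: str                      # e.g. a DOI or a repository data set ID
    parameters: dict[str, str]           # parameter name -> unit
    # how
    methods: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    # where
    latitude: Optional[float] = None     # decimal degrees
    longitude: Optional[float] = None    # decimal degrees
    campaign: Optional[str] = None
    # when
    date_time: Optional[str] = None      # ISO 8601 string

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, i.e. gaps in the documentation."""
        return [name for name, value in vars(self).items()
                if value is None or value == "" or value == [] or value == {}]


# Values from the V23-81 example; references are left empty on purpose.
v23_81 = MetadataRecord(
    principal_investigators=["Sarnthein, Michael"],
    projects=["GLAMAP2000"],
    title="Age control of sediment core V23-81",
    identifier="PANGAEA DataSet ID 59872",
    parameters={"Depth": "m", "Age, dated": "kyr", "Sed rate": "cm/kyrs"},
    methods=["AMS 14C dating"],
    latitude=54.25,
    longitude=-16.83,
    campaign="V23",
    date_time="1966-10-18T00:00:00",
)
print(v23_81.missing_fields())  # ['references']
```

Keeping the units inside `parameters` (name -> unit) mirrors the "Parameter [unit]" convention of the data set example, so a parameter can never be stored without its unit.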
Level of scale
Ratio
Quantitative, with an absolute zero; proportions are meaningful
e.g. Kelvin scale; { 15.456; -3.2; 760; 0 }
Interval
Quantitative, no absolute zero, equal intervals (addition, subtraction), but
no proportions
e.g. Fahrenheit scale
Ordinal
Semiquantitative, rank-ordered, intervals may not be equal
e.g. { first; second; third } { rare; frequent; abundant }
Nominal
Qualitative, no ordering implied
e.g. { male; female } { red; green; blue }
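The difference between the interval and ratio levels can be checked numerically: on the Fahrenheit (interval) scale, 20 °F is not "twice as warm" as 10 °F, because the scale's zero is arbitrary. A short sketch that converts to the Kelvin (ratio) scale, where ratios are meaningful; the function name is chosen here for illustration.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert an interval-scale Fahrenheit reading to the ratio-scale Kelvin value."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Naive ratio on the interval scale: 2.0, but physically meaningless.
naive_ratio = 20.0 / 10.0

# Ratio on the Kelvin scale, which has an absolute zero.
kelvin_ratio = fahrenheit_to_kelvin(20.0) / fahrenheit_to_kelvin(10.0)

# Differences survive the conversion (scaled by 5/9), as expected on an interval scale.
kelvin_difference = fahrenheit_to_kelvin(20.0) - fahrenheit_to_kelvin(10.0)

print(f"{naive_ratio:.1f} vs {kelvin_ratio:.3f}")  # 2.0 vs 1.021
```

Addition and subtraction carry over between the two scales, but proportions only make sense once values sit on a scale with a true zero.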
Classification schemes
Technical
• numerical data
• text data
• pictures
Processing level
• tertiary data: interpretations, aggregated data (e.g. timeslices)
• secondary data: calculated from raw data (e.g. paleotemperatures such as SST(foram), SST(Mg/Ca), SST(alkenone))
• primary data: raw data (e.g. counts, δ18O)
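The three processing levels can be illustrated as a small pipeline: raw measurements (primary) are turned into temperatures (secondary) by a calibration, and the temperatures are then averaged over a timeslice (tertiary). This is a sketch under loud assumptions: the exponential Mg/Ca calibration form is used only as an example, the constants `A_HYPOTHETICAL` and `B_HYPOTHETICAL`, the sample values, and the 19 to 23 kyr window are all invented for illustration.

```python
import math
from statistics import mean

# HYPOTHETICAL constants for a calibration of the form Mg/Ca = B * exp(A * T);
# real studies use species-specific values.
A_HYPOTHETICAL = 0.1
B_HYPOTHETICAL = 0.5


def sst_from_mg_ca(mg_ca: float) -> float:
    """Secondary data: temperature (deg C) calculated from a raw Mg/Ca ratio."""
    return math.log(mg_ca / B_HYPOTHETICAL) / A_HYPOTHETICAL


def timeslice_mean(samples, age_min_kyr, age_max_kyr):
    """Tertiary data: mean of all secondary values whose age falls inside the timeslice."""
    values = [sst for age, sst in samples if age_min_kyr <= age <= age_max_kyr]
    return mean(values) if values else None


# Primary (raw) data, invented: (age in kyr BP, measured Mg/Ca in mmol/mol).
raw = [(1.0, 1.5), (2.0, 1.4), (20.0, 0.9), (21.0, 0.8)]

secondary = [(age, sst_from_mg_ca(ratio)) for age, ratio in raw]
tertiary = timeslice_mean(secondary, 19.0, 23.0)
print(round(tertiary, 2))
```

Each level is derived from the one below it, which is why archiving the primary data matters: a revised calibration only requires recomputing the upper levels.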
What are geocodes?
Spatial
• LATITUDE, LONGITUDE (decimal degrees, or degree, minute, second)
• UTM (Universal Transverse Mercator)
• DEPTH, sediment [m]
• DEPTH, ice/snow [m]
• DEPTH, water [m b.s.l.]
• ALTITUDE [m a.s.l.]
• ELEVATION [m a.s.l.]
• ORDINAL NUMBER, e.g. tree ring
• DISTANCE [cm]
Temporal
• DATE/TIME
• AGE [kyr BP]
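Positions reported in degree, minute, second notation have to be converted before they can be compared with decimal-degree geocodes. A minimal sketch; the function name is hypothetical, and it assumes the usual convention that southern latitudes and western longitudes are negative. The example uses the position of core V23-81 (54.2500, -16.8300).

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degree/minute/second notation to signed decimal degrees.

    South and West are returned as negative values.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    if hemisphere.upper() not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# Core V23-81 written in degree, minute, second form.
lat = dms_to_decimal(54, 15, 0, "N")
lon = dms_to_decimal(16, 49, 48, "W")
print(round(lat, 4), round(lon, 4))  # 54.25 -16.83
```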
Geocodes – the third dimension
[Figure: vertical geocodes by environment (ice, land, lake, shelf, ocean, outcrops): altitude/elevation on land; depth in ice; depth in water; depth in sediment; distance for corals; ordinal number for varves and trees; depth, distance or ordinal number for outcrops]
Geocodes – temporal
• DATE/TIME: calendars & time zones
• GEOLOGIC AGE
Relative age dating: bio- / lithostratigraphy -> nominal ages
Absolute age dating: radiometric time scale; chronography (varves, corals, trees) -> absolute ages
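Ages given as "kyr BP" count back from a fixed "present", which by convention is AD 1950. A small sketch of the conversion between calendar years and kyr BP; the function names are hypothetical. It applies to calendar (or calibrated) ages only: a conventional radiocarbon age must be calibrated first.

```python
# In the "before present" (BP) convention, "present" is fixed at AD 1950.
PRESENT_YEAR_CE = 1950


def calendar_year_to_kyr_bp(year_ce: float) -> float:
    """Convert a calendar year (CE) to an age in kyr BP (thousands of years before 1950)."""
    return (PRESENT_YEAR_CE - year_ce) / 1000.0


def kyr_bp_to_calendar_year(age_kyr_bp: float) -> float:
    """Inverse conversion: kyr BP to a calendar year (astronomical numbering, negative = BCE)."""
    return PRESENT_YEAR_CE - age_kyr_bp * 1000.0


# Core V23-81 was sampled in 1966, i.e. after "present", hence a negative age.
print(calendar_year_to_kyr_bp(1966))  # -0.016
```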
Ways to archive data
Technical data organisation
File systems
disadvantage: low consistency of data
advantage: fast & cheap archiving procedure (in the short run)
Relational databases (RDBs)
disadvantage: work-intensive archiving procedure,
needs a high degree of data organization,
limited usability for mass data
advantage: high consistency of data,
low costs for data curation,
good retrieval qualities
Mixed
Relational database -> geocoded data & metadata
File system -> mass data (geophysical data, pictures, films)
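The mixed approach can be sketched with Python's built-in `sqlite3` module: geocoded values and metadata go into relational tables, while bulky files stay on the file system and are only referenced by path. Table names, column names, the file path, and the two measurement rows are hypothetical and chosen for illustration; this is not the PANGAEA database schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    pi        TEXT NOT NULL,
    file_path TEXT            -- pointer to mass data kept on the file system
);
CREATE TABLE measurement (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    latitude   REAL NOT NULL,  -- geocodes stored as columns, so they can be queried
    longitude  REAL NOT NULL,
    depth_m    REAL NOT NULL,
    parameter  TEXT NOT NULL,
    unit       TEXT NOT NULL,
    value      REAL NOT NULL
);
""")

conn.execute(
    "INSERT INTO dataset VALUES (1, 'Age control of sediment core V23-81', "
    "'Sarnthein', 'cores/V23-81/photo.jpg')"
)
conn.executemany(
    "INSERT INTO measurement VALUES (1, 54.25, -16.83, ?, 'Age, dated', 'kyr', ?)",
    [(3.310, 21.21), (3.355, 21.70)],  # illustrative rows
)

# Retrieval by geocode and parameter: the strength of the relational part.
rows = conn.execute("""
    SELECT d.title, m.depth_m, m.value
    FROM measurement AS m JOIN dataset AS d ON d.id = m.dataset_id
    WHERE m.parameter = 'Age, dated' AND m.depth_m > 3.0
    ORDER BY m.depth_m
""").fetchall()
print(rows)
```

The file itself is never loaded into the database; only its path is, which keeps the relational part small while still linking mass data to its metadata.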
How to make data available to
science?
Possible problems in retrieving
data from the net
• Version conflicts (data is archived in many data centres, at different
stages, e.g. raw data, quality-controlled data, etc.)
• Poorly documented metadata and data (methods, units, unclear parameter
definitions, etc.)
• Only metadata is available online; the data itself has to be requested
• Cruise naming conventions vary between countries, which makes it hard
to identify the same cruise
• Date formats (mm/dd/yyyy; yy/mm/dd; dd/mm/yyyy, etc.)
• Ways to report the position (Lat/Long, UTM)
• Different export formats (plain text, XML, netCDF, etc.)
• Different entities (one data set = data from one cruise, or data from one
station, or data from one …)
• Data set is too large to be downloaded (e.g. model data)
Result: Creating large, homogeneous data collections can take a
lot of time!
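The date-format problem can be made visible with Python's `datetime.strptime`: the same string may parse validly under several formats, so the format must come from the source's metadata, not from guessing. A sketch; the helper names and the format table are hypothetical.

```python
from datetime import date, datetime

# Formats from the list above, mapped to strptime patterns.
FORMATS = {"mm/dd/yyyy": "%m/%d/%Y", "dd/mm/yyyy": "%d/%m/%Y", "yy/mm/dd": "%y/%m/%d"}


def possible_dates(text: str) -> dict[str, date]:
    """Return every interpretation of `text` that parses as a valid calendar date."""
    results = {}
    for label, fmt in FORMATS.items():
        try:
            results[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # not valid under this format
    return results


def to_iso(text: str, declared_format: str) -> str:
    """Normalize a date to ISO 8601 (yyyy-mm-dd) using the format declared by the source."""
    return datetime.strptime(text, FORMATS[declared_format]).date().isoformat()


print(possible_dates("10/18/1966"))  # only mm/dd/yyyy is valid
print(possible_dates("03/04/1966"))  # ambiguous: 4 March or 3 April
print(to_iso("18/10/1966", "dd/mm/yyyy"))  # 1966-10-18
```

Two-digit years add a second trap: Python maps `%y` values 00 to 68 to 2000 to 2068, so "66/10/18" in yy/mm/dd becomes 2066-10-18, not 1966. Converting everything to four-digit ISO 8601 dates once, at import time, avoids both problems.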
(Some) important WDCs for
environmental data
• WDC for Atmospheric Trace Gases, Carbon Dioxide
Information Analysis Center, USA
• WDC for Climate, Model and Data, Max-Planck-Institute
for Meteorology, Germany
• WDC for Glaciology, Boulder, University of Colorado, USA
• WDC for Marine Environmental Sciences, Center for
Marine Environmental Sciences (MARUM), Germany
• WDC for Marine Geology & Geophysics, Boulder, USA
• WDC for Oceanography, Silver Spring, USA
Remember that WDC is a status!
There are also many national and international data
centres that are not WDCs, e.g.:
ICES – International Council for the Exploration
of the Sea, Denmark
BODC – British Oceanographic Data Centre, UK
BADC – British Atmospheric Data Centre, UK
NODC – National Oceanographic Data Center,
USA
NMD - Norsk marint datasenter, Norway
World Data Center for Marine Environmental Sciences
(WDC-MARE) at University of Bremen, Germany
• is aimed at collecting, scrutinizing, and disseminating
data related to global change in the fields of
environmental oceanography, marine geology,
paleoceanography, and marine biology. It focuses on
georeferenced data using the information system
PANGAEA. The WDC stores and handles numeric,
string, and image data. Users can retrieve data through
the Internet via different gateways.
• offers data management services, in particular project
data management and data publication. It maintains an
inventory of site and sampling locations for all related
fields. It provides hosting and mirroring of electronic
journals and offers software for the analysis,
visualization, and transformation of data.
How to access data via WDC-MARE
http://www.wdc-mare.org/ or http://www.pangaea.de/
• Data is available on the web through the search
engine PangaVista
www.pangaea.de/PangaVista
• use it like
• e.g. search by parameter, scientist, region,
project, research vessel, institute, etc.
You can either view the data online
or download it
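Once a data set has been downloaded as tab-delimited text, it can be split into its metadata header and its data table with the standard library. This sketch rests on an assumption: that the export wraps the metadata in `/* ... */` and follows it with one tab-separated line of column names. Check a real download and adjust if the layout differs. The function name and the short sample string are hypothetical.

```python
import csv
import io


def parse_pangaea_text(text: str) -> tuple[str, list[dict[str, str]]]:
    """Split a tab-delimited export into (metadata header, data rows).

    ASSUMES the header is wrapped in /* ... */ and is followed by one
    tab-separated line of column names.
    """
    header = ""
    body = text
    if text.lstrip().startswith("/*"):
        end = text.index("*/")
        header = text[text.index("/*") + 2:end].strip()
        body = text[end + 2:]
    reader = csv.DictReader(io.StringIO(body.lstrip("\n")), delimiter="\t")
    return header, list(reader)


# Illustrative sample in the assumed layout.
sample = (
    "/* DATA DESCRIPTION:\nEvent:\tV23-81\n*/\n"
    "Depth [m]\tAge, dated [kyr]\n"
    "3.310\t21.21\n"
    "3.355\t21.70\n"
)
header, rows = parse_pangaea_text(sample)
print(rows[0])  # {'Depth [m]': '3.310', 'Age, dated [kyr]': '21.21'}
```

Values are kept as strings on purpose, so that no precision or formatting is lost before the user decides how to convert each column.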
Nice,
but what else can I do with the data?
Since all data at WDC-MARE is archived in
a relational database, it can easily be
converted to other formats such as:
Ocean Data View
ArcGIS
PanPlot (Open Source plotting software)
Pan2Application – converter for
data from WDC-MARE
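Pan2Application itself is not shown here. As a generic stand-in for "query the relational database, write another format", the sketch below exports any SQL query result as CSV using only the standard library. ODV and ArcGIS have their own import layouts, which this sketch does not reproduce; the function name and the table are hypothetical.

```python
import csv
import io
import sqlite3


def export_query_as_csv(conn: sqlite3.Connection, sql: str) -> str:
    """Run a query and return the result, with a header row, as CSV text."""
    cursor = conn.execute(sql)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)                                  # data rows
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (depth_m REAL, age_kyr REAL)")
conn.executemany("INSERT INTO measurement VALUES (?, ?)", [(3.31, 21.21), (3.355, 21.7)])
print(export_query_as_csv(conn, "SELECT depth_m, age_kyr FROM measurement ORDER BY depth_m"))
```

Because the column names come from `cursor.description`, the same function works for any query, which is the practical benefit of keeping the data in a relational database.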
Ocean Data View
http://odv.awi-bremerhaven.de/
Ocean Data View (ODV) is a software
package for the interactive exploration,
analysis and visualization of
oceanographic and other geo-referenced
profile or sequence data.
PanPlot
http://www.pangaea.de/Software/PanPlot/
Networking between different data
holders is essential
A user can then use a single website to
find metadata and data that are archived in
many different data centres
Global Change Master Directory
Gives access to metadata, but the data
itself can be hard to find
Gives access to metadata and links directly to the
data set
Thanks for listening!
Questions?
Comments?