World Data Center Climate: Terabyte Data Storage in a Relational Database System

advertisement
World Data Center Climate:
Terabyte Data Storage in a
Relational Database System
Michael Lautenschlager,
Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meorology
Hamburg, Germany
WS Spatiotemporal Databases for
Geosciences, Biomedical sciences and Physical sciences
Edinburgh, November 1st + 2nd, 2005
WDCC Home: www.wdcc-climate.de / WDCC Contact: data@dkrz.de
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 1
Content:
Introduction of WDCC
CERA2 Data Model
Data Access
Connection to Mass Storage Archive
Summary
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 2
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 3
WDCC Content
Oktober 2005: 580 Experiments / 68.000 Data Sets
Data from
Earth System
Modelling and
Related
Observations
WOCE
ERA40
BALTEX
CARIBIC
GEBCO
HOAPS
IPCC
NCEP
EH5/MPI-OM
IPCC-AR4
CEOP
COSMOS
ERA15/40
Simulations @ MPI, GKSS,…
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing
Centre (DKRZ)
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 4
WDCC Access
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 5
WDCC Size
4.6 Billion BLOBs
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 6
WDCC DB Storage
Storage of global
coverages per
file or BLOB :
how we get the grid data:
Files from climate model
levels
all levels, all parameters
arbitrary time intervals
parameters
postprocessing step 1:
homogenizing time and
calculation of diagnostics
levels
all levels, all parameters
1 moment (6 by 6 hours)
postprocessing step 2:
isolation of levels &
parameters and creation
of BLOB table input
parameters
1 level, 1 parameter
1 moment
(= 1 BLOB = 1 global field)
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 7
Data Model
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 8
CERA1) Concept:
Semantic Data Management
(I) Data catalogue and Unix files (pointer or BLOB-tableentry)


Enable search and identification of data
Allow for data access as they are (coarse granularity)
(II) Application-oriented data storage
 Time series of individual variables are stored as BLOB
entries in DB Tables (fine granularity)
Allow for fast and selective data access
 Storage in standard data format (GRIB, NetCDF)
Allow for application of standard data processing routines
(PINGOs, CDOs)
1) Climate and Environmental data Retrieval and Archiving
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 9
WDCC Data Topology
Level 1 - Interface:
Metadata entries
(XML, ASCII)
+ Data Files
Level 2 – Interf.:
Separate files
containing BLOB
table data in
application
adapted structure
(time series of
single variables)
Experiment
Description
Unix-Files
Table / Pointer
Dataset 1
Description
Dataset n
Description
BLOB Data
Table
BLOB Data
Table
BLOB DB Table corresponds to scalable,
virtual file at the operating system level.
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 10
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 11
CERA Data Model
Reference
Status
Distribution
Contact
Coverage
Entry
Parameter
Data OrgLocal Adm.
Data Access
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 12
Spatial
Reference
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 13
CERA Modules
3 Modules:
• DATA_ACCESS
for automatted data access
( remote data access)
• DATA_ORG
organization of grid data
( geo-references of grid points in BLOBs)
• CODE
matching of (internal) model code numbers
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 14
Data Model Functions
The CERA2 data model …
allows for data search according to discipline, keyword, variable,
project, author, geographical region and time interval and for data
retrieval.
allows for specification of data processing (aggregation and selection)
without attaching the primary data.
is flexible with respect to local adaptations, to storage of different
types of geo-referenced data, and to definition of data topologies
(hierarchical, network, ….).
is open for cooperation and interchange with other database systems
(e.g. FGDC metadata standard and ISO 19115 included).
But:
is not the simplest data model for each single application.
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 15
Data Access
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 16
Web Access to WDCC
METADATA:
DATA:
GUI:
display in applet
JDBC
jblob-script:
Search for
DS names
JDBC
jblob –f …
http:
- html-display
- xml-download
(ISO, DC, …)
download
http
URL:
http://…
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 17
Interactive Catalogue Access
web browser
http:
html
dynamic
html pages
Servlet / JSP
lnternet
Application
Server
request: URL
Catalogue access via WWW
• URL parsed by JSP
• integrated DB retrieval by JSP
• response in standard html
• efficient administration of
detailed meta information
HTTP and JDBC Data Download
lnternet
Application
Server
Servlet / JSP
Data download via WWW
request: html form
http:
file download
Data download via script/batch
write to
client disk
• request handeled by JSP
• return of binary file
• standard client side jdbc retrieval
• return of binary file
request: jdbc
jdbc
file download
web browser
progr. „jblob“
write to
client disk
XML Interface for http Metadata Output
request: URL
user applications
lnternet
Application
Server
xsql –query
raw xml
xhtml
http:
XML
xsl –
mapping
see wini.wdc-climate.de
Metadata access via WWW:
• xsql query to DB
• xml output from DB
• xsl mapping to any metadata format
ISO xml
DC xml
... various
metadata
formats
http Data Output
request: URL
user applications
lnternet
Application
Server
Java Servlet
plain ASCII
html tables
http:
plain, bin,
html
Data access via WWW
• URL parsed by servlet
• query: DB access by jdbc
• response in any format
binary objects
. . .
various
data
formats
Connection to
Mass Storage Archive
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 22
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 23
Oracle
DBMS
+ HSM
DXDB:
Unitree client
on DB machines for
communication
between
Oracle DB and tape
archive
Tapes
Disks
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 24
Use of DXDB
DXDB is used for
 Ordinary Oracle datafiles
 Redo logs
 Backup
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 25
Migin
Migout
dxdb
TBS - RW
TBS - RO
Tbl
Partition 1
All
tablespaces
are moved
“at once” to
dxdb
Tbl
Partition 21
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 26
Migout / Migin
 Migout takes place after files haven’t been modified for x
minutes
 Only one migout process per dxdb-filesystem
 Migin takes place immediately after a file is requested.
Only parts accessed are retrieved from the backend
storage.
 One migin process per requested file.
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 27
Purging
dxdb
HWM
LWM
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 28
Pro
 It works
 It’s fast
 Applications don’t have to wait until files are completely
restored from tapes.
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 29
Contra
 It works
- If the backend works
 Dxdb not supported by Oracle
 Oracle's officially supported Backend requirements do
not necessarily match requirements from other
applications like HSM systems
(i.e. connection to Unitree is not standarised).
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 30
Summary
Efficient handling of detailed metadata
• easy and structured administration of > 60 metadata tables
• access support:
Java Server Pages (JSP), Servlets, jdbc, xsql
including standard DB features (sql, views, triggers, ... )
Efficient handling of fine granularity data
• random access to arbitrary time steps of single parameters
• access support:
Java Server Pages (JSP), Servlets, jdbc
including standard DB features (authorisation, ... )
• transparent migration of bulk data to tape
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 31
The Winter TopTen Program
identifies the world’s largest and
most heavily used databases.
Email reached in September, 13th:
….. Congratulations on achieving Grand Prize award winner status (1)
in Database Size, Other, All and TopTen Winner status Database Size,
Other, Linux;Workload, Other, Linux in Winter Corp.'s 2005 TopTen
Program! .......
(1) Grand prizes are awarded for first place winners in the All
Environments categories only.
WDCC's CERA DB has been identified as the largest Linux DB.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 32
Download