Running a Data Centre
on the Long Term
François Ochsenbein
(Françoise Genova)
Centre de Données astronomiques de
Strasbourg (CDS)
François Ochsenbein
Data Centre in the Long Term
1
Lessons learnt from more than 30 years of CDS history
- Introduction
- CDS Organisation
- CDS Activities
- Dealing with technical evolution
- Relations with the Astronomical Community
- Conclusions

A bit of history...

- Created in 1972 by the French Institut National d'Astronomie et de Géophysique (INAG), now the Institut National des Sciences de l'Univers (INSU): 33 years ago!
- Originally named CDS = Centre de Données Stellaires (Stellar Data Centre); later renamed Centre de Données de Strasbourg, after the CDS scope was extended to non-stellar objects

CDS Charter
1. collect useful data on astronomical objects, in electronic form;
2. improve them by critical evaluation, comparison and combination;
3. distribute the results to the international astronomical community;
4. conduct research using these data (at the time of the CDS creation, the original objective was to gather stellar data to study galactic structure).
- provide science tools to the community

CDS Organisation
- collect: Catalog Service, VizieR, data for Simbad
- homogenize: Simbad, catalog metadata
- distribute: CDS Services
- preserve: keep original versions
- extended to images (Aladin), metadata (nomenclature), links, registries, ...

SIMBAD Story
A database running over 3 decades
- 1971-1981: CSI (Catalog of Stellar Identifications) and BSI (Bibliographical Star Index)
- 1981-1990: SIMBAD was born (Set of Identifications, Measurements and Bibliography for Astronomical Data); it evolved over different mainframe architectures
- 1990-2005: SIMBAD3 on workstations, using object-oriented concepts
- 2006-...: SIMBAD4, based on Java technology and open-source databases

Simbad Evolution...
- 1976: 350,000 stars; IBM 360/65; PL1/Assembler
- 1988: 700,000 objects; Univac 1110; real-time update
- 1998: 2,500,000 objects; Sun workstation; C + object-oriented

Simbad data on the long term
From 300,000 stars to 3.5 million astronomical objects (stars, galaxies, nebulae, ...)
- Regular editions of the full database stored as ASCII files:
  - on paper (1972-1977)
  - on microfiches (1977-1985)
  - as disk files (1990-...), 1/year ... 1/month
  - → these editions record the details of the database evolution
- All modifications archived

CDS Catalog Service and VizieR
- Catalog service (storage and distribution on magnetic tapes) since the CDS creation in 1972
- FTP access to the CDS collection since 1991
- Reorganisation of the catalog descriptions (metadata) since 1992
- VizieR (catalogs organised as a relational database) since 1996

Metadata Evolutions
[Figure: catalog metadata examples from 1981 and 2005]

Catalog preservation

- Keep records of modifications:
  - on paper until 1991
  - on electronic files since 1991
- Original files preserved as much as possible
- Mirror copies (×4 for data files, ×8 for the database)

Aladin Sky Atlas
- Aladin Server: started around 1994 with the digitized Schmidt plates used for the Guide Star Catalog, organized as a database
- Aladin portal: started around 1997 as our first Java test; it has become a widely used Virtual Observatory portal

From 1997 to 2005
- Originally: visualize catalog data and images at CDS
- Today: a widely used VO portal

CDS organisation

- The CDS team integrates staff with different profiles: astronomers, computer engineers, specialized librarians (mainly permanent positions).
- Scientific objectives, strategy and work program are discussed in the group chaired by the CDS director.
- Since its creation, the CDS activities have been examined by a Scientific Council (6F+6f).

The CDS Scientific Council
Partnership

Historically, the CDS has been associated with participating institutes:
- Lausanne/Genève (photometry)
- Astronomisches Rechen-Institut, Heidelberg (astrometry)
- Paris-Meudon Observatory (bibliography)
- Marseille Observatory (radial velocities)
- ... and Strasbourg Observatory (spectral classification)

Partnership (continued)
- Participation in projects related to missions, e.g. Hipparcos, XMM
- Participation in the networking of services (observatory archives, journals, ADS, ...)
- Participation in the Virtual Observatory enterprise (AVO, IVOA, VOTECH): brings interoperability skills, enables rapid prototyping
- Involvement with the CDS user community

Partnership with publishers

The data published in the specialized literature cannot be re-used:
- PS/PDF files are not re-usable
- interest in the data used ends with the publication
→ convince the editors of the astronomical journals (A&A, AJ, ApJ) to store the tables in a reusable form (standardized description)
→ improves data quality and reliability
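What a "standardized description" makes possible can be sketched in a few lines. The minimal Python example below assumes a fixed-width table accompanied by a byte-by-byte column description in the spirit of the CDS ReadMe files (byte range, format, unit, label). The `Column` class, the `parse_record` function, the column names, byte ranges and sample records are all hypothetical, and only the simplest format codes (A, I, F) are handled; it is an illustration, not CDS software.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """One line of a byte-by-byte description (hypothetical example)."""
    first: int  # first byte, 1-based, inclusive
    last: int   # last byte, 1-based, inclusive
    fmt: str    # Fortran-like format code: "A6", "I3", "F5.2", ...
    unit: str   # "---" marks a dimensionless quantity (carried, unused here)
    label: str


def parse_record(line: str, columns: list) -> dict:
    """Cut one fixed-width record into typed fields using the description."""
    record = {}
    for col in columns:
        raw = line[col.first - 1:col.last].strip()
        if not raw:
            record[col.label] = None  # blank field: value not available
        elif col.fmt.startswith("I"):
            record[col.label] = int(raw)
        elif col.fmt.startswith("F"):
            record[col.label] = float(raw)
        else:  # "A" (character) fields are kept as strings
            record[col.label] = raw
    return record


# Hypothetical description: name in bytes 1-6, V in 8-12, B-V in 14-18
DESCRIPTION = [
    Column(1, 6, "A6", "---", "Name"),
    Column(8, 12, "F5.2", "mag", "Vmag"),
    Column(14, 18, "F5.2", "mag", "B-V"),
]

if __name__ == "__main__":
    for line in ["Star01  9.12  0.45", "Star02 10.03"]:
        print(parse_record(line, DESCRIPTION))
    # {'Name': 'Star01', 'Vmag': 9.12, 'B-V': 0.45}
    # {'Name': 'Star02', 'Vmag': 10.03, 'B-V': None}
```

Because the layout, units and labels travel with the data, the same reader serves any table documented this way, with no per-table guessing of column boundaries.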
User Support

Communication with users (astronomers) is a privileged channel:
- to get feedback on the data and services (good and bad)
- to be aware of astronomers' wishes for efficient research work
- to get help, e.g. to clean up some datasets
- to motivate the CDS team
Demos to users (AAS, ...), constant discussions with other service developers (ADASS)

User Support (continued)

Hotline question@simbad.u-strasbg.fr:
- in existence for 15 years
- still our major source of error reports and feedback
- all astronomers and most engineers take part in this duty (even the Director...)
- most questions now come from non-astronomers

Dealing with Technical Evolutions

Example of the Simbad history:
- from mainframes managed by computing centers to clusters of PCs
- changes of languages between PL1/assembler, C, Perl, Java...
- from batch queries and updates on punched cards to interactive terminals, real-time updating, graphical interfaces, WWW, Java...
- while keeping the scientific quality
→ database lifetime ≈ 10-15 years

Taking Advantage of the Electronic Era

Take advantage of the existence of electronic data:
- using the electronic versions of the journals to feed Simbad (tables of contents, ToCs)
- using the standards to perform extensive verifications of the database contents
→ improved data quality and reliability (but some surprises from time to time...)

Combining technological evolutions

Example of Aladin:
- 1997: first exercise of Java implementation at CDS
- makes use of open access to data servers (URLs and HTTP)
- makes use of a registry (GLU) since its beginning
- implementation of VO standards: XML, VOTable, SIA, ConeSearch, SkyNode…
- connectivity with other applications: VOPlot, SpecView, VOSpec…
- API access (scripting mode, ExtApp interface)
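ConeSearch, one of the VO standards listed above, shows how "open access to data servers (URLs and HTTP)" works in practice: a positional query is a plain HTTP GET on a service URL with three parameters, RA and DEC of the cone centre and the search radius SR, all in decimal degrees, and the service answers with a VOTable. The minimal Python sketch below only builds such a query URL; the function name `cone_search_url` and the service address are hypothetical, the range checks are our own choice, and no request is sent.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> str:
    """Build a Cone Search GET URL (RA, DEC, SR in decimal degrees)."""
    # Basic sanity checks on the cone (our own choice, not a protocol rule)
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg <= 0.0:
        raise ValueError("SR must be positive")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Join with '?' or '&' depending on what the base URL already contains
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + query


if __name__ == "__main__":
    # Hypothetical service address; nothing is fetched here
    print(cone_search_url("https://example.org/cs?", 10.684, 41.269, 0.1))
    # https://example.org/cs?RA=10.684&DEC=41.269&SR=0.1
```

Because the whole query fits in a URL, any HTTP client (a browser, a script, or a portal such as Aladin) can call any compliant service the same way.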

Technological watch

New technologies are taken into account:
- not too late: new technologies open up new functionalities, and accessibility through recent technologies is a requirement
- ...but not too early: they require reliability, maintainability... and time to implement!
Keep compatibility with older material:
- do not require users to change their hardware/software every 6 months
- example of Aladin: a significant fraction of users still run Java 1.1

Methodological watch

New methodologies are tested:
- software architecture: e.g. object orientation, used for 15 years (Simbad)
- contents and metadata: ontologies prototyped as UCDs (Unified Content Descriptors), which brought extensive coherence-checking methods

Pragmatic, bottom-up approach
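One kind of coherence check that UCDs make possible is comparing what a column claims to be with the unit it is given. The minimal Python sketch below uses UCD1+-style words (`pos.eq.ra`, `pos.eq.dec`, `phot.mag`, `meta.main`, `em.opt.V`, `meta.id`), but the rule table `EXPECTED_UNITS`, the function `check_columns` and the sample columns are hypothetical and deliberately simplified; real checks at CDS are certainly richer.

```python
# Hypothetical, simplified rules: expected units for a few UCD primary words
EXPECTED_UNITS = {
    "pos.eq.ra": {"deg"},
    "pos.eq.dec": {"deg"},
    "phot.mag": {"mag"},
}


def check_columns(columns):
    """Return messages for columns whose unit disagrees with their UCD.

    columns: iterable of (label, ucd, unit) tuples.
    """
    problems = []
    for label, ucd, unit in columns:
        # A UCD is a ';'-separated list of words; the first is the primary word
        primary = ucd.split(";")[0].strip()
        expected = EXPECTED_UNITS.get(primary)
        # Words without a rule (e.g. meta.id) are not checked
        if expected is not None and unit not in expected:
            problems.append(f"{label}: unit {unit!r} unexpected for {primary}")
    return problems


# Hypothetical column metadata with one deliberate inconsistency
SAMPLE_COLUMNS = [
    ("RAdeg", "pos.eq.ra;meta.main", "deg"),
    ("DEdeg", "pos.eq.dec;meta.main", "arcsec"),
    ("Vmag", "phot.mag;em.opt.V", "mag"),
    ("Name", "meta.id", "---"),
]

if __name__ == "__main__":
    for message in check_columns(SAMPLE_COLUMNS):
        print(message)
    # DEdeg: unit 'arcsec' unexpected for pos.eq.dec
```

Such rules can be run over every table of a collection, which is what makes the checks "extensive": a declared meaning turns silent inconsistencies into reportable ones.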
Relations with the Scientific Community

At the beginning: essentially personal contacts of the CDS director and his staff with their peers
- need to convince astronomers to provide their data for wide distribution, through scientific collaborations
Bulletin d'Information du CDS (1971-1998):
- scientific results
- orientations discussed in the CDS Council meetings
- news / controversies about astronomical data

Relations with the Scientific Community (continued)

Deep impact of the Web:
- new non-astronomer users
- need to improve the documentation
- much larger usage → improved reliability
The CDS role has changed:
- scientists now ask for their data to be included among the datasets available from the CDS and its mirrors
- databases now play a fundamental role; quick data ingestion is now a requirement

Conclusions
- Evolution from a service offered to a small community to major reference services
- Precursor of, and actor in, the International Virtual Observatory
- Importance of the quality, lifetime, motivation and diversity of the staff: a mixture of scientists, computer scientists and librarians
- Constant search for networking / partnership

Co-authors
- Françoise Genova
- François Ochsenbein
- Mark Allen
- Olivier Bienaymé
- Thomas Boch
- François Bonnarel
- Laurent Cambrésy
- Sébastien Derriere
- Pascal Dubois
- Pierre Fernique
- Soizick Lesteven
- Cécile Loup
- André Schaaff
- Bernd Vollmer
- Marc Wenger
- Gérard Jasniewicz (GRAAL)
- Emmanuel Davoust (OMP)
- Daniel Egret (OSP)
- the CDS Bibliographical team: S. Borde (OSP), M. Brouty, C. Bruneau, C. Brunet, G. Chassagnard (IAP), S. Laloë, A. Schreyeck, P. Vannier, P. Vonflie, M.J. Wagner, F. Woelfel

Thank you
Questions?