Running a Data Centre on the Long Term
François Ochsenbein (Françoise Genova)
Centre de Données astronomiques de Strasbourg (CDS)

François Ochsenbein, Data Centre in the Long Term

Lessons learnt from more than 30 years of CDS history
- Introduction
- CDS Organisation
- CDS Activities
- Dealing with technical evolution
- Relation with the Astronomical Community
- Conclusions

A bit of history...
- Creation in 1972 by the (French) Institut National d'Astronomie et de Géophysique (INAG), now Institut National des Sciences de l'Univers (INSU): 33 years ago!
- Named CDS = Centre de Données Stellaires, later renamed Centre de Données de Strasbourg after the CDS scope was extended to non-stellar objects

CDS Charter
1. collect useful data on astronomical objects, in electronic form;
2. improve them by critical evaluation, comparison and combination;
3. distribute the results to the international astronomical community;
4. conduct research using these data (at the time of CDS creation, the original objective was to gather stellar data to study the galactic structure).
- provide science tools to the community

CDS Organisation
- collect: Catalog Service, VizieR, data for Simbad
- homogenize: Simbad, catalog metadata
- distribute: CDS Services
- preserve: keep original versions
- extended to images (Aladin), metadata (nomenclature), links, registries,...

SIMBAD Story
A database running over 3 decades
- 1971-1981: CSI (Catalog of Stellar Identifications) and BSI (Bibliographical Star Index)
- 1981-1990: SIMBAD was born (Set of Identifications, Measurements and Bibliography for Astronomical Data) and evolved over different mainframe architectures
- 1990-2005: SIMBAD3 on workstations, using object-oriented concepts
- 2006-...
  SIMBAD4 based on Java technology and open source databases

Simbad Evolution...
- 350,000 stars; IBM360/65; PL1/Assembler (1976)
- 700,000 objects; Univac 1110; real-time update (1988)
- 2,500,000 objects; Sun workstation; C + object-oriented (1998)

Simbad data on the long term
- From 300,000 stars to 3.5 million astronomical objects (stars, galaxies, nebulae,...)
- Regular editions of the full database as ASCII files (1/year ... 1/month), stored:
  - on paper (1972-1977)
  - on microfiches (1977-1985)
  - as disk files (1990-...)
- gives back details on db evolution
- all modifications archived

CDS Catalog Service and
- Catalog service (storage and distribution on magnetic tapes) since CDS creation in 1972
- FTP access to CDS collection since 1991
- Reorganisation of the catalog descriptions (metadata) since 1992
- VizieR (catalogs organised as relational database) from 1996

Metadata Evolutions
(figure: 1981 and 2005)

Catalog preservation
- Keep records on modifications:
  - on paper until 1991
  - on electronic files since 1991
- original file preserved as much as possible
- Mirror copies (x4 for data files, x8 for database)

Aladin Sky Atlas
- Aladin Server: started around 1994 with digitized Schmidt plates used for the Guide Star Catalog, organized as a database
- Aladin portal: started around 1997 as our first Java test, and has become a widely used Virtual Observatory portal

From 1997 to 2005
- Originally: visualize catalog data and images at CDS
- Today: a widely used VO portal

CDS organisation
- The CDS team integrates staff with different profiles: astronomers, computer engineers, specialized librarians (mainly permanent positions).
- Scientific objectives, strategy and work program are discussed in the group chaired by the CDS director
- Since its creation, the CDS activities have been examined by a Scientific Council (6F+6f)

The CDS Scientific Council
(figure)

Partnership
Historically, CDS has been associated with participating Institutes:
- Lausanne/Genève (photometry)
- Astronomisches Rechen-Institut, Heidelberg (astrometry)
- Paris-Meudon Observatory (bibliography)
- Marseille Observatory (radial velocities)
- ... and Strasbourg Observatory (spectral classification)

Partnership (continued)
- Participation in projects related to missions, e.g. Hipparcos, XMM
- Participation in the networking of services (observatory archives, journals, ADS, ...)
- Participation in the Virtual Observatory enterprise (AVO, IVOA, VOTECH): brings interoperability skills, enables rapid prototyping
- Involvement with the CDS user community

Partnership with publishers
- The data published in the specialized literature can't be re-used:
  - PS/PDF files are not re-usable
  - interest in the data used ends with the publication
- convince the astronomical journal editors (A&A, AJ, ApJ) to store the tables in a reusable form (standardized description)
- improves the data quality and reliability

User Support
Communication with users (astronomers) is a privileged way
- to get feedback on the data & service (good and bad)
- to be aware of the astronomers' wishes for efficient research work
- to get help, e.g.
  clean up some datasets
- to motivate the CDS team
Demos to users (AAS, ...), constant discussions with other service developers (ADASS)

User Support (continued)
Hotline question@simbad.u-strasbg.fr
- in existence for 15 years
- is still our major source of error reports and feedback
- all astronomers and most engineers take part in this duty (even the Director...)
- most questions now come from non-astronomers

Dealing with Technical Evolutions
Example of the Simbad history:
- from mainframes managed by computing centers to clusters of PCs
- changes of languages between PL1/assembler, C, Perl, Java...
- from batch queries and updates on punched cards to interactive terminals, real-time updating, graphical interfaces, WWW, Java...
- while keeping the scientific quality
- database lifetime ≈ 10-15 years

Taking advantage of Electronic Era
Take advantage of the existence of electronic data:
- using the electronic versions of the journals to feed Simbad (ToCs)
- using the standards to perform extensive verifications of the database contents
- improved data quality and reliability (but some surprises from time to time...)

Combine the technological evolutions
Example of Aladin
- 1997: first exercise of Java implementation at CDS
- make use of the open access to data servers (URLs and HTTP)
- make use of a registry (GLU) since its beginning
- implementation of VO standards: XML, VOTable, SIA, ConeSearch, Skynode…
- connectivity with other applications: VOPlot, SpecView, VOSpec…
- API access (scripting mode, ExtApp interface)

Technological watch
New technologies taken into account:
- not too late: new technologies open new functionalities; accessibility by recent technologies is a requirement
- ...but not too early: requires reliability, maintainability ...
  and time to implement!
Keep compatibility with older material:
- do not require the users to change their hardware/software every 6 months
- example of Aladin: still a significant fraction of users with Java 1.1

Methodological watch
New methodologies are tested:
- software architecture: example of object orientation, used for 15 years (Simbad)
- contents and metadata: ontologies prototyped as UCD (Unified Content Descriptors) brought extensive coherence checking methods
Pragmatic, bottom-up approach

Relations with the Scientific Community
At the beginning:
- essentially personal contacts of the CDS director and his staff with their peers
- need to convince the astronomers to provide their data for wide distribution, through scientific collaborations
Bulletin d'Information du CDS (1971-1998):
- scientific results
- orientations discussed in the CDS Council meetings
- news / controversies about astronomical data

Relations with the Scientific Community (continued)
Deep impact of the Web:
- new non-astronomer users
- necessity of improving the documentation
- much larger usage
- improved reliability
CDS role has changed:
- now asked by scientists to include their data among the datasets available from CDS and mirrors
- databases now play a fundamental role; quick data ingestion is now a requirement

Conclusions
- Evolution from a service offered to a small community to major reference services
- precursor/actor of the International Virtual Observatory
- Importance of the quality, lifetime, motivation, diversity of the staff: a mixture of scientists, computer scientists, and librarians
- Constant search for networking / partnership

Co-authors
Françoise Genova, François Ochsenbein, Mark Allen, Olivier Bienaymé, Thomas Boch, François Bonnarel, Laurent Cambrésy, Sébastien
Derriere, Pascal Dubois, Pierre Fernique, Soizick Lesteven, Cécile Loup, André Schaaff, Bernd Vollmer, Marc Wenger, Gérard Jasniewicz (GRAAL), Emmanuel Davoust (OMP), Daniel Egret (OSP)
the CDS Bibliographical team: S. Borde (OSP), M. Brouty, C. Bruneau, C. Brunet, G. Chassagnard (IAP), S. Laloë, A. Schreyeck, P. Vannier, P. Vonflie, M.J. Wagner, F. Woelfel

Thank you
Questions?