Stellar spectra parametrization by neural networks: extraction of Teff, logg and [Fe/H] in the Gaia RVS spectral region

Minia Manteiga Outeiro (1), Diego Ordóñez Blanco (2), Carlos Dafonte Vázquez (2), Bernardino Arcay Varela (2) and Iciar Carricajo Marín (1)

(1) Dep. of Navigation and Earth Sciences. University of A Coruña. manteiga@udc.es
(2) Dep. of Information Technologies and Communications. University of A Coruña. dordonez@udc.es

Summary. The Gaia satellite, which is foreseen to be launched near the end of 2011, is one of the key scientific missions of the European Space Agency. Gaia will carry out what is called a census of the Galaxy, by compiling precise information on the nature and motion of its main components. It will perform precise astrometry, multi-epoch spectrophotometry and medium-resolution spectroscopy (R=11500) for the brighter sources. Gaia will be equipped with one spectrograph, the RVS, which will contribute to the study of the nature of the sources and will allow radial velocities to be determined with precisions of 1-10 km/s for V=16-17 mag. The RVS domain is the Ca II infrared region, 847-874 nm, which is rich in diagnostic lines for the determination of stellar atmospheric parameters, in particular effective temperatures, surface gravities and overall metallicities. Comprehensive information about the Gaia project can be found in the Gaia web area at http://www.rssd.esa.int/gaia

Data handling, analysis and classification of information covering the complete sky down to magnitude 17-18 is, without doubt, a challenge for both Astrophysics and Computational Sciences. We present here our first results on the automatic derivation of stellar parameters in the RVS spectral region by means of artificial neural networks (ANN) trained with synthetic model spectra. The results achieved are shown to be comparable to those obtained from spectrophotometry, with an accuracy that depends strongly on the spectral signal-to-noise ratio.
1 Gaia RVS instrument

The Radial Velocity Spectrometer (RVS) is an integral-field spectrograph dispersing the light of the field of view with a nominal resolving power of R = 11500. The RVS instrument operates in time-delayed integration mode, observing each source about 40 times during the 5 years of the mission. The RVS wavelength range, 847-875 nm, has been selected to coincide with the energy-distribution peaks of G and K-type stars, which are the most abundant RVS targets. For these late-type stars the wavelength interval displays three strong ionized calcium lines and numerous weak lines, mainly due to Fe, Si and Mg. In early-type stars, RVS spectra will be dominated by hydrogen Paschen lines and may contain weak lines such as Ca II, He I, He II and N I.

Over the 5-year mission, RVS will observe around 5 billion transit spectra of the brightest 100-150 million stars in the sky. The on-ground analysis of this spectroscopic data set will be a complex and challenging task, not only because of its volume but also because of the interdependence of the different instruments and modes of observation. As a consequence, data extraction and parametrization should be performed in a fully automatic fashion. We think that the use of Artificial Intelligence techniques and, in particular, artificial neural networks (ANN), is a good approach to be tested for the case of the Gaia-RVS dataset.

2 Spectralib: a library of synthetic spectra for Gaia RVS

Our initial approach consists in performing simulations of stellar parameter extraction by means of synthetic spectra. We have used the Gaia RVS Spectralib, a library of 9285 stellar spectra compiled by A. Recio-Blanco and P. de Laverny from Nice Observatory, and B. Plez from Montpellier University. The spectra are based on the new generation of MARCS models from the Uppsala Observatory in the spectral region 847.5-874.5 nm.
A technical note is available describing the model atmospheres from which the synthetic spectra were calculated and the parameters that were used (Recio Blanco et al., 2006). The grid consists of spectra corresponding to effective temperatures between 4000 and 8000 K (step 250 K), logg between -1.0 and 5.0 (step 0.5 dex), and overall metallicities between -5.0 and 1.0 (with a variable step from 1.0 to 0.25 dex). For each model atmosphere, alpha-element abundance variations of +0.4, +0.2, +0.0, -0.2 and -0.4 dex were considered with respect to the original abundances in the models. Access to RVS-Spectralib is open via the ESA Gaia web pages (http://gaia.esa.int/spectralib/).

3 The use of ANN for spectral parametrization

Among the different techniques of Artificial Intelligence, ANN have already proved their success in classification problems: they are generally capable of learning the intrinsic relations that reside in the patterns with which they were trained. Some well-known previous works have applied this technique to the problem of stellar spectral parametrization, obtaining different degrees of accuracy in the extraction of the parameters Teff, logg, [Fe/H] and [alpha/H]. A summary of the current status of automated stellar classification techniques and achievable accuracies can be found in the reviews by Bailer-Jones (2001) and Allende Prieto (2004).

Fig. 1. Schematic figure illustrating the location of the RVS optical module and CCDs. Figure courtesy of EADS Astrium.

In order to test how well stellar atmospheric parameters can be extracted from Gaia-RVS spectra, we chose to train ANN with the ad hoc synthetic spectra introduced in the previous section.
A simple network architecture was chosen to perform the initial tests: a feedforward ANN with 333 input nodes (the number of pixels in the spectra), 1 hidden layer with 150 nodes, and an output layer with 3 nodes providing the three atmospheric parameters, Teff, logg and [Fe/H]. The original synthetic spectra were degraded in spectral dispersion by averaging the flux every three pixels, which reduces the input nodes from 1004 to 333 and avoids very long, and probably unnecessary, computational time. An empirical rule restricts the number of nodes in the hidden layer to 0.3-0.5 times the number of input nodes. A total of 1764 well-distributed spectra were considered in the training sets, while tests were performed on subsets of 465 spectra. Typically, good network convergence and low parameter errors were achieved after about 3000 training cycles, which translates to about 3 hours of computational time on an AMD 64 computer.

Gaia-RVS spectra will be of very different quality depending mostly on the stellar brightness. It has been proposed that the end-of-mission SN ratio for a typical star in the Galaxy, a G5V with V=15.5 or an F2II with V=14.5, will be about 10. To take into account the effect of the SN value on the ANN performance, we ran tests with four values of the SN ratio: 10, 20, 50 and 100. The noise model considered was a simple Gaussian white noise, introduced using the IRAF mknoise routine.

Fig. 2. Results on Teff parametrization. Columns show the mean errors in K for the different values of the SN ratio of the training set, while lines refer to errors in the validation sample.

4 Preliminary results

The mean errors in the extraction of effective temperatures, gravities and metallicities are shown in Figures 2, 3 and 4.
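As a rough illustration of the preprocessing and network layout described in Section 3, the following NumPy sketch rebins a 1004-pixel spectrum to 333 inputs, adds Gaussian white noise for a chosen SN ratio, and passes the result through an untrained 333-150-3 feedforward network. It is not the code used for the tests reported here. Several details are assumptions that the text does not specify: the 5 trailing pixels left over after forming 333 bins are dropped, the noise sigma is taken as continuum/SN with a continuum of 1.0, noise is added after rebinning, the hidden layer uses a sigmoid activation with linear outputs, and all function names are hypothetical.

```python
import numpy as np

N_PIXELS = 1004  # flux points per synthetic spectrum (from the text)
N_INPUT = 333    # input nodes (from the text)
N_HIDDEN = 150   # hidden nodes (from the text)
N_OUTPUT = 3     # Teff, logg, [Fe/H]


def rebin(flux, bin_size=3, n_out=N_INPUT):
    """Average consecutive groups of `bin_size` pixels, keeping `n_out` bins.

    Assumption: pixels beyond bin_size * n_out (5 of 1004) are dropped.
    """
    flux = np.asarray(flux, dtype=float)
    return flux[: bin_size * n_out].reshape(n_out, bin_size).mean(axis=1)


def add_noise(flux, snr, rng, continuum=1.0):
    """Add Gaussian white noise with sigma = continuum / snr (assumed definition)."""
    flux = np.asarray(flux, dtype=float)
    return flux + rng.normal(0.0, continuum / snr, size=flux.shape)


def init_network(rng, n_in=N_INPUT, n_hidden=N_HIDDEN, n_out=N_OUTPUT):
    """Random, untrained weights for a one-hidden-layer feedforward network."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Forward pass: sigmoid hidden layer, linear output layer (both assumed)."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ params["W1"] + params["b1"])))
    return hidden @ params["W2"] + params["b2"]


rng = np.random.default_rng(0)
spectrum = np.ones(N_PIXELS)               # flat placeholder spectrum
binned = rebin(spectrum)                   # 1004 pixels -> 333 inputs
noisy = add_noise(binned, snr=50, rng=rng)
params = init_network(rng)
outputs = forward(params, noisy)           # three values, meaningless until trained
print(binned.shape, outputs.shape)         # (333,) (3,)
```

In practice the three outputs would be trained against scaled values of Teff, logg and [Fe/H]. Note that averaging after adding noise, rather than before as assumed here, would raise the effective SN per input node by roughly a factor of sqrt(3).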
In each of the figures, columns show the mean errors in each stellar parameter for the different values of the SN ratio of the training set, while lines refer to that of the test sample. The diagonal line in each of the three figures shows the performance of the network when the training and test sets have the same SN ratio. Mean errors as low as 21.5 K, 0.09 dex and 0.08 dex for effective temperature, logg and metal abundance, respectively, were reached for ANN trained and tested on synthetic spectra with no noise added. The errors grow to 67 K, 0.16 dex and 0.11 dex in the case of SN 100; 95 K, 0.22 dex and 0.16 dex for SN 50; 204 K, 0.44 dex and 0.27 dex for SN 20; and finally to 382 K, 0.75 dex and 0.46 dex for SN 10. The data in the figures show that the SN ratio heavily influences the learning process, and that poor results are obtained when the training and test samples have different SN values.

Fig. 3. Results on logg parametrization. Columns and lines as in Figure 2.

Fig. 4. Results on metallicity parametrization. Columns and lines as in previous figures.

5 Conclusions and future work

We presented our first results on the automatic derivation of stellar parameters in the RVS spectral region by means of artificial neural networks trained with synthetic model spectra. The results achieved are comparable to those obtained from spectrophotometry, with an accuracy that depends strongly on the signal-to-noise ratio of the spectra. Our results show that ANN can be a good approach to extract atmospheric parameters from Gaia-RVS spectra, provided that the SN ratio of the training and testing spectral sets is well characterized. Mean errors as low as 95 K, 0.22 dex and 0.16 dex for effective temperature, logg and metal abundance, respectively, were reached for ANN trained and tested on synthetic spectra with SN 50.
Future work includes tests with different ANN architectures; the use of spectra with the original Gaia-RVS dispersion, 1004 flux points; an improvement in the statistics of the test set; and a statistical study of the effect of the noise on the ANN performance.

References

1. A. Recio-Blanco, P. de Laverny and B. Plez: European Space Agency Technical Note RVS-ARB-001 (2005)
2. C. Bailer-Jones: Automated stellar classification for large surveys: a review of methods and results. In: Automated Data Analysis in Astronomy, ed by R. Gupta, H.P. Singh, C.A.L. Bailer-Jones (Narosa Publishing House, New Delhi, India)
3. C. Allende-Prieto: Appl. Phys. A 61, 33 (1995)