A systematic approach for the identification of classes of gamma-ray sources
Diego F. Torres & Olaf Reimer
Institute for Space Sciences, Barcelona, Spain
More details in Torres & Reimer: ApJ Letters 629, 141 (2005)

Plausible diversity of high-energy gamma-ray sources
THE ONLY KNOWN POPULATIONS OF (GeV) GAMMA-RAY SOURCES ARE PULSARS AND AGN
Variability: the most direct way to acknowledge the existence of several different classes of gamma-ray sources
* Clearly defined variable and non-variable sources
* No correlation of variability with sky position
Circumstantial evidence + theoretical analysis suggest (V = variable, NV = non-variable, ? = uncertain):
Possible Galactic sources:
- Plerions and SNRs (NV)
- Isolated black holes, X-ray binaries, microquasars (V)
- Massive stars/winds (?)
- Molecular clouds (high and low latitude) (NV)
Possible extragalactic sources:
- Radio galaxies (?)
- Clusters of galaxies (NV)
- Regions of star formation, starbursts and ULIGs (NV)

During the all-sky survey, the LAT will have sufficient sensitivity after one day to detect (at 5σ) the weakest EGRET sources. Thousands of sources are expected at high latitude alone.
[Figure: LAT 5σ detection times (100 seconds, 1 orbit, 1 day) for GRB940217 (100 s), the PKS 1622-287 flare, the 3C279 flare, the Vela pulsar, the Crab pulsar, 3EG 2020+40 (SNR γ Cygni?), 3EG 1835+59, 3C279 in its lowest state, 3EG 1911-2000 (AGN), Mrk 421, and the weakest 5σ EGRET source]

Understanding the challenge
Identifying all LAT sources on a case-by-case basis, using multiwavelength techniques with ad hoc simultaneous observations, is simply not possible because of the number of sources.
Individual classifiers (e.g., Mattox et al. 2001 or Sowards-Emmerd et al. 2003) can (and will) rank candidates by their relative likelihood of being correct within the populations we already know to exist in the EGRET data (AGNs):
• they will work well (providing sound identifications) for the brightest sources (for EGRET, for instance, the agreement is excellent);
• there will be unavoidable uncertainties for fainter sources and for sources along the Galactic plane, with no apparent way of distinguishing between classes (is it an AGN seen through the disk, or something of Galactic origin?);
• AGN SEDs are too varied: reliable AGN SED templates are lacking, and the same source may show large spectral variations with time;
• it would be up to the reader to decide what to believe about a particular source, with no information about other possible populations.

Problem 1: when is an incomplete catalog complete enough?
Approach: enlarge the catalogs of AGNs and pulsars and correlate them with the LAT detections. (But, e.g., there is no clear correlation between gamma-ray and radio emission in blazars.)
Problem 2: the discovery of new populations would depend on the identification of individual members.
Without precise a priori knowledge of which AGNs and which pulsars are able to emit gamma-rays, given their respective SEDs (i.e., with no real veto system), the number of identifications will be limited only by the number of sources in the counterpart catalog considered. This implies the lack of a confidence level for the population as a whole, unless extensive multi-year, multi-frequency studies are done for many members.
Problem 3: simultaneous multiwavelength studies can be secured for only a handful of sources. (We cannot use this technique to explore a discovery space.)
Understanding the challenge: inverting the problem to strike the eye
(Now we consider a large number of gamma-ray sources instead of a large number of counterparts.)
In the last BATSE map, once the positional error boxes are taken into account, one or more GRBs were detected along every line of sight of any instrument, at any wavelength, used to compile any list of possible counterparts.
[Figure: the real BATSE GRB sky map]
The potential of a correlation analysis is completely lost.

Understanding the challenge: how far is the LAT from that situation?
At low Galactic latitude (no priors):
- 1000 LAT sources in the Galactic plane (|b| < 10°) with 12' uncertainty = 20% coverage
At high Galactic latitude (no priors):
- 10^3 LAT sources off the Galactic plane (|b| > 10°) with 12' uncertainty = 0.3% coverage
- 10^4 LAT sources off the Galactic plane (|b| > 10°) with 30' uncertainty = 20% coverage
We need a scheme that allows us to classify populations of sources, applied before any internal relative ranking of the goodness of individual detections (within already known populations), to make sure that we do not over-identify to the point where discovering new populations is no longer possible.

An appealing goal
In our opinion, an appealing goal for the first-year all-sky survey is to be able to say:
- which kinds of populations have been detected in the GLAST sky;
- the statistical confidence of the detection of each of them (systematically quantified using the same technique);
- which are the most likely detected individuals of each class, so that multi-frequency observations can proceed with confidence.
This classification should go beyond what we already know from EGRET (i.e., pulsars and blazars). Reaching this goal should result in a high-quality publication that could be viewed as the solution to the problem of the unidentified EGRET sources (whatever we could not identify in the GLAST era would then represent a new problem).
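The coverage numbers above can be approximated by a simple geometric estimate: the total area of N error circles, assumed non-overlapping and uniformly spread (no priors), divided by the area of the latitude band. The sketch below is our own illustration under those assumptions; in particular it treats the quoted 12' and 30' uncertainties as error-circle radii and uses the small-angle area π r². The function names are hypothetical. For the two high-latitude cases it gives values close to those quoted (about 0.4% and 23%).

```python
import math

# Whole sky in square degrees: 4*pi sr * (180/pi)^2, about 41253 deg^2
FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2


def band_area_deg2(b_min_deg: float, b_max_deg: float) -> float:
    """Area (deg^2) between |b| = b_min and |b| = b_max, both hemispheres.

    On a unit sphere, the fraction of the full sky lying between two
    latitudes (counting both hemispheres) is sin(b_max) - sin(b_min).
    """
    frac = math.sin(math.radians(b_max_deg)) - math.sin(math.radians(b_min_deg))
    return FULL_SKY_DEG2 * frac


def coverage_fraction(n_sources: int, radius_arcmin: float,
                      b_min_deg: float, b_max_deg: float) -> float:
    """Fraction of a latitude band covered by n error circles.

    Assumes a uniform source distribution (no priors), no overlap between
    circles, and small circles, so each covers about pi * r^2 deg^2.
    """
    r_deg = radius_arcmin / 60.0
    covered = n_sources * math.pi * r_deg ** 2
    return covered / band_area_deg2(b_min_deg, b_max_deg)


if __name__ == "__main__":
    print(f"{coverage_fraction(1_000, 12, 10, 90):.2%}")   # 10^3 sources, 12'
    print(f"{coverage_fraction(10_000, 30, 10, 90):.2%}")  # 10^4 sources, 30'
```

Ignoring overlaps makes this an upper bound on the true covered fraction; any prior that concentrates sources (e.g., along spiral arms) changes the local value.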
Scheme
We propose establishing an a priori protocol for source population discovery, based on a controlled analysis of positional coincidences. It has three parts:
- Theoretical censorship: prohibits repeated searches that would reduce the statistical significance of any possible positive class correlation;
- Discovery protection: protects the significance with which one claims the discovery of a number of important population candidates, and gives guidelines on how to manage the probability budget;
- Common significance assessment: assigns probabilities in both the large-number and the small-number statistical regimes.

Part 1: Theoretical censorship (what to consider as potential counterparts?)
We require, as part of the criterion, that predictions (ideally multiwavelength ones) be available for a subset of the proposed counterpart class. This avoids the blind testing of populations that may or may not produce gamma-rays, and for which nothing but an a posteriori positional correlation could be obtained. If there is no strong theoretical indication, before the search, that a population can emit gamma-rays, that population should not be sought.

Part 2: Discovery protection (why don't we just try everything there is?)
If one probes a large number of samples, making an equally large number of trials against the same instrument detections, one will find positive correlations, if only as a result of statistical fluctuations. To claim significance, one would have to check that the penalty paid for such a finding (i.e., the fact that a number of trials led to null results) does not overcome the significance achieved. This may turn out to be practically impossible without an a priori established source selection. We therefore require, as part of the criterion, that the populations to be tested, and the testing protocol, be defined before the data release.
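The penalty for repeated searches can be made concrete with the standard trials-factor argument (our illustration, not a formula given in the text). If each of n independent tests has chance probability p of producing a spurious positive correlation, the probability that at least one of them does is 1 - (1 - p)^n. Inverting that relation (the Šidák correction) gives the per-test threshold needed to keep the overall chance probability at a chosen total. Independence of the trials is an assumption; correlated tests, such as overlapping counterpart catalogs, would change the numbers.

```python
def prob_any_chance_positive(p_single: float, n_trials: int) -> float:
    """Chance that at least one of n independent tests gives a spurious positive.

    Every test is assumed to have the same chance probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return 1.0 - (1.0 - p_single) ** n_trials


def sidak_threshold(total: float, n_trials: int) -> float:
    """Per-test chance probability that keeps the overall chance at `total`."""
    return 1.0 - (1.0 - total) ** (1.0 / n_trials)


if __name__ == "__main__":
    # Fifty unplanned searches, each "significant" at the 1% level:
    print(f"{prob_any_chance_positive(0.01, 50):.1%}")  # about 39.5%
    # Per-test threshold if four planned tests must share a 1e-4 total:
    print(f"{sidak_threshold(1e-4, 4):.3e}")
```

The first line shows why unplanned searches are dangerous: a 1% fluke per search becomes a roughly 40% chance of at least one false "discovery" over fifty searches.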
Lessons to learn from ultra-high-energy cosmic ray physics: few events, a large number of claims, many of them plainly wrong (see the discussion by Torres, Reimer, et al., ApJ Letters 595, 13, 2003).

Part 2-3: Budgeting, managing probabilities, identifying sources
Budget: the total amount of probability available for the identification of classes (it is not infinite). In the case of GRBs, a completely uniform population with 4° positional uncertainty, it is 0. In the case of GLAST, it depends on Galactic latitude and on the assumed priors (e.g., the spiral arms).
The budget is a number that states the probability that our claim, the set of identified source classes, is just a chance result. For instance, we may require a chance probability of 1 in 10^4 for all our claims taken together.

Part 2: Simple basis for a protocol
Suppose that the total budget is a chance probability B, and that we want to test several different classes of sources, labelled i. The total budget can then be divided into individual chance probabilities P_i such that the sum of all P_i equals B.
Population i is claimed as detected if the a posteriori experimental probability of its random correlation, P_exp(i), is less than its a priori assigned P_i (as opposed to being less only than the larger, total budget B). Important: this allows different populations to be discovered simultaneously.
We can then manage the probability budget. For some populations we may be less confident that they will be detected; for others, the number of members may be small enough that detecting only a few of them would already carry great significance. In this situation we would choose a relatively large P_i, making it easier for the test to pass. For others, say AGNs and pulsars, we can assign a relatively small P_i, making it harder for the test [whether P_exp(i) < P_i] to pass, so that they take up less of the total budget.
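A minimal sketch of the budget rule, assuming the allocation is written as fractions of B: each population i receives P_i = f_i × B, the fractions must sum to one, and a population is claimed only if P_exp(i) < P_i. The 40%/40%/10% split for clusters, starbursts and FSRQs follows the example in the text; giving the remaining 10% to BL Lacs, and all the P_exp values, are hypothetical numbers chosen for illustration. The function names are ours.

```python
import math


def allocate_budget(total_budget: float, fractions: dict[str, float]) -> dict[str, float]:
    """Split the total chance-probability budget B into a priori P_i = f_i * B."""
    if not math.isclose(sum(fractions.values()), 1.0, rel_tol=1e-9):
        raise ValueError("budget fractions must sum to 1 so that sum(P_i) = B")
    return {name: f * total_budget for name, f in fractions.items()}


def claimed_populations(p_apriori: dict[str, float],
                        p_exp: dict[str, float]) -> list[str]:
    """Populations whose a posteriori chance probability is below their own P_i."""
    return [name for name, p_i in p_apriori.items()
            if name in p_exp and p_exp[name] < p_i]


if __name__ == "__main__":
    B = 1e-4
    # Fractions fixed BEFORE the data release (BL Lac share: hypothetical remainder).
    budget = allocate_budget(B, {"clusters": 0.4, "starbursts": 0.4,
                                 "FSRQs": 0.1, "BL Lacs": 0.1})
    # Hypothetical a posteriori chance probabilities, for illustration only.
    observed = {"clusters": 2e-5, "starbursts": 6e-5,
                "FSRQs": 5e-6, "BL Lacs": 3e-5}
    print(claimed_populations(budget, observed))
```

Because each population is tested against its own share rather than against B, several classes can be claimed at once, while the union bound keeps the probability that any of the claims is a chance result below the sum of the shares, i.e., below B.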
If one or more of the tests are passed, the results are individually significant because, first, we protected our search by establishing the protocol a priori (it was a blind test) and, second, the overall chance probability is still less than the total budget B.
In the example below we choose to test galaxy clusters and starbursts with 40% of the high-latitude budget each. These would be new populations if discovered, so we want to favor the chance of spotting them. If B = 10^-4, then P_exp(clusters) < 0.4 B is required for the population to be claimed as discovered (the significance level of this is discussed below).
For other classes, say AGNs, we can assign a relatively small P(AGN), making it harder for the test [whether P_exp(AGN) < P(AGN)] to pass, so that they take up less of the total budget. In this example, P(FSRQ) = 0.1 B. FSRQs are not a new population, so we do not want to spend our budget on them: this is exactly equivalent to requiring a very high confidence level for the discovery of this population.
[Figure: example budget split among FSRQs, BL Lacs, clusters, and starbursts]

Part 3: Quality evaluation
C(A): number of members of population A that coincide with LAT detections.
N(A): number of known sources in the candidate population A under analysis.
U: number of detections.
P: probability of finding a gamma-ray source in a random direction of the sky.
As we have seen earlier, P is not overwhelmingly large (a uniform distribution with no priors gives P below a few percent for fewer than 10 000 sources). A more careful treatment will reduce the value of P from these simple estimates. Such low values of P make the product P × N(A), which we call the noise expectation, typically lie in the range 1-100 for all candidate populations. The excess number of coincidences over the noise is then:
E(A) = C(A) - P × N(A)
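The bookkeeping of Part 3 can be sketched directly: the noise expectation P × N(A), the excess E(A) = C(A) - P × N(A), and a chance probability P_exp(A) for the small-number regime. For that last quantity the text points to Feldman & Cousins (1998) or Gehrels (1986); this sketch instead uses a plain one-sided Poisson tail, P_exp(A) = Prob(n ≥ C(A)) with n drawn from a Poisson distribution of mean P × N(A). That is a simpler substitute and need not reproduce thresholds derived with those methods; treating chance coincidences as Poisson-distributed is itself an assumption. The function names are hypothetical.

```python
import math


def noise_expectation(p_random: float, n_known: int) -> float:
    """Expected number of chance coincidences, P x N(A)."""
    return p_random * n_known


def excess(c_coincident: int, p_random: float, n_known: int) -> float:
    """E(A) = C(A) - P x N(A): coincidences above the random expectation."""
    return c_coincident - noise_expectation(p_random, n_known)


def poisson_tail(k: int, mu: float) -> float:
    """Prob(n >= k) for n ~ Poisson(mu), computed as 1 - Prob(n <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # Prob(n = 0)
    cdf = 0.0
    for n in range(k):
        cdf += term
        term *= mu / (n + 1)  # Prob(n + 1) obtained from Prob(n)
    return max(0.0, 1.0 - cdf)


def p_exp_small_numbers(c_coincident: int, p_random: float, n_known: int) -> float:
    """Chance probability of at least C(A) coincidences from noise alone."""
    return poisson_tail(c_coincident, noise_expectation(p_random, n_known))


if __name__ == "__main__":
    # X-ray binary numbers from the text: N(A) ~ 200, P ~ 3e-3
    mu = noise_expectation(3e-3, 200)
    print(f"noise expectation: {mu:.2f}")
    for c in range(1, 7):
        print(c, f"E = {excess(c, 3e-3, 200):.1f}",
              f"P_exp = {p_exp_small_numbers(c, 3e-3, 200):.2e}")
```

In the protocol, the resulting P_exp(A) is the number compared against the population's a priori share P_i.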
Excess = Coincidences - Noise
Pulsars and blazars will present the largest number of positional coincidences. Let us assume that there are 2000 catalogued AGNs; with P ~ 10^-2 or 10^-3, all coincidences in excess of 6-60 are beyond the random expectation. Then C(AGN) >> P × N(AGN), and the excess would be large: we are in the large-number statistical regime, and the probability that this excess occurs by chance, P_exp(AGN), can be computed.
When both terms in the expression for E(A) are small (small-number statistics), we should test the null hypothesis for a new source population against a reduced random noise. Methods such as Feldman & Cousins (1998) or Gehrels (1986) are useful to assess quality in this case and to obtain P_exp(A).
An example of a null hypothesis is "X-ray binaries are not LAT sources". We then have 0 predicted signal events (coincidences) and a background of P × N(A). With N(A) ~ 200 and P ~ 3 × 10^-3, detecting more than 5 coincidences rules out the null hypothesis at 95% CL. If P_exp(X-ray binaries) is also below the budgeted P(X-ray binaries), we have uncovered a new population of sources at 95% CL.

Outlook and conclusions: some technical issues ahead
Which populations are to be tested? How large should the a priori probability be for each of them? How is the random probability P best computed? How large should the total budget B be? All of these must be answered to fully determine the protocol.
By researching and ultimately establishing a protocol along these lines, the problem of identifying the classes of detections will be solved by early 2008, with individually high levels of confidence and a collectively low random probability.
THE GOAL IS A SINGLE POWERFUL PAPER BEFORE THE FIRST YEAR, REPORTING THE DISCOVERY OF DIFFERENT CLASSES OF POPULATIONS, EACH WITH ITS CORRESPONDING CONFIDENCE LEVEL, ALL ANALYZED USING THE SAME TECHNIQUE.
This would immediately open the possibility of focusing efforts on case-by-case, object-oriented astroparticle physics, knowing that the class has been detected with, say, 95% CL.

First assessment
Clearly, NOT all of the 66 (40-100) AGNs that have been claimed to be related to EGRET sources have simultaneous multiwavelength studies! Identification is done by statistical methods based on position only (e.g., Mattox), and the correlated variability of some individuals supports the existence of the class.
But there is no quantification of the quality of the population as a whole, even in cases where we are absolutely sure of individual classifications.
A similar problem applies to radio-quiet pulsars, and to pulsars for which there are no timing solutions! A few members are tracked in multiwavelength studies; the rest are positional coincidences.
It is not a coincidence that the two identified populations are variable/periodic.
Lack of statistical classification power: large error boxes contribute to the problem, but they are not its only cause.
Variability certainly helps, but...
If a periodic signal in the flux has been predicted beforehand, that alone unambiguously labels the source. OK. But this will happen for only a very small fraction of detections, given the incompleteness of pulsar timing parameters and the shortage of precise variability predictions for accretion-powered X-ray binaries.
Even if a theoretically compatible variability timescale appears, if we have not identified the class of sources to which the sought counterpart belongs, that in itself will serve as the justification for follow-up observational campaigns.
In any case, most sources will either be steady or show no definite variability timescale. Worse, for most classes of sources we theoretically expect no variability.
Sensitivity and completeness of catalogs are not always good
Not having complete catalogs of "identified" populations is not something to fear, but the reflection of a discovery opportunity.
We know we are already missing one or several new source populations, at both low and high Galactic latitudes: there are strong indications of variable and non-variable, non-periodic, point-like and extended low-latitude sources, as well as of non-variable, extended high-latitude sources, all of which lie beyond the expected behavior of pulsars and AGNs.
If many (or all) sources were correlated with AGNs, for instance, only a case-by-case analysis could show that the classification by position alone is wrong. But remember, GLAST will see >1000 sources!

Objects vs. populations?
Are we confident of the detection of individual pulsars? Yes: pulsar timing in gamma-rays.
[Figure: Crab pulse-profile animation, 33 ms (20 frames)]
Are we confident of the detection of the population? Yes: pulsar timing in gamma-rays for many pulsars.
[Figure: Geminga pulse-profile animation, 237 ms (20 frames)]

Objects and populations II
[Figure: 3C279 multiwavelength light curves: γ-rays, X-rays, UV, optical, IR, radio]
Simultaneous variability has been discovered for some blazars.

Current strategy for source classification: Top-Down (D. Thompson, GLAST multiwavelength group)
Concept: at some level, gamma-ray sources will have X-ray counterparts. If the X-ray counterpart can be "found", its better X-ray position allows deep searches at longer wavelengths.
The approach: using an X-ray image of a gamma-ray source error box, eliminate most of the X-ray sources from consideration based on their X-ray, optical, and radio properties. Look for a non-thermal source with a plausible way to produce gamma rays.
The classic example is Geminga. Bignami, Caraveo, Lamb, and Halpern started this search in 1983. The final result appeared in 1992, with the detection of pulsations from this isolated neutron star.

Source-by-source classification
3EG J1835+5918: a new Geminga?
Parallel efforts by two groups, Mirabal/Halpern and Reimer/Carramiñana, used the same approach and reached the same conclusion for 3EG J1835+5918:
- Start with a deep ROSAT image (soft X-rays).
- Take deep optical (mag. 25) images to try to identify all the X-ray sources. Most turn out to be stars or QSOs, unlikely gamma-ray sources. One candidate has no obvious optical counterpart: RX J1836.2+5925.
- Use a radio search to look for a possible radio pulsar. None found.
- Use Chandra to obtain the X-ray spectrum of the candidate: two components, one thermal, one power law.
- Construct the multiwavelength spectrum. It resembles that of Geminga, a spin-powered pulsar. No pulsations have yet been found for 3EG J1835+5918.
~4 years of work, and it must still be confirmed by shrinking the error box or by finding the gamma-ray pulsations.

Current strategy for source classification: Bottom-Up (D. Thompson, GLAST multiwavelength group)
Concept: the largest class of identified gamma-ray sources is the blazars, all of which have radio emission. If a flat-spectrum radio source with strong, compact emission at 5 GHz or above is found in a gamma-ray source error box, it becomes a blazar candidate.
The approach: use radio catalogs to search for flat-spectrum radio sources. If a candidate is found, follow up with other observations to look for other blazar characteristics, such as polarization and time variability.
The EGRET team used this approach in compiling the EGRET catalogs. Mattox et al. quantified the method based on proximity and radio intensity. Sowards-Emmerd, Romani, and Michelson have expanded the number of known blazars with this approach.

Blazar identification example: 3EG J2006-2321
First clue: gamma-ray variability.
Radio sources in the error box: one flat-spectrum radio source, 260 mJy at 5 GHz; one marginally flat source, 49 mJy; the other sources are much weaker.
Optical observations (Wallace et al.): the 49 mJy source is a normal galaxy.
The 260 mJy source has an optical counterpart with redshift z = 0.83. Variable optical polarization is seen. Only an X-ray upper limit was found.
Its spectral energy distribution is bimodal, like those of other blazars: probably a flat-spectrum radio quasar (FSRQ).
~3 years of work, and it must still be confirmed by shrinking the error box of the γ-ray detection.
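The catalog-level step of the bottom-up approach (keep flat-spectrum, reasonably bright radio sources inside a gamma-ray error box as blazar candidates) can be sketched as a simple filter. The sketch assumes the common convention S ∝ ν^α with "flat" meaning α > -0.5 between two catalog frequencies, a circular error box, and a minimum 5 GHz flux of 100 mJy; none of these thresholds, nor the `RadioSource` record or the function names, come from the text. It is not the quantified Mattox et al. method, which also weighs proximity and radio intensity.

```python
import math
from dataclasses import dataclass


@dataclass
class RadioSource:
    """Hypothetical catalog record: position (deg) and fluxes (mJy)."""
    name: str
    ra_deg: float
    dec_deg: float
    s_low_mjy: float          # flux at the lower frequency
    s_5ghz_mjy: float         # flux at 5 GHz
    nu_low_ghz: float = 1.4   # assumed lower catalog frequency


def spectral_index(s1: float, nu1: float, s2: float, nu2: float) -> float:
    """Two-point spectral index alpha, with S proportional to nu**alpha."""
    return math.log(s2 / s1) / math.log(nu2 / nu1)


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def blazar_candidates(sources, ra0, dec0, radius_deg,
                      alpha_min=-0.5, min_5ghz_mjy=100.0):
    """Flat-spectrum, bright radio sources inside a circular error box."""
    picked = []
    for s in sources:
        if angular_sep_deg(s.ra_deg, s.dec_deg, ra0, dec0) > radius_deg:
            continue  # outside the gamma-ray error box
        alpha = spectral_index(s.s_low_mjy, s.nu_low_ghz, s.s_5ghz_mjy, 5.0)
        if alpha > alpha_min and s.s_5ghz_mjy >= min_5ghz_mjy:
            picked.append((s.name, round(alpha, 2)))
    return picked


if __name__ == "__main__":
    # Made-up sources around a made-up error-box centre, for illustration only.
    catalog = [RadioSource("flat", 100.0, 20.0, 250.0, 260.0),
               RadioSource("steep", 100.1, 20.0, 400.0, 80.0),
               RadioSource("far", 102.0, 20.0, 250.0, 260.0)]
    print(blazar_candidates(catalog, 100.0, 20.0, radius_deg=0.5))
```

Sources passing this filter would only be candidates; as in the 3EG J2006-2321 example, follow-up optical, polarization and variability data are still needed.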