A systematic approach for the identification of classes of gamma-ray sources

Diego F. Torres & Olaf Reimer
More details in Torres & Reimer: ApJ Letters 629, 141 (2005)
Institute for Space Sciences
Barcelona, Spain
Plausible diversity of high-energy gamma-ray sources
ONLY KNOWN POPULATIONS OF (GeV) GAMMA-RAY SOURCES ARE PULSARS AND AGN
Variability: the most direct way to recognize the
existence of several different kinds of gamma-ray sources
*Clearly defined variable and non-variable sources
*No correlation of variability with sky position
Circumstantial evidence + Theoretical analysis suggest:
Possible Galactic Sources:
-Plerions and SNRs (NV)
-Isolated Black holes, X-ray binaries, microquasars (V)
-Massive Stars/winds (?)
-Molecular clouds (high and low latitude) (NV)
Possible Extragalactic Sources:
-Radiogalaxies (?)
-Clusters of galaxies (NV)
-Regions of star formation, starbursts and ULIGS (NV)
During the all-sky survey, LAT
will have sufficient sensitivity
after one day to detect (5σ) the
weakest EGRET sources.
Thousands of sources at high latitude alone.
[Figure: LAT sensitivity at 100 seconds, 1 orbit, and 1 day, compared with: GRB940217 (100 sec), PKS 1622-287 flare, 3C279 flare, Vela Pulsar, Crab Pulsar, 3EG 2020+40 (SNR γ Cygni?), 3EG 1835+59, 3C279 lowest 5σ detection, 3EG 1911-2000 (AGN), Mrk 421, weakest 5σ EGRET source]
Understanding the challenge
Identifying on a case-by-case basis all LAT sources using multiwavelength
techniques with ad hoc simultaneous observations is simply not possible due to
the number of sources.
Use of individual classifiers (e.g., Mattox et al. 2001 or Sowards-Emmerd et al.
2003) can (and will) establish a relative ranking of correctness within what we
already know to exist as a population of sources (AGNs) in EGRET data
•they will work fine (providing sound identifications) for the brightest of
the sources (excellent agreement on EGRET, for instance)
•there will be unavoidable uncertainties for less bright sources, for sources
along the Galactic plane, with no apparent way of distinguishing between
classes (is it an AGN seen through the disk or something else of Galactic
origin)
•AGN SEDs are too varied: there is a lack of reliable templates for AGN
SEDs (and, in addition, the same source may exhibit large spectral
variations with time)
•it would be up to the reader to decide what to believe about a particular
source, with no information about other possible populations
Understanding the challenge
Enlarge the catalogs of AGNs and pulsars and correlate these with the LAT detections:
without a precise (a priori) knowledge of which AGNs and which pulsars are able to
emit gamma-rays, given their respective SEDs (no real veto system), the number of
identifications will only be limited by the number of sources in the counterpart
catalog considered.
Problem 1: when is an incomplete catalog complete enough?
(e.g.: No clear correlation between gamma and radio emission in blazars.)
Problem 2: discovery of new populations would depend on the identification of members.
(This implies a lack of confidence level for the population as a whole unless extensive
multi-year, multi-frequency studies are done for many members.)
Problem 3: simultaneity of multiwavelength studies can be secured for only a
handful of sources.
(We cannot use this technique to explore a discovery space.)
Inverting the problem to strike the eye
Understanding the challenge
(now we consider a large number of gamma-ray sources instead of a large number of counterparts)
In the final BATSE map, once the positional error boxes are taken into account,
one or more GRBs had been detected along every line of sight of any
instrument, at any wavelength, used to compile any list of possible
counterparts.
[Figure: real BATSE map]
The potential for correlation analysis is completely lost.
Understanding the challenge
How far is LAT from the former?
At low Galactic Latitude (no priors)
1000 LAT sources in the Galactic Plane (|b|<10) with 12’ uncertainty = 20% coverage
At high Galactic Latitude (no priors)
10^3 LAT sources out of the Galactic Plane (|b|>10) with 12’ uncertainty = 0.3% coverage
10^4 LAT sources out of the Galactic Plane (|b|>10) with 30’ uncertainty = 20% coverage
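The coverage figures above can be checked with a rough sketch. It assumes (these are not stated on the slide) that the quoted uncertainty is the radius of a circular error region, that error regions do not overlap, and that sources are spread uniformly over the region considered. The function names are illustrative only. Under these assumptions the two high-latitude cases come out close to the quoted 0.3% and 20%, while the Galactic plane case comes out near 2%, so the figure quoted for the plane presumably rests on assumptions not spelled out here.

```python
import math

DEG2_PER_SR = (180.0 / math.pi) ** 2
SKY_DEG2 = 4.0 * math.pi * DEG2_PER_SR  # whole sky, about 41253 deg^2


def region_area_deg2(b_cut_deg, inside_plane):
    """Area of the band |b| < b_cut (inside_plane=True) or of |b| > b_cut."""
    plane_fraction = math.sin(math.radians(b_cut_deg))
    fraction = plane_fraction if inside_plane else 1.0 - plane_fraction
    return SKY_DEG2 * fraction


def coverage(n_sources, radius_arcmin, region_deg2):
    """Fraction of the region covered by n non-overlapping error circles."""
    radius_rad = math.radians(radius_arcmin / 60.0)
    cap_sr = 2.0 * math.pi * (1.0 - math.cos(radius_rad))  # spherical cap
    return n_sources * cap_sr * DEG2_PER_SR / region_deg2


plane = region_area_deg2(10.0, inside_plane=True)
high_lat = region_area_deg2(10.0, inside_plane=False)

print(f"10^3 sources, 12', |b|<10: {coverage(1e3, 12, plane):.2%}")
print(f"10^3 sources, 12', |b|>10: {coverage(1e3, 12, high_lat):.2%}")
print(f"10^4 sources, 30', |b|>10: {coverage(1e4, 30, high_lat):.2%}")
```

The covered fraction is also a first estimate of P, the probability that a random direction falls on a gamma-ray source.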
We need a scheme that allows us to classify populations of sources, and to use it
before internal relative scales of the goodness of detected individuals are
applied (within already known populations), to make sure that we do not overidentify to the point where discovering new populations is no longer possible.
An appealing goal
An appealing goal for the first-year all-sky survey should be, in our
opinion, to be able to say:
•which kinds of populations have been detected in the GLAST sky,
•what the statistical confidence is for the detection of each of
them (systematically quantified using the same technique),
•which are the most likely detected individuals of each class, so
that multi-frequency observations can proceed with confidence.
This classification should go beyond what we already know from
EGRET (i.e., pulsars and blazars).
Reaching this goal should result in a high-quality publication that could be viewed as the solution to
the problem of unidentified EGRET sources (whatever we cannot identify in the GLAST era would
represent a new problem).
Scheme
We propose the establishment of an a priori protocol of source population
discovery, based on a controlled analysis of positional coincidences.

Three parts are involved:
Theoretical censorship: prohibits executing repeated searches that would reduce the
statistical significance of any possible positive class correlation;
Discovery protection: that protects the significance by which one claims the discovery
of a number of important population candidates and that gives guidelines as to how
to manage the probability budget;
Common significance assessment: that assigns probabilities both in the large and in the
small number statistical regime.
Part 1: what to consider as potential counterparts?
Part 1: Theoretical Censorship
We request as part of the criterion that predictions (ideally multiwavelength
ones) are available for a subset of the proposed counterpart class.
This requirement avoids the blind testing of populations that may or
may not produce gamma-rays, and for which nothing beyond a positional
correlation result could be obtained a posteriori.
If there is no strong theoretical indication, before the search is made, that a
population can emit gamma-rays, that population should not be
searched for.
Part 2: why don’t we just try with everything there is?
Part 2: Discovery protection
If one probes a large number of samples, and makes an equally large number
of trials with the same instrument detections, one will find positive
correlations, if only as a result of statistical fluctuations.
To claim significance, one would have to check that the penalties that must be
paid for such a finding (i.e., the fact that there were a number of trials that
led to null results) do not outweigh the significance achieved. This may
turn out to be practically impossible (if there is no a priori established
source selection).
We request as part of the criterion that the populations that are to be tested,
and the testing protocol, be defined before the data release.
Lessons to learn from ultra high energy cosmic ray physics: few events, large number of claims,
many of them plainly wrong (see discussion by Torres, Reimer, et al. ApJ Letters 595, 13, 2003)
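The penalty for repeated searches can be illustrated with a minimal sketch. It assumes the trials are statistically independent and that each one has the same chance probability alpha of producing a spurious correlation; both are simplifying assumptions, and the function name is illustrative only.

```python
def chance_of_false_positive(n_trials, alpha):
    """Probability that at least one of n independent trials passes by chance."""
    return 1.0 - (1.0 - alpha) ** n_trials


# A per-trial chance probability of 1e-3 looks safe for a single test,
# but it is eroded quickly when many candidate catalogs are tried.
for n_trials in (1, 10, 100, 1000):
    print(n_trials, chance_of_false_positive(n_trials, 1e-3))
```

With enough unplanned trials a "positive" correlation becomes likely by chance alone, which is why the populations and the testing protocol are fixed beforehand.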
Part 2-3: Budgeting, managing probabilities, identifying sources
Budget: the total amount of probability available for the identification of classes (it is not infinite). In the case of
GRBs, a completely uniform population with 4 degrees positional uncertainty, it is 0. In the case of GLAST,
it depends on Galactic latitude and on the assumed priors (e.g., the spiral arms).
The budget is a number that gives the probability that our claim, the identified classes of sources, is
just a chance result. For instance, we may want all our claims together to have a chance probability of 1 in 10^4.
Part 2: Simple basis for a protocol
Suppose that the total budget is a chance probability equal to B, and that we want to test
classes of different sources labeled A, B, C, ...
The total budget can then be divided into individual chance probabilities, PA, PB, etc., such
that the sum of all Pi equals B.
Population i will be claimed as detected if the a posteriori experimental probability for its
random correlation, Pexp(i), is less than the a priori assigned Pi (as opposed to being less only
than the larger, total budget). Important: this allows different populations to be discovered
simultaneously.
We can then manage the budget of probabilities. For some populations we may be less
confident that they will be detected; for others, the number of
members may be so low that the detection of a few individuals would be
enough to claim great significance. In these situations, we would choose a relatively large Pi,
so as to make it easier for the test to pass. For others, say AGNs and pulsars, we can assign a
relatively small Pi, so as to make it harder for the test [whether the inequality Pexp(i) <
Pi is fulfilled] to pass, so that they take less of the total budget.
If one or more of the tests are passed, the results are individually significant because first we
protected our search by the a priori establishment of the protocol (it was a blind test) and
second, because the overall chance probability is still less than the total budget B.
In the example below we choose to test Galaxy Clusters and Starbursts with 40% of the high
latitude budget each. These are new populations, if discovered, so we want to privilege the
chance of spotting them.
If B = 10^-4, then Pexp(clusters) < 0.4 B is required for the population to be claimed as discovered
(the significance level of that is discussed below).
For others, say classes of AGNs, we can assign a relatively small P(AGN) so as to
make it harder for the test [whether the inequality Pexp(AGN) < P(AGN) is fulfilled] to pass, so
that they take less of the total budget. In this example, P(FSRQ) = 0.1 B. FSRQs are not a new
population, so we do not want to spend our budget on them: this is exactly the same as requiring a
very high confidence level for the discovery of this population.
[Figure: division of the high-latitude budget among FSRQs, BL Lacs, Clusters, and Starbursts]
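The budget example can be sketched as follows. The shares for Clusters, Starbursts, and FSRQs (40%, 40%, 10%) and B = 10^-4 come from the text; the 10% share for BL Lacs is an assumption made here so that the shares sum to one, and the Pexp values are purely hypothetical numbers chosen to show both outcomes of the test.

```python
B = 1e-4  # total high-latitude budget (chance probability)

# Fractions of the budget assigned a priori to each class.
# The BL Lac share is an assumption; the other three follow the example.
shares = {"Clusters": 0.4, "Starbursts": 0.4, "FSRQs": 0.1, "BL Lacs": 0.1}


def allocate(total_budget, shares):
    """Split the total budget into per-class thresholds Pi (sum of Pi = B)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: frac * total_budget for name, frac in shares.items()}


def claimed(p_exp, thresholds):
    """A class is claimed as detected only if Pexp(i) < Pi."""
    return {name: p_exp[name] < thresholds[name] for name in thresholds}


thresholds = allocate(B, shares)

# Hypothetical a posteriori chance probabilities, for illustration only.
p_exp = {"Clusters": 1e-5, "Starbursts": 6e-5, "FSRQs": 1e-8, "BL Lacs": 2e-5}

for name, ok in claimed(p_exp, thresholds).items():
    print(f"{name}: Pi = {thresholds[name]:.1e}, Pexp = {p_exp[name]:.1e}, claimed = {ok}")
```

In this illustration the hypothetical Starbursts value is below the total budget B but above its own Pi, so it is not claimed: each class is judged against its own share, not against the total.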
Part 3: Quality evaluation
C(A) number of members within population A that coincide with LAT detections.
N(A) number of known sources in the particular candidate population A under analysis
U number of detections.
P probability that in a random direction of the sky we find a gamma-ray source.
As we have seen earlier, P is not overwhelmingly large (uniform distribution with no
priors gives P less than a few percent for less than 10 000 sources). A more careful
treatment will reduce the value of P from these simple estimations.
Such low values for P make the product P x N(A) typically in the range 1-100, for all
different candidate populations. We can refer to this product as the noise expectation.
Then the excess number of coincidences over the noise is:
E(A) = C(A) - P x N(A)
(Excess = Coincidences - Noise)
Pulsars and blazars will present the largest number of positional coincidences.
Let us assume that there are 2000 catalogued AGNs; with P ~ 10^-2 or 10^-3, all coincidences in
excess of 6-60 are beyond the random expectation.
Now, C(AGN) >> P x N(A), and thus the excess would be large: we are in the
domain of large-number statistics, and a probability for the excess to occur by
chance, Pexp(AGN), could be computed.
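One way to compute such a chance probability is sketched below. It assumes that each of the N(A) catalogued objects falls on a LAT source independently with probability P, so that the number of chance coincidences is binomial; this is a simplification chosen here, not necessarily the treatment the authors intend. The count C = 66 is a hypothetical input, and the function names are illustrative. For N(A) = 2000 the noise expectation P x N(A) is 20 for P = 10^-2 and 2 for P = 10^-3.

```python
import math


def noise_and_excess(c, n, p):
    """Return the noise expectation P x N(A) and the excess E(A) = C(A) - P x N(A)."""
    noise = p * n
    return noise, c - noise


def binomial_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p), with each term evaluated in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(max(c, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


n_agn, p_random, c_agn = 2000, 1e-2, 66  # c_agn is hypothetical
noise, excess = noise_and_excess(c_agn, n_agn, p_random)
p_exp_agn = binomial_tail(c_agn, n_agn, p_random)
print(f"noise = {noise:.1f}, excess = {excess:.1f}, Pexp(AGN) = {p_exp_agn:.2e}")
```

The resulting Pexp(AGN) is the quantity to be compared with the a priori assigned P(AGN).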
When both terms in the expression for E(A) are small quantities (small-number statistics), we
should test the null hypothesis for a new source population against a reduced random noise.
Methods such as Feldman & Cousins (1998) or Gehrels (1986) are useful to assess quality in this
case and obtain Pexp(A).
An example of a null hypothesis is “X-ray binaries are not LAT sources”. We have 0 predicted
signal events (coincidences) and P x N(A) background. With N(A) ~ 200 and P ~ 3 x 10^-3,
detecting more than 5 coincidences rules out the null hypothesis at 95% CL. If Pexp(X-ray bin.)
< the budgeted P(X-ray bin.), we have uncovered a new population of sources with 95% CL.
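For the small-number regime, the sketch below computes a plain Poisson tail probability for the X-ray binary example: the chance of seeing at least a given number of coincidences when only the background P x N(A) is expected. This is a simplified stand-in, not the Feldman & Cousins or Gehrels construction, so it is not meant to reproduce the confidence level quoted above; it only illustrates the kind of number that would be compared with the budgeted probability. The function name is illustrative.

```python
import math


def poisson_tail(n_obs, background):
    """P(N >= n_obs) for N ~ Poisson(background), i.e. background only."""
    below = sum(math.exp(-background) * background ** k / math.factorial(k)
                for k in range(n_obs))
    return max(0.0, 1.0 - below)


n_xrb = 200          # N(A), number of catalogued X-ray binaries
p_random = 3e-3      # P, chance of a gamma-ray source in a random direction
background = p_random * n_xrb  # noise expectation, 0.6 coincidences

# "More than 5 coincidences" means 6 or more.
p_chance = poisson_tail(6, background)
print(f"background = {background:.2f}, chance probability = {p_chance:.2e}")
```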
Outlook and conclusions: Some technical issues ahead
Which populations are to be tested? How large should the a priori
probability be for each of them? How is the random
probability P best computed? How large should the total budget B be? All of these must be
answered to completely determine the protocol.
By researching and ultimately establishing a protocol along these lines, the
problem of identifying the classes of detections will be solved by early
2008, with individually high levels of confidence and collective low
random probability.
The GOAL IS A SINGLE POWERFUL PAPER BEFORE THE FIRST
YEAR, REPORTING THE DISCOVERY OF DIFFERENT CLASSES OF
POPULATIONS, EACH WITH ITS CORRESPONDING CONFIDENCE
LEVEL, ALL ANALYZED USING THE SAME TECHNIQUE.
This would immediately open the possibility of focusing efforts on
case-by-case, object-oriented astroparticle physics, knowing that the
class has been detected with, say, 95% CL.
More details in Torres & Reimer: ApJ Letters 629, 141 (2005)
First Assessment
Clearly: not all of the 66 (40-100) AGNs that have been claimed to be associated with
EGRET sources have simultaneous multiwavelength studies!
Identification is done by statistical methods based on position only (e.g. Mattox)
and the correlated variability for some individuals gives support to the existence
of the class.
But... There is no quantification of the quality of the population as a whole,
even in the cases in which we are absolutely sure of individual classifications.
A similar problem applies to radio-quiet pulsars! And to pulsars for which there
are no timing solutions! There are a few members tracked in multiwavelength
studies, and the rest are positional coincidences.
It is not a coincidence that the two identified populations are variable/periodic.
Lack of statistical classification power: large error boxes contribute to the
problem, but they are not its only cause.
Variability certainly helps, but...
If there is a previous prediction of a periodic signal in the flux, that alone unambiguously labels
the source. OK.
But this will happen for only a very small fraction of detections: absence of
completeness in the pulsar timing parameters, and shortage of precise variability predictions for
accretion-powered X-ray binaries.
Even if a theoretically compatible variability timescale appears, if we have not
identified the class of sources to which the sought counterpart belongs, that in itself will
justify the need for follow-up observational campaigns.
In any case, most of the sources will either be steady or show no definitive variability timescale.
And worse, for most classes of sources, we theoretically expect no variability.
Sensitivity and completeness of catalogs are not always good
Not having complete catalogs of “identified” populations is not something to fear, but the
reflection of a discovery opportunity.
We know we are already missing one or several new source populations, both at low and at
high Galactic latitudes
There are strong indications of variable and non-variable, non-periodic, point-like and
extended, low latitude sources, as well as of non-variable, high latitude, extended sources,
all of which are beyond the expected behavior of pulsars and AGNs.
If many (all) sources were to be correlated with AGNs, for instance, only a case by case
analysis could show that the classification by position only is wrong. But remember, GLAST
will see >1000 sources!
Objects vs. populations?
Are we confident of the detection of individual pulsars?
Yes: pulsar timing in gamma-rays.
Are we confident of the detection of the population?
Yes: pulsar timing in gamma-rays for many pulsars.
[Animations: Crab, 33 ms (20 frames); Geminga, 237 ms (20 frames)]
Objects and populations II
Simultaneous variability discovered for some blazars.
[Figure: 3C279 in γ-rays, X-rays, UV, Optical, IR, Radio]
Current strategy for source classification: Top – Down
(D. Thompson, multiwavelength group of GLAST)
Concept: at some level, gamma-ray sources will have X-ray counterparts.
IF the X-ray counterpart can be “found”, the better X-ray position information
allows deep searches at longer wavelengths.
The approach: using an X-ray image of a gamma-ray source error box, eliminate
most of the X-ray sources from consideration based on their X-ray, optical, and
radio properties. Look for a non-thermal source with a plausible way to produce
gamma rays.
The classic example is Geminga. Bignami, Caraveo, Lamb, and Halpern started
this search in 1983. The final result appeared in 1992 with the detection of
pulsations from this isolated neutron star.
Source by source classification
3EG J1835+5918: A New Geminga?
Parallel effort by two groups, Mirabal/Halpern and Reimer/Carramiñana – used the
same approach and reached the same conclusion for 3EG J1835+5918
- Start with a deep ROSAT image (soft X-rays).
- Take deep optical (mag. 25) images to try to identify all the X-ray sources. Most turn out to be stars or QSOs, unlikely gamma-ray sources. One candidate has no obvious optical counterpart: RX J1836.2+5925.
- Use a radio search to look for a possible radio pulsar. None found.
- Use Chandra to obtain the X-ray spectrum of the candidate: two components, one thermal, one power law.
- Construct the MW spectrum. It resembles that of Geminga, a spin-powered pulsar. No pulsations have yet been found for 3EG J1835+5918.
~4 years of work, and yet it must still be confirmed by shrinking the error box or finding the gamma pulsations.
Current strategy for source classification: Bottom - Up
(D. Thompson, multiwavelength group of GLAST)
Concept: the largest class of identified gamma-ray sources is blazars, all of which
have radio emission.
IF a flat-spectrum radio source with strong, compact emission at 5 GHz or above
is found in a gamma-ray source error box, it becomes a blazar candidate.
The approach: use radio catalogs to search for flat-spectrum radio sources. If a
candidate is found, follow up with other observations to locate other blazar
characteristics such as polarization and time variability.
The EGRET team used this approach in compiling the EGRET catalogs. Mattox
et al. quantified the method based on proximity and radio intensity. Sowards-Emmerd,
Romani, and Michelson have expanded the number of known blazars
with this approach.
Blazar Identification Example: 3EG J2006-2321
First clue: gamma-ray variability.
Radio sources in the error box: one flat-spectrum radio source, 260 mJy at 5 GHz; one marginally flat source, 49 mJy; other sources are much weaker.
Optical observations: the 49 mJy source is a normal galaxy. The 260 mJy source has an optical counterpart with a redshift z=0.83. Variable optical polarization is seen.
Only an X-ray upper limit found.
Spectral energy distribution is bimodal like other blazars. Probably a flat spectrum radio quasar (FSRQ).
(Wallace et al.)
~3 yr of work, and yet it must still be confirmed by shrinking the error box of the γ-ray detection.