O1.1 Rule-based Cross-matching of Very Large Catalogs



Patrick Ogle and the NED Team

IPAC, California Institute of Technology

NASA Extragalactic Database (NED)

A fusion of multi-wavelength extragalactic data from journal articles and large catalogs

NED Holdings (October 2014)

[Chart of NED holdings; recoverable label: 2MASS PSC]

And much more, including classifications, notes, images, spectra…

New Cross-matching Algorithm

• Very Large Catalogs (VLCs, >10^7 sources)

• Find candidate matches in NED

• Select best match

– Rule-based

– Statistical analysis

• Match data recorded in DB

• Reversible and iterable

GALEX ASC (NUV) vs. SDSS DR6 (gri, 6’x6’)

Cross-match Inputs

• VLC Source and NED Object Positions (RA, Dec, ±)

→ Source-Object Separation (s, ±σ)

• Source and Object Types

(galaxy, galaxy cluster, star, UV source, etc…)

• Background Object Density (measured for each source)

• Instrumental Beam Size

• Other: redshift, photometry, diameters
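
As a minimal sketch of how the source-object separation s and the normalized separation s/σ could be derived from the positional inputs above: the function names are hypothetical, and combining the two position errors in quadrature is an assumption, not something the slides state.

```python
import math

def angular_separation_arcsec(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions, in arcseconds.

    Uses the haversine formula, which stays accurate for the small
    separations relevant to cross-matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def normalized_separation(s_arcsec, sigma_source, sigma_object):
    """Separation divided by the combined position error.

    Assumption: the two errors are independent and add in quadrature.
    """
    return s_arcsec / math.hypot(sigma_source, sigma_object)
```

For example, two positions that differ by 1/3600 degree in Dec are separated by 1 arcsec.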

NED Pipeline for Very Large Catalogs

• Source Loader

– Load Very Large Catalog (VLC) source names and positions into NED.

• CSearch (PostgreSQL)

– Find match candidates with NED near position search

– Count background objects

– Spatial indexing will speed up search (e.g. Q3C, HTM)

• MatchExpert (Python)

– Select best match from CSearch match candidates

– Object associations for no-matches

– Record match statistics for each match

– Match statistic distributions and integrals

– Code migration to DBMS for speed

• Object Loader (PostgreSQL)

– Create NED cross-IDs

– new objects

– associations

[Pipeline diagram: Source Loader → CSearch → MatchEx → Object Loader]
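
The CSearch step (find candidates near a position, count background objects) can be illustrated with a brute-force sketch. This is not the NED implementation, which runs in PostgreSQL. The function name `find_candidates` and the tuple layout are hypothetical, the default radii are the GALEX values quoted later in the talk, and counting candidates as part of the background total is an assumption.

```python
import math

def find_candidates(source, objects, r_search=7.5, r_background=46.5):
    """Return (candidates sorted by separation, background count N).

    source:  (ra_deg, dec_deg)
    objects: iterable of (name, ra_deg, dec_deg)
    Radii are in arcseconds. A flat-sky approximation is used, which is
    adequate for sub-arcminute radii; RA wrap-around at 0/360 degrees is
    not handled in this sketch.
    """
    ra0, dec0 = source
    cos_dec = math.cos(math.radians(dec0))
    candidates = []
    n_background = 0
    for name, ra, dec in objects:
        dx = (ra - ra0) * cos_dec * 3600.0
        dy = (dec - dec0) * 3600.0
        s = math.hypot(dx, dy)
        if s <= r_background:
            n_background += 1      # assumption: N counts everything inside r_b
        if s <= r_search:
            candidates.append((s, name))
    candidates.sort()
    return candidates, n_background
```

A production search would replace the loop with a spatially indexed query (the slides mention Q3C and HTM as options).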

MatchEx Logic

[Flowchart; box labels as extracted: Match List from CSearch; Thresholds; S<Scut; P>Pcut; Type Match; Name Prefix Match; S1/S2 < 0.33; Single Good Match; Error Circles Overlap; NED Cross-ID Match; NED dup.; No Match; Create NED object and associations]
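
The branch order of the MatchEx flowchart is not recoverable from the labels alone, so the following is only one possible reading of three of them (S<Scut, P>Pcut, S1/S2 < 0.33). The function `select_match`, the dictionary keys, and the interpretation of S1/S2 as the ratio of the closest to the second-closest separation are all assumptions; the type-match and name-prefix rules are omitted.

```python
def select_match(candidates, s_cut, p_cut, ratio_cut=0.33):
    """Pick a single good match, or return None.

    candidates: list of dicts with keys "s" (separation) and "p"
                (Poisson probability that no background object is closer).
    """
    # Threshold step: keep candidates that pass both cuts.
    passing = sorted(
        (c for c in candidates if c["s"] < s_cut and c["p"] > p_cut),
        key=lambda c: c["s"],
    )
    if not passing:
        return None
    if len(passing) == 1:
        return passing[0]
    # Hypothetical reading of "S1/S2 < 0.33": accept the closest candidate
    # only if it is much closer than the runner-up. Written as a product
    # to avoid dividing by a zero separation.
    if passing[0]["s"] < ratio_cut * passing[1]["s"]:
        return passing[0]
    return None
```

Returning `None` here corresponds to the "No Match" branch, after which an association or a new NED object would be considered.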

Associations

• Where a match is not made to a nearby object, an association record may be created.

• Association types:

– Source and object position error circles overlap

– Object is within the beam (PSF) of the source

[Flowchart: No Match; Error Circles Overlap → Create Error Overlap Association record; S<beam → Create In Beam Association record]
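
The two association types can be sketched as a small classifier. The function name and return labels are hypothetical, and checking error-circle overlap before the beam test is an assumption based on the order of the flowchart labels. Two circles overlap when their separation does not exceed the sum of their radii.

```python
def association_type(s, err_source, err_object, beam):
    """Classify an unmatched source-object pair.

    s:          source-object separation
    err_source: radius of the source position error circle
    err_object: radius of the object position error circle
    beam:       instrumental beam (PSF) size of the source catalog
    All quantities share the same angular unit.
    """
    if s <= err_source + err_object:
        return "error_overlap"
    if s < beam:
        return "in_beam"
    return None  # no association record
```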

Application to GALEX ASC Catalog

[Figure: GALEX ASC (NUV) vs. NED, alongside SDSS DR6 (gri, 6’x6’). Labels: NED object; GALEX search region; Background region]

• GALEX All-Sky Catalog of ~40 million unique NUV sources created by M. Seibert (2012)

• Matched against ~180 million NED objects (2013)

Poisson Match Probability

Search radius: $r_s = 7.5''$ for GALEX

Background radius: $r_b = 46.5''$ for GALEX

Density of background NED objects: $n = N/(\pi r_b^2)$

Expected number inside $s$: $\langle N_s \rangle = N (s/r_b)^2$, where $s$ is the separation

Poisson probability of $x = k$ objects closer than $s$:

– $P_s(x=k) = \langle N_s \rangle^k \exp(-\langle N_s \rangle)/k!$

– For $k=0$, this simplifies to $P_s(x=0) = \exp(-\langle N_s \rangle) = \exp(-N (s/r_b)^2)$

False-match probability: $P_f = 1 - P_s(0)$

Example: $N = 4$, $s/r_b = 0.08$ gives $P_s(0) = 0.975$ and $P_f = 0.025$

[Diagram: search radius $r_s$, background radius $r_b$, and separation $s$]
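
The Poisson statistic is short enough to check numerically. The sketch below uses hypothetical function names and reproduces the slide's example (N = 4, s/r_b = 0.08).

```python
import math

def expected_closer(n_background, s, r_b):
    """Expected number of background objects closer than s: N * (s / r_b)**2."""
    return n_background * (s / r_b) ** 2

def poisson_prob(k, n_background, s, r_b):
    """Poisson probability of exactly k background objects closer than s."""
    mean = expected_closer(n_background, s, r_b)
    return mean ** k * math.exp(-mean) / math.factorial(k)

def false_match_prob(n_background, s, r_b):
    """P_f = 1 - P_s(0)."""
    return 1.0 - poisson_prob(0, n_background, s, r_b)

# Slide example: N = 4, s / r_b = 0.08 (here s = 0.08 with r_b = 1).
p0 = poisson_prob(0, 4, 0.08, 1.0)
print(round(p0, 3), round(1.0 - p0, 3))  # 0.975 0.025
```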

Optimizing Match Selection

• Optimize on 100K subsample in SDSS region

• False-positive rate decreases with increasing Poisson cutoff.

• False-negative rate increases with Poisson cutoff.

• Give 10x weight to false positives: it’s worse to make an incorrect match than to miss a match.

• Poisson cutoff value of 90% minimizes the combined, weighted error rate.
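
A sketch of the cutoff selection, assuming the combined error is the simple weighted sum 10 × (false-positive rate) + (false-negative rate); the slides give the weight but not the exact functional form. The function name is hypothetical and the rates in the usage example are invented placeholders, not measured values.

```python
def best_cutoff(cutoffs, fp_rates, fn_rates, fp_weight=10.0):
    """Return (cutoff, weighted error) minimizing fp_weight * fp + fn.

    cutoffs, fp_rates, fn_rates are parallel sequences: the error rates
    measured on a validation subsample at each trial cutoff.
    """
    weighted = [fp_weight * fp + fn for fp, fn in zip(fp_rates, fn_rates)]
    i = min(range(len(weighted)), key=weighted.__getitem__)
    return cutoffs[i], weighted[i]

# Illustrative numbers only (NOT from the talk):
cutoff, err = best_cutoff(
    cutoffs=[0.80, 0.90, 0.95],
    fp_rates=[0.010, 0.002, 0.001],
    fn_rates=[0.010, 0.020, 0.050],
)
```

With these placeholder rates the middle cutoff wins, because the heavier penalty on false positives outweighs its modest increase in false negatives.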

GALEX ASC Match Results: Totals

• 39,570,031 input GALEX ASC UV sources

• NED (2013) contained ~180 million distinct objects

• 10,595,382 (26.8%) of the ASC sources matched NED objects → Cross-IDs

• 28,974,649 (73.2%) are not matched → new NED objects

– 68.2% of GALEX ASC sources are in blank NED fields

– 5.0% have multiple match candidates

Image credit: GALEX, NASA/JPL-Caltech/SSC

GALEX ASC Match Results: Background Rejection and False-Negative Rate

• Uncorrelated background out to 15 arcsec is fit by a straight line: dN/ds ~ s

• MatchEx is successful at filtering out this background.

• False-negative rate f_n = 2.4%, estimated by comparison to background-subtracted match candidates (red line).

[Plot: x-axis: Separation (arcsec); annotation: false negatives]
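
The linear form of the background follows from a uniform surface density of unrelated objects. As a short derivation, assuming a constant density $n$ of background objects per unit area around each source:

```latex
N(<s) = n \pi s^2
\quad\Longrightarrow\quad
\frac{dN}{ds} = 2 \pi n s \;\propto\; s
```

True counterparts instead pile up at small separations, which is why subtracting the straight-line fit isolates them.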

GALEX ASC Results: False Positive Rate

• The false-positive match rate is estimated by summing the Poisson statistic (1-P) over all matches and dividing by the total number of sources: f_p = 0.25%
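
The false-positive estimate described above is a one-line computation. In this sketch the function name is hypothetical and P is taken to be the per-match Poisson probability that no background object lies closer than the matched object.

```python
def false_positive_rate(match_probs, n_sources):
    """Sum (1 - P) over all accepted matches, divided by the number of sources.

    match_probs: iterable of P for each accepted match
    n_sources:   total number of input catalog sources (matched or not)
    """
    return sum(1.0 - p for p in match_probs) / n_sources
```

Dividing by all sources, not just the matched ones, expresses the rate per input source, as the slide does.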

GALEX ASC Results: Position Error Distribution

• The distribution of normalized separation r=s/σ deviates from a Gaussian. The peak is at 0.9 instead of 1.0, and the tail is stronger.

[Plot: distribution of r=s/σ compared with the derivative of a Gaussian]

Important Lessons Learned:

1. Do not assume reported catalog position errors are correct.

2. Do not assume position error distributions are Gaussian.

3. A 3.5σ threshold on match separation rejected more candidates than expected.
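
To see what "more than expected" means for a 3.5σ threshold, here is a sketch of the idealized expectation. It assumes isotropic two-dimensional Gaussian position errors, for which the normalized separation r follows the distribution r·exp(-r²/2) (a Rayleigh distribution, peaking at r = 1). That this is the reference curve on the slide is an assumption, and the function names are hypothetical.

```python
import math

def rayleigh_pdf(r):
    """Expected density of r = s / sigma for ideal 2D Gaussian errors."""
    return r * math.exp(-r * r / 2.0)

def fraction_beyond(r_cut):
    """Fraction of true matches expected at r > r_cut: exp(-r_cut**2 / 2)."""
    return math.exp(-r_cut ** 2 / 2.0)

# Under the ideal model, a 3.5 sigma cut should reject only about 0.22%
# of true matches.
expected_loss = fraction_beyond(3.5)
```

A stronger observed tail means the real rejected fraction is larger than this idealized value, consistent with lesson 3.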

Comparison to SDSS Photometry

• While no color criteria were used to select matches to GALEX sources, the NUV-g colors of GALEX-SDSS matches were checked:

– Most matches have -7<NUV-g<7

• GALEX ASC range: 14<NUV<24

• Detection rate falls at NUV>21.7

Results by Object Type

• Object Types ordered by candidate match frequency

• Most GALEX sources matched to galaxies (G) and stars (*)

• QSO, Galactic star (!*), UV excess object (UvES), and WD* matches overrepresented, as might be expected for a UV-selected catalog.

• Matches to RadioS, XrayS, GGroup, and GPair candidates were disallowed.

GALEX Photometry in NED

• GALEX ASC photometry added to NED spectral energy distribution of 3C 382 (CGCG 173-014)

• Over 145 million GALEX ASC NUV and FUV photometry records added to NED (2 extraction methods per band)

VLCs in NED, now and future

• GALEX ASC: ~40,000,000 UV sources loaded and matched (2013)

• GALEX MSC: ~22,000,000 UV sources loaded and matched (2014)

• Spitzer Source List: ~42,000,000 MIR sources (2014)

• 2MASS PSC: ~471,000,000 NIR sources loaded (2015 finish)

• AllWISE: ~748,000,000 MIR sources (2015 start)

• SDSS DR10: ~469,000,000 Vis sources (2015 start)

• SDSS DR6: ~154,000,000 Vis sources loaded and matched (out of 217M), excluding sources with undesirable flag values (2008)

NED aims to quadruple its object holdings in the next year!
