Sean McLaughlin
Summer 2014 Research Summary
Introduction
Humans have been looking up at the stars for our entire existence. By the 1610s, Galileo took the science of astronomy a huge leap forward by building his own spyglass and
pointing it toward the sky. Just by that simple act, he was able to see things that no one had
ever seen before, and change the way humanity thought about its place in the Universe.
Astronomers today are still doing the same basic thing: pointing bigger and bigger telescopes at the sky and trying to see what has so far remained unseen. However, unlike Galileo, we no longer peer through our telescopes and sketch what we see. Now, large automated telescopes store high-quality digital images in massive databases. There are so many images in these databases that it is simply not feasible for scientists to scroll through them one by one
looking for interesting objects. Instead, scientists are developing computational techniques to
sift through these images for us. These programs tell us what, if any, interesting objects are
present in these images. Improving the accuracy and efficiency of these programs is an ongoing problem, but one that is essential to the continued development of this approach. It presents a lot of interesting
challenges, but ultimately will allow astronomers to determine what is out there, and do so far
more efficiently than Galileo and his spyglass.
This summer I worked to hone a small part of this kind of database-crawling program. Specifically, I worked to develop a computer program that would find strong gravitational lenses hidden by the light of bright galaxies. Gravitational lensing is a directly observable effect of general
relativity. Relativity predicts that the presence of mass causes light beams to bend. This
manifests in a couple of different ways, the most dramatic of which is strong lensing. Strong lens systems are created when an incredibly massive, nearby galaxy bends the light of an ancient, far-away galaxy into a beautiful arc of light. Aside from the aesthetic and scientific beauty of
these objects, they are incredibly useful for scientists. The light from most ancient galaxies would normally have dispersed so much that it could no longer be seen from Earth. The lensing effect refocuses that light and allows astronomers to study these galaxies from the
beginning of the universe. It is also possible to study the properties of the nearby lensing galaxy
from the shape of the lens arc. Studying strong gravitational lenses can shed light on the nature
of dark matter and dark energy, and the origin of our universe.
Fig. 1: An example of a strong gravitational lens, captured by the Hubble Space Telescope.
The lensing galaxy is the central orange sphere, and the lens is the arc of blue light.
Credit to ESA/Hubble & NASA
The trouble is, they’re not easy to find. There’s no way to predict where they will be in
the sky, since they occur only when two galaxies and our planet happen to align. Also, the scenario where the lens arcs appear a considerable distance away from the lensing galaxy is rare. The radius at which the arcs appear (the Einstein radius) increases with the lensing galaxy's mass, so if the lensing galaxy is not massive enough, the lens arcs will instead be mixed in with the light of the
central galactic bulge. This makes finding these objects in telescope databases a brute force
problem, exhaustively checking everywhere in the sky for them. My goal was to develop a
technique to find lenses that are hidden behind the light of their host galaxy, and streamline the
technique so it could be run en masse on millions of images.
Methods
The lens detection process has several steps. The first part is fitting the light profile of
the main galactic bulge with a mixture of Gaussians (MOG) using a Markov Chain Monte Carlo
(MCMC) method. Then, the model can be subtracted from the light profile, and the parts that
don’t fit the model can be inspected. Those residuals are classified and analyzed to see if they
have the properties we would expect to see in lenses. We can then assign a probability that the
object in question really represents a lens.
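To make the flow concrete, here is a minimal sketch in Python of how these stages might connect. The four callables passed in are hypothetical placeholders for the stages just described (each is sketched in more detail later in this summary), not the names used in my actual code.

def detect_lens(image, fit_galaxy_mog, subtract_model, cluster_residuals, score_candidate):
    # Run the full detection pipeline on one image. The four callables are
    # hypothetical stand-ins for the stages described in the text above.
    params = fit_galaxy_mog(image)             # MCMC fit of the mixture of Gaussians
    residual = subtract_model(image, params)   # remove the fitted bulge light profile
    clusters = cluster_residuals(residual)     # group the leftover bright pixels
    return score_candidate(clusters)           # Einstein radius, orientation, and other tests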
The MCMC fit to the light profile is the most computationally expensive and involved
portion of the process. It is usually not possible to check all possible combinations of
parameters (height, width, location, etc.) in a model to see what the best fit is. Instead, a
probabilistic approach can be used to find what parameters are probably correct. That is what
an MCMC computation does. It randomly selects some starting parameters and sees how they
match up with the data. It then takes a step in a random direction (increase the height, decrease
the width, etc.) and checks if that step was better or worse than where it started. It then repeats
that process thousands of times, taking steps increasingly in the right direction. By the end, the
chain will have converged to an ideal set of parameters. One of the many advantages to this
approach is that it is scalable to large computing clusters; it can run on a huge number of images quickly. Another advantage of this approach is that the loss and probability
functions are fully customizable. The designer can choose what traits the model should favor,
leading to a better model. A mixture of Gaussians, meanwhile, is simply that: a sum of multiple
Gaussian (bell-curve) functions. Most procedures for fitting to galaxy light profiles are of a
Gaussian form. A mixture of Gaussians allows all of the parts of the galaxy to be fit properly. One of the challenges of performing an MOG fit using MCMC is that the number of Gaussians in the MOG cannot be a free parameter. Instead, to select the best model, each number of Gaussians has to be attempted and the best one selected from among them using
a statistical test. I wrote a program using Python that implemented an MCMC fit of MOG
parameters on astronomical images. Thus far, I’ve tested the algorithm on a large catalog of
simulated lenses. Eventually, I hope to generalize the approach to work on any astronomical
image.
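As an illustration of this step, below is a minimal sketch of an MCMC fit of a mixture of circular Gaussians using the emcee library (one of the libraries this project relies on), followed by a simple model-selection loop over the number of Gaussians. The parameterization, priors, walker and step counts, and the use of the Bayesian information criterion (BIC) as the statistical test are illustrative assumptions, not necessarily the settings used in my program.

import numpy as np
import emcee

def mog_model(theta, x, y, n_gauss):
    # Sum of n_gauss circular Gaussians; theta packs (amplitude, x0, y0, sigma)
    # for each component.
    model = np.zeros(x.shape)
    for i in range(n_gauss):
        amp, x0, y0, sigma = theta[4 * i:4 * i + 4]
        r2 = (x - x0) ** 2 + (y - y0) ** 2
        model += amp * np.exp(-0.5 * r2 / sigma ** 2)
    return model

def log_prob(theta, x, y, data, noise, n_gauss):
    # Gaussian log-likelihood with a simple prior: amplitudes and widths must be positive.
    if np.any(theta[0::4] < 0) or np.any(theta[3::4] <= 0):
        return -np.inf
    resid = data - mog_model(theta, x, y, n_gauss)
    return -0.5 * np.sum((resid / noise) ** 2)

def fit_mog(image, noise, n_gauss=3, n_walkers=64, n_steps=2000):
    # Pixel coordinate grids for the image.
    y, x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    ndim = 4 * n_gauss
    # Start all walkers in a small ball around a rough guess at the image centre.
    guess = np.tile([image.max(), image.shape[1] / 2.0, image.shape[0] / 2.0, 3.0], n_gauss)
    p0 = guess + 1e-2 * np.random.randn(n_walkers, ndim)
    sampler = emcee.EnsembleSampler(n_walkers, ndim, log_prob,
                                    args=(x, y, image, noise, n_gauss))
    sampler.run_mcmc(p0, n_steps)
    # Discard the burn-in half of the chain and keep the highest-likelihood sample.
    samples = sampler.get_chain(discard=n_steps // 2, flat=True)
    log_probs = sampler.get_log_prob(discard=n_steps // 2, flat=True)
    return samples[np.argmax(log_probs)]

def select_best_mog(image, noise, max_gauss=4):
    # Fit each candidate number of Gaussians and keep the model with the lowest
    # BIC (an illustrative stand-in for the statistical test mentioned above).
    y, x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    best = None
    for n_gauss in range(1, max_gauss + 1):
        theta = fit_mog(image, noise, n_gauss=n_gauss)
        chi2 = np.sum(((image - mog_model(theta, x, y, n_gauss)) / noise) ** 2)
        bic = chi2 + 4 * n_gauss * np.log(image.size)
        if best is None or bic < best[0]:
            best = (bic, n_gauss, theta)
    return best

In practice the step counts, priors, and convergence checks would all need tuning, but the overall structure of the fit and of the model-selection loop is the same.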
After the best sum of Gaussians is found, it is subtracted from the image. The remaining
objects that did not match the model have to be identified and analyzed. This was done using a
clustering algorithm. Pixels in the image that are more than one standard deviation above the background are placed into groups. Then, the clusters are tested to see if they fulfill lens properties. In a lens, each piece of the arc has to be about the same distance from the center: the Einstein radius. If all objects
identified are about the same distance from the center, it hints that the residual is likely a lens.
Another test is the orientation of the objects. The long side of an arc segment is usually perpendicular to the line that points toward the center of the lensing galaxy; that is, the segment is stretched tangentially around the galaxy. If the objects are oriented this way, the residual is also likely a lens. There are a few other simple tests the program does, and if enough criteria
are satisfied the object is deemed a lens candidate. This analysis cannot determine with
absolute certainty if an object is a lens, but that is not the point. Rather, it is trying to
probabilistically determine if the object warrants further study by human eyes. Even the most
basic use of this program will study about 1,000 images, and the pool should be narrowed down
to as few as possible. At this point in time, I do not have a true probabilistic test, only a binary
one. I hope to soon have the test return not a yes or no, but a probability that the object is a lens. This is more rigorous, and more in the spirit of probabilistic computing.
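As a sketch of this residual analysis, the example below thresholds the residual image, groups the bright pixels with DBSCAN from scikit-learn (the particular clustering algorithm, the thresholds and tolerances, and the function name are illustrative assumptions, not necessarily what my program uses), and then applies simplified versions of the Einstein radius and orientation tests described above.

import numpy as np
from sklearn.cluster import DBSCAN

def find_lens_candidate(residual, centre, n_sigma=1.0, radius_tol=0.2):
    # Select pixels significantly brighter than the residual noise level.
    mask = residual > n_sigma * residual.std()
    coords = np.column_stack(np.nonzero(mask))
    if len(coords) == 0:
        return False

    # Group neighbouring bright pixels into clusters; DBSCAN labels noise pixels -1.
    labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(coords)
    cluster_ids = [k for k in set(labels) if k != -1]
    if not cluster_ids:
        return False

    centre = np.asarray(centre, dtype=float)
    centroids = np.array([coords[labels == k].mean(axis=0) for k in cluster_ids])

    # Einstein radius test: every cluster should sit at roughly the same
    # distance from the centre of the fitted galaxy model.
    radii = np.linalg.norm(centroids - centre, axis=1)
    consistent_radius = radii.std() < radius_tol * radii.mean()

    # Orientation test: the long axis of each cluster (leading eigenvector of
    # its pixel covariance) should be roughly perpendicular to the radial
    # direction, i.e. the cluster is stretched tangentially around the centre.
    tangential = []
    for k, centroid in zip(cluster_ids, centroids):
        pts = coords[labels == k] - centroid
        if len(pts) < 3:
            continue
        _, vecs = np.linalg.eigh(np.cov(pts.T))
        long_axis = vecs[:, -1]
        radial = (centroid - centre) / np.linalg.norm(centroid - centre)
        tangential.append(abs(np.dot(long_axis, radial)) < 0.5)

    mostly_tangential = len(tangential) > 0 and np.mean(tangential) > 0.5
    return consistent_radius and mostly_tangential

A probabilistic version, as described above, would replace the final yes/no answer with a score built from how well each individual test is satisfied.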
After each of these parts was developed, along with a few smaller modules, I finished
the summer working on combining them into one, complete package. The goal is to have a fully
functioning, standalone program that can be used by other scientists for their own experiments.
As of this writing, the full source code is nearly complete and available on my GitHub page.
Results
Fig 2: Output of the fit on a simulated lens image. The top row shows 3 unedited images of the
object, in 3 bands. The bands from left to right go from bluer to redder light. The 2 images on
the bottom left show the result of subtraction of 2 different fits. The final image is the difference
between the 2 fits.
Figure 2 above shows the output of the main MCMC fit portion of my program. This is one of nearly 1,000 similar plots created with the simulated lens dataset. The second fit is
performed in the i-band, and then subtracted from the g-band. This technique was generally the more successful of the two. The light of lensing galaxies is usually very red, while the light of the lens itself
is usually very blue. This means the galaxy profile is dominant in the i-band, while the lens is
dominant in the g-band. So, by fitting the shape of the galaxy light in the i-band, we get a better idea of what it will look like in the g-band, unobstructed by the lens. Figure 2 shows that the
central bulge can be removed, and the residual objects are clear. Figure 3 below shows the
result of the lens identification process run on those residuals.
Fig 3: The result of the lens ID process. The image on the left is the raw image after the galaxy model is
removed. The image on the right indicates clusters by color, and the orientation by the plotted axes. This
object was correctly identified as a lens (though it is in fact a simulation).
The remaining clusters are identified, and the axes are plotted for each cluster. As is evident from the plot, the correct residuals are found, and their axes show that they are all oriented circularly around the center. This is the case for the majority of simulated images.
Conclusion and Future Work
I conclude from this work that this approach is feasible and could serve as a reliable lens detection technique. I hope to obtain some real lens images, as well as some known false positives, to
test my program against. After those tests, this process could be used in telescopic surveys to
detect lenses.
I plan to continue working with this project for the foreseeable future. There is a lot of
room for further optimization of my program. I also plan to experiment with different MCMC and clustering algorithms to see how they influence the success of the program. My advisor and I also
hope to start writing a paper to publish the work in the coming months.
Acknowledgements
The author would like to thank his advisor, Dr. Robert Brunner, for his experience and guidance, as well as Matias Carrasco-Kind for his advice. This work has been supported by the Campus Honors Program and the Office of Undergraduate Research, and Professor Brunner acknowledges support from National Science Foundation Grant No. AST-1313415.
Computing support was provided by the Open Science Data Cloud, and the author would also
like to thank the developers of emcee and scikit-learn for their open source software libraries,
which were essential to this project.