This research synopsis was submitted by Dae-Won Kim, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA, USA. A preprint of this paper, which is to appear in the Astrophysical Journal, is available on the arXiv preprint server.
Modern astronomy is entering a completely new era driven by immense amounts of observational data. For instance, ongoing and future large-scale surveys such as Pan-STARRS and LSST will produce several terabytes of data or more per night. Wide-field mapping of the sky will open a new paradigm of astronomy in both its scientific and its data-handling aspects. In particular, it will be practically impossible to examine all the data manually in order to discover scientifically meaningful information. In other words, novel algorithms that can analyze the data automatically with minimal human intervention, and that deliver only the meaningful information to astronomers, are becoming more and more important.
This paper introduces such an algorithm to select QSOs (Quasi-Stellar Objects), which typically show strong non-periodic or pseudo-periodic variability. In the absence of spectroscopic data, such an algorithm is a very powerful tool for selecting QSOs. This matters especially for future large-scale surveys (Pan-STARRS and LSST), for which spectroscopic observation will be very expensive because of their wide fields of view and limiting magnitudes.
We introduced 11 time series features that quantify different variability characteristics of light curves, which were confirmed to be practical for separating QSOs from other types of variable stars (e.g. Cepheids, RR Lyraes, eclipsing binaries, Be stars, micro-lensing, long-period variables, etc.) and from non-varying stars. Figure 1 shows an example scatter plot of two time series features. As the figure shows, the two features are useful for separating the different types of variable stars. We then employed a supervised machine learning technique called the Support Vector Machine (SVM), which can train a classification model in a feature space of any dimension. We claim that hyper-plane cuts derived in the 11-D space (i.e. the space of the 11 time series features) are much better suited to separating QSOs than conventional 2-D hard cuts.
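The idea of turning light curves into features and then training a margin-based classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's method: the two features (standard deviation and the von Neumann ratio) are generic variability statistics and are not claimed to be among the 11 features, the light curves are simulated toy data (white noise versus a noisy random walk), and the classifier is a plain linear SVM trained by subgradient descent rather than the implementation used in the paper.

```python
import math
import random

def variability_features(mags):
    """Return (standard deviation, von Neumann ratio) of a magnitude series."""
    n = len(mags)
    mean = sum(mags) / n
    var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    # Mean squared successive difference: small when neighbouring points are correlated.
    msd = sum((b - a) ** 2 for a, b in zip(mags, mags[1:])) / (n - 1)
    return math.sqrt(var), msd / var

def simulate(kind, rng, n=200):
    """Toy light curves (hypothetical): 'noise' is white noise, 'walk' a noisy random walk."""
    if kind == "noise":
        return [rng.gauss(0.0, 0.05) for _ in range(n)]
    level, curve = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 0.05)
        curve.append(level + rng.gauss(0.0, 0.02))
    return curve

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:
                # Margin violated: this sample contributes to the subgradient.
                grad_w = [g - yi * xj / len(X) for g, xj in zip(grad_w, xi)]
                grad_b -= yi / len(X)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyper-plane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def dataset(n_per_class, rng):
    """Build feature vectors and labels (+1: random walk, -1: white noise)."""
    X, y = [], []
    for kind, label in (("walk", 1), ("noise", -1)):
        for _ in range(n_per_class):
            X.append(variability_features(simulate(kind, rng)))
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = dataset(50, rng)
X_test, y_test = dataset(50, rng)
w, b = train_linear_svm(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The same hyper-plane machinery works unchanged when the feature vector has 11 entries instead of 2, which is the practical appeal over hand-drawn cuts in a 2-D plot.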
We applied the algorithm to the MACHO database, which consists of 40 million light curves, and found 1,620 QSO candidates. We used the Harvard Odyssey cluster to analyze the whole dataset, which took about two days. The identified candidates were cross-matched with mid-IR and X-ray catalogs, which confirmed that the majority of them are very strong QSO candidates.
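A positional cross-match of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the matching radius of 3 arcseconds, the identifiers, and the coordinates are all made up, and the brute-force nearest-neighbour search is a stand-in for whatever matching procedure was actually used.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(candidates, catalog, radius_arcsec=3.0):
    """Map each candidate id to the nearest catalog id within the radius, if any."""
    matches = {}
    for cand_id, (ra, dec) in candidates.items():
        best_id, best_sep = None, radius_arcsec / 3600.0
        for cat_id, (cat_ra, cat_dec) in catalog.items():
            sep = angular_separation_deg(ra, dec, cat_ra, cat_dec)
            if sep <= best_sep:
                best_id, best_sep = cat_id, sep
        if best_id is not None:
            matches[cand_id] = best_id
    return matches

# Hypothetical positions in degrees (RA, Dec); not real catalog entries.
candidates = {"cand-1": (80.0000, -69.0000), "cand-2": (81.5000, -70.2000)}
mid_ir = {"ir-A": (80.0003, -69.0001), "ir-B": (85.0000, -68.0000)}

matches = cross_match(candidates, mid_ir)
print(matches)
```

For tens of millions of sources a brute-force loop like this would be too slow; a spatial index would be the usual replacement, but the matching criterion stays the same.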
Figure 1. Scatter plot of two time series features. Each axis is a different time series feature. Different symbols and colors denote different types of sources (gray dots: non-variables, black x’s: eclipsing binaries, magenta crosses: micro-lensing, yellow x’s: RR Lyraes, green x’s: Cepheids, cyan crosses: long-period variables, blue crosses: Be stars, red squares: QSOs). Most of the variable types are grouped in different regions.