Papers | QSO Selection Algorithm Using Time Variability and Machine Learning: Selection of 1,620 QSO Candidates from MACHO LMC Database

May 1, 2011

This research synopsis was submitted by Dae-Won Kim, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA, USA. A preprint of this paper, which is to appear in the Astrophysical Journal, is available on the arXiv preprint server.

Modern astronomy is entering a completely new era driven by an immense amount of observational data. For instance, ongoing and future large-scale surveys such as Pan-STARRS and LSST will produce several terabytes of data or more per night. Wide-field mapping of the sky will open a new paradigm of astronomy in both its scientific and data-handling aspects. In particular, it will be practically impossible to manually examine all the data in order to discover scientifically meaningful information. In other words, novel algorithms that can automatically analyze the data with minimal human intervention, and that deliver only the meaningful information to astronomers, are becoming more and more important.

This paper introduces such an algorithm to select QSOs (Quasi-Stellar Objects), which typically show strong non-periodic or pseudo-periodic variability. In the absence of spectroscopic data, such an algorithm is a very powerful tool for selecting QSOs. This matters especially for future large-scale surveys (Pan-STARRS and LSST), where spectroscopic observation will be very expensive because of their wide fields of view and deep limiting magnitudes.

We introduced 11 time series features that quantify different variability characteristics of light curves, and confirmed that they are practical for separating QSOs from other types of variable sources (e.g. Cepheids, RR Lyraes, eclipsing binaries, Be stars, micro-lensing, long-period variables, etc.) and from non-varying stars. Figure 1 shows an example scatter plot of two of the time series features. As the figure shows, the two features are useful for separating the different types of variable stars from one another. We then employed a supervised machine learning technique called a ‘Support Vector Machine’, which can train a classification model in a feature space of any dimension. We claim that hyperplane cuts derived in the 11-D space (i.e. the space of the 11 time series features) separate QSOs much more adequately than conventional 2-D hard cuts.
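As a rough illustration of what a time series feature looks like, the following minimal Python sketch computes two widely used variability statistics from a light curve: the ratio of standard deviation to mean, and the von Neumann ratio (mean squared successive difference divided by the variance). The synopsis does not list the 11 features, so these two are assumptions chosen for illustration, as are the function names and the simulated light curves. A random walk stands in here for smoothly correlated, QSO-like variability.

```python
import math
import random


def sample_variance(mags):
    """Unbiased sample variance of a list of magnitudes."""
    mean = sum(mags) / len(mags)
    return sum((m - mean) ** 2 for m in mags) / (len(mags) - 1)


def sigma_over_mean(mags):
    """Standard deviation divided by the mean: overall variability amplitude."""
    return math.sqrt(sample_variance(mags)) / (sum(mags) / len(mags))


def von_neumann_ratio(mags):
    """Mean squared successive difference divided by the variance.

    Close to 2 for uncorrelated noise, much smaller when consecutive
    points are correlated (smooth, slowly varying light curves).
    """
    diffs = [(b - a) ** 2 for a, b in zip(mags, mags[1:])]
    return (sum(diffs) / len(diffs)) / sample_variance(mags)


# Simulated light curves (illustrative only, not MACHO data).
rng = random.Random(0)
n_points = 2000
noise_curve = [18.0 + rng.gauss(0.0, 0.05) for _ in range(n_points)]

walk_curve, level = [], 18.0
for _ in range(n_points):
    level += rng.gauss(0.0, 0.01)  # one random-walk step in magnitude
    walk_curve.append(level)

features_noise = (sigma_over_mean(noise_curve), von_neumann_ratio(noise_curve))
features_walk = (sigma_over_mean(walk_curve), von_neumann_ratio(walk_curve))
print("noise:", features_noise)
print("walk: ", features_walk)
```

Each light curve is reduced in this way to a point in feature space. With all 11 features, those points would be the input vectors handed to the Support Vector Machine.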

We applied the algorithm to the MACHO database, which consists of 40 million light curves, and found 1,620 QSO candidates. We used the Harvard Odyssey cluster to analyze the whole dataset, which took about two days. We cross-matched the identified candidates with mid-IR and X-ray catalogs and confirmed that the majority of them are very strong QSO candidates.
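The cross-matching step can be pictured with a small positional match. This is only a sketch under stated assumptions: the synopsis does not say how the match was done, so the 3 arcsecond radius, the brute-force search, the function names, and the toy coordinates are all hypothetical.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def cross_match(candidates, catalog, radius_arcsec=3.0):
    """Return {candidate: (catalog_id, separation)} for the nearest
    catalog source within the radius; unmatched candidates are omitted."""
    matches = {}
    for name, (ra, dec) in candidates.items():
        best = None
        for cat_id, (cat_ra, cat_dec) in catalog.items():
            sep = angular_sep_arcsec(ra, dec, cat_ra, cat_dec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (cat_id, sep)
        if best is not None:
            matches[name] = best
    return matches


# Toy coordinates in degrees (made up for illustration).
candidates = {"cand_a": (80.0, -69.0), "cand_b": (81.0, -70.0)}
mid_ir_catalog = {"ir_1": (80.0, -69.0 + 1.0 / 3600.0), "ir_2": (85.0, -68.0)}

matches = cross_match(candidates, mid_ir_catalog)
print(matches)
```

A brute-force loop like this is fine for 1,620 candidates against a small catalog; for large catalogs a spatial index would be the usual choice.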

Figure 1. Scatter plot of two time series features. Each axis is a different time series feature. Different symbols and colors denote different types of sources (gray dots: non-variables, black x’s: eclipsing binaries, magenta crosses: micro-lensing, yellow x’s: RR Lyraes, green x’s: Cepheids, cyan crosses: long-period variables, blue crosses: Be stars, red squares: QSOs). Most of the variable types are grouped in different regions.


Papers | Synthetic Milky Way Galaxies

January 24, 2011
Artist's conception of the Milky Way galaxy.

Image via Wikipedia

Future surveys such as the LSST and GAIA will create object catalogs of staggering size. But beyond such qualitative statements, how do astronomers anticipate the scientific return on these projects? To some extent, you can extrapolate from past surveys, taking into account anticipated advances in instrumentation. “Survey X imaged to magnitude M over P percent of the sky. Survey X’ will image to magnitude M’ over P’ percent of the sky, therefore….” But there are really a great number of variables to consider. In order to anticipate the productivity and capabilities of future surveys, one approach is to generate synthetic astrometric catalogs based upon models for the density and distribution of stars throughout the galaxy. As explained in a recent paper by Bland-Hawthorn, Johnston, and Binney (“Galaxia: A Code to Generate a Synthetic Survey of the Milky Way”), such synthetic catalogs are useful for:

a. Interpreting observational data
b. Testing theories on which the models are based, and
c. Testing the capabilities of different instruments and for defining strategies to reduce measurement errors (BJB, 2011).

Using Galaxia, a program that they developed, the authors implemented a complex model of the “stellar content” of the Galaxy as a function of position, velocity, age, metallicity, and mass. Different components of the Milky Way (thin/thick disc, stellar halo, galactic bulge) are modeled separately.
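To give a feel for what generating a synthetic catalog involves, here is a minimal Python sketch that draws star positions for a single disc component from a double-exponential density profile. This is not Galaxia's code or model: the profile, the scale length of 2.5 kpc, the scale height of 0.3 kpc, and the function name are illustrative assumptions, and velocity, age, metallicity, and mass are left out entirely.

```python
import math
import random


def sample_thin_disc(n_stars, scale_length_kpc=2.5, scale_height_kpc=0.3, seed=0):
    """Draw (x, y, z) positions in kpc from a double-exponential disc.

    Density is proportional to exp(-R / scale_length) * exp(-|z| / scale_height).
    """
    rng = random.Random(seed)
    stars = []
    for _ in range(n_stars):
        # In cylindrical coordinates the radial pdf is R * exp(-R / Rd),
        # i.e. a gamma distribution with shape 2 and scale Rd.
        radius = rng.gammavariate(2.0, scale_length_kpc)
        azimuth = rng.uniform(0.0, 2.0 * math.pi)
        # Height above the plane: exponential in |z| with a random sign.
        height = rng.expovariate(1.0 / scale_height_kpc) * rng.choice((-1.0, 1.0))
        stars.append((radius * math.cos(azimuth), radius * math.sin(azimuth), height))
    return stars


catalog = sample_thin_disc(20000)
mean_radius = sum(math.hypot(x, y) for x, y, _ in catalog) / len(catalog)
mean_abs_height = sum(abs(z) for _, _, z in catalog) / len(catalog)
print(f"mean R = {mean_radius:.2f} kpc, mean |z| = {mean_abs_height:.3f} kpc")
```

For this profile the expected mean radius is twice the scale length and the expected mean height is the scale height, which gives a quick sanity check on the sampler.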

A fragment of the complex model encoded within Galaxia

In order to consider how future surveys might perform, you also have to take into account factors such as extinction due to interstellar dust, which itself requires a 3D model for the distribution of dust in the galaxy. The figure below shows an impressive agreement between Hipparcos observations and those that would be anticipated based upon the models encoded within Galaxia.
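The role of extinction can be made concrete with the standard distance-modulus relation, m = M + 5 log10(d / 10 pc) + A, where A is the extinction in magnitudes. The Python sketch below is only an illustration: the function names, the Sun-like absolute magnitude, the 2 magnitudes of extinction, and the limiting magnitude of 16 are assumed values, not numbers from the Galaxia paper.

```python
import math


def apparent_magnitude(abs_mag, distance_pc, extinction_mag=0.0):
    """Distance modulus plus extinction: m = M + 5*log10(d / 10 pc) + A."""
    return abs_mag + 5.0 * math.log10(distance_pc / 10.0) + extinction_mag


def is_detectable(abs_mag, distance_pc, extinction_mag, limiting_mag):
    """True if the star is at least as bright as the survey limit."""
    return apparent_magnitude(abs_mag, distance_pc, extinction_mag) <= limiting_mag


# A Sun-like star (absolute magnitude about 4.83) at 1 kpc,
# seen through no dust and through 2 magnitudes of extinction.
clear = apparent_magnitude(4.83, 1000.0)
dusty = apparent_magnitude(4.83, 1000.0, extinction_mag=2.0)
print(f"clear: {clear:.2f}, dusty: {dusty:.2f}")
print(is_detectable(4.83, 1000.0, 0.0, limiting_mag=16.0))
print(is_detectable(4.83, 1000.0, 2.0, limiting_mag=16.0))
```

In this toy case the same star is inside the hypothetical survey limit along a clear sightline and outside it along a dusty one, which is why a synthetic survey needs a dust model as well as a stellar model.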


Papers | The Submillimeter Universe

September 2, 2010

Left: Iconic optical image of the Eagle Nebula taken by the Hubble Space Telescope, showing the ‘elephant trunk’ columns protruding from the molecular cloud, illuminated by nearby young stars, but with the youngest objects buried inside. Right: SCUBA image at 450 μm showing thermal dust emission, unveiling the cold cores where the earliest stages of star formation can be studied. (Image and text from Scott et al., 2010 - see below)

The study of the Universe at submillimeter wavelengths (200 μm to 1 mm) enables astronomers to study cold dusty regions such as those found around newly forming star systems in both the Milky Way and other galaxies. Though these systems are shrouded in debris disks, submillimeter astronomy can probe deep within to study the underlying sources of thermal radiation. Recently, Scott et al. posted a review of submillimeter astronomy in support of the Canadian Long Range Plan (Canada’s version of the Decadal Survey). They write:

A full grasp of galaxy evolution requires understanding star formation in detail, but there are still many unsolved issues in the star formation problem. The youngest stars form in dusty cores – finding them demands wide submm surveys, with high resolution continuum and spectroscopic follow-up to probe accretion, outflows and disks…. Details of the formation, structure and evolution of stars and stellar systems are still poorly understood, largely because of the physical complexity, involving accretion, atomic and molecular cooling, astrochemistry, dynamics, and magnetic fields. Theoretical modelling struggles to keep pace with the quality of the data, and observations must reach the scale of the protostars themselves – ultimately requiring space-based interferometers.

The paper further describes the role of the James Clerk Maxwell Telescope (JCMT) in supporting a broad range of submm survey initiatives, and touches briefly upon some of the unique computational challenges for map-making at submm wavelengths.
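Why cold dust shows up at these wavelengths can be checked with Wien's displacement law, which gives the wavelength at which a blackbody of temperature T is brightest. The short Python sketch below is a back-of-the-envelope estimate only: real dust emits as a modified blackbody rather than a perfect one, and the temperatures and the function name are assumed for illustration.

```python
# Wien displacement constant, expressed in micrometre-kelvin.
WIEN_B_UM_K = 2897.771955


def wien_peak_um(temperature_k):
    """Wavelength (in micrometres) of peak blackbody emission per unit wavelength."""
    return WIEN_B_UM_K / temperature_k


for temperature_k in (10.0, 20.0, 50.0):
    print(f"T = {temperature_k:5.1f} K -> peak near {wien_peak_um(temperature_k):6.1f} um")
```

Dust at about 10 K peaks near 290 μm, inside the 200 μm to 1 mm window, while warmer dust peaks at shorter wavelengths and contributes to the submillimeter band through the long-wavelength tail of its spectrum.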


Papers | The ASI Science Data Centre Data Explorer Tool

August 1, 2010

Posted on the arXiv by D’Elia et al., ASI Science Data Center,
http://arxiv.org/abs/1007.5377

We present here the Data Explorer tool developed at the ASI Science Data Center (ASDC). This tool is designed to provide an efficient and user-friendly way to display information residing in several catalogs stored in the ASDC servers, to cross-correlate this information and to download/analyze data via our scientific tools and/or external services. Our database includes GRB catalogs (such as Swift and Beppo-SAX), which can be queried through the Data Explorer. The GRB fields can be viewed in multiwavelength and the data can be analyzed or retrieved.

From the ASDC Data Explorer Tutorial

