Machine Learning Classifiers for Intermediate Redshift Emission Line Galaxies
Authors:
Kai Zhang,
David J. Schlegel,
Brett H. Andrews,
Johan Comparat,
Christoph Schäfer,
Jose Antonio Vazquez Mata,
Jean-Paul Kneib,
Renbin Yan
Abstract:
Classification of intermediate redshift ($z$ = 0.3--0.8) emission line galaxies as star-forming galaxies, composite galaxies, active galactic nuclei (AGN), or low-ionization nuclear emission-line regions (LINERs) using optical spectra alone has been impossible because the lines used for standard optical diagnostic diagrams ([NII], H$α$, and [SII]) are redshifted out of the observed wavelength range. In this work, we address this problem using four supervised machine learning classification algorithms: $k$-nearest neighbors (KNN), support vector classifier (SVC), random forest (RF), and a multi-layer perceptron (MLP) neural network. For input features, we use four properties that can be measured from optical galaxy spectra out to $z < 0.8$ ([OIII]/H$β$, [OII]/H$β$, [OIII] line width, and stellar velocity dispersion) and four colors ($u-g$, $g-r$, $r-i$, and $i-z$) corrected to $z=0.1$. The labels for the low redshift emission line galaxy training set are determined using standard optical diagnostic diagrams. RF has the best area under the curve (AUC) score for classifying all four galaxy types, meaning it has the highest distinguishing power. Both the AUC scores and accuracies of the other algorithms are ordered as MLP$>$SVC$>$KNN. The classification accuracies with all eight features (and with the four spectroscopically-determined features only) are 93.4% (92.3%) for star-forming galaxies, 69.4% (63.7%) for composite galaxies, 71.8% (67.3%) for AGNs, and 65.7% (60.8%) for LINERs. The stacked spectra of galaxies of the same type, as determined by optical diagnostic diagrams at low redshift and by RF at intermediate redshift, are broadly consistent. Our publicly available code (https://github.com/zkdtc/MLC_ELGs) and trained models will be instrumental for classifying emission line galaxies in upcoming wide-field spectroscopic surveys.
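The abstract ranks the classifiers by AUC, read as distinguishing power. As a rough illustration of what that number measures, the sketch below computes a one-vs-rest AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. It is not the authors' code, which is in the repository linked above; the function name `one_vs_rest_auc`, the short class labels, and the toy scores are all hypothetical.

```python
from typing import Sequence


def one_vs_rest_auc(scores: Sequence[float],
                    labels: Sequence[str],
                    positive: str) -> float:
    """One-vs-rest AUC from pairwise comparisons.

    scores[i] is a classifier's score that object i belongs to `positive`.
    The result is the probability that a randomly chosen positive object
    outscores a randomly chosen object of any other class (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): scores for the star-forming ("SF") class.
toy_labels = ["SF", "SF", "AGN", "SF", "LINER"]
toy_scores = [0.9, 0.25, 0.3, 0.6, 0.2]

auc = one_vs_rest_auc(toy_scores, toy_labels, positive="SF")
print(f"{auc:.4f}")  # prints 0.8333 (5 of 6 pairs ranked correctly)
```

An AUC of 1.0 means every object of the target class is scored above every other object, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable for survey-scale samples.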
Submitted 19 August, 2019;
originally announced August 2019.
The Fifteenth Data Release of the Sloan Digital Sky Surveys: First Release of MaNGA Derived Quantities, Data Visualization Tools and Stellar Library
Authors:
D. S. Aguado,
Romina Ahumada,
Andres Almeida,
Scott F. Anderson,
Brett H. Andrews,
Borja Anguiano,
Erik Aquino Ortiz,
Alfonso Aragon-Salamanca,
Maria Argudo-Fernandez,
Marie Aubert,
Vladimir Avila-Reese,
Carles Badenes,
Sandro Barboza Rembold,
Kat Barger,
Jorge Barrera-Ballesteros,
Dominic Bates,
Julian Bautista,
Rachael L. Beaton,
Timothy C. Beers,
Francesco Belfiore,
Mariangela Bernardi,
Matthew Bershady,
Florian Beutler,
Jonathan Bird,
Dmitry Bizyaev
, et al. (209 additional authors not shown)
Abstract:
Twenty years have passed since first light for the Sloan Digital Sky Survey (SDSS). Here, we release data taken by the fourth phase of SDSS (SDSS-IV) across its first three years of operation (July 2014-July 2017). This is the third data release for SDSS-IV, and the fifteenth from SDSS (Data Release Fifteen; DR15). New data come from MaNGA: we release 4824 datacubes, as well as the first stellar spectra in the MaNGA Stellar Library (MaStar), the first set of survey-supported analysis products (e.g. stellar and gas kinematics, emission line, and other maps) from the MaNGA Data Analysis Pipeline (DAP), and a new data visualisation and access tool we call "Marvin". The next data release, DR16, will include new data from both APOGEE-2 and eBOSS; those surveys release no new data here, but we document updates and corrections to their data processing pipelines. The release is cumulative; it also includes the most recent reductions and calibrations of all data taken by SDSS since first light. In this paper we describe the location and format of the data and tools and cite technical references describing how the data were obtained and processed. The SDSS website (www.sdss.org) has also been updated, providing links to data downloads, tutorials and examples of data use. While SDSS-IV will continue to collect astronomical data until 2020, and will be followed by SDSS-V (2020-2025), we end this paper by describing plans to ensure the sustainability of the SDSS data archive for many years beyond the collection of data.
Submitted 10 December, 2018; v1 submitted 6 December, 2018;
originally announced December 2018.