-
Galaxy Zoo CEERS: Bar fractions up to z~4.0
Authors:
Tobias Géron,
R. J. Smethurst,
Hugh Dickinson,
L. F. Fortson,
Izzy L. Garland,
Sandor Kruk,
Chris Lintott,
Jason Shingirai Makechemu,
Kameswara Bharadwaj Mantha,
Karen L. Masters,
David O'Ryan,
Hayley Roberts,
B. D. Simmons,
Mike Walmsley,
Antonello Calabrò,
Rimpei Chiba,
Luca Costantin,
Maria R. Drout,
Francesca Fragkoudi,
Yuchen Guo,
B. W. Holwerda,
Shardha Jogee,
Anton M. Koekemoer,
Ray A. Lucas,
Fabio Pacucci
Abstract:
We study the evolution of the bar fraction in disc galaxies between $0.5 < z < 4.0$ using multi-band coloured images from JWST CEERS. These images were classified by citizen scientists in a new phase of the Galaxy Zoo project called GZ CEERS. Citizen scientists were asked whether a strong or weak bar was visible in the host galaxy. After considering multiple corrections for observational biases, we find that the bar fraction decreases with redshift in our volume-limited sample (n = 398); from $25^{+6}_{-4}$% at $0.5 < z < 1.0$ to $3^{+6}_{-1}$% at $3.0 < z < 4.0$. However, we argue it is appropriate to interpret these fractions as lower limits. Disentangling real changes in the bar fraction from detection biases remains challenging. Nevertheless, we find a significant number of bars up to $z = 2.5$. This implies that discs are dynamically cool or baryon-dominated, enabling them to host bars. This also suggests that bar-driven secular evolution likely plays an important role at higher redshifts. When we distinguish between strong and weak bars, we find that the weak bar fraction decreases with increasing redshift. In contrast, the strong bar fraction is constant between $0.5 < z < 2.5$. This implies that the strong bars found in this work are robust long-lived structures, unless the rate of bar destruction is similar to the rate of bar formation. Finally, our results are consistent with disc instabilities being the dominant mode of bar formation at lower redshifts, while bar formation through interactions and mergers is more common at higher redshifts.
Submitted 2 May, 2025;
originally announced May 2025.
-
Galaxy Zoo JWST: Up to 75% of discs are featureless at $3<z<7$
Authors:
R. J. Smethurst,
B. D. Simmons,
T. Géron,
H. Dickinson,
L. Fortson,
I. L. Garland,
S. Kruk,
S. M. Jewell,
C. J. Lintott,
J. S. Makechemu,
K. B. Mantha,
K. L. Masters,
D. O'Ryan,
H. Roberts,
M. R. Thorne,
M. Walmsley,
M. Calabrò,
B. Holwerda,
J. S. Kartaltepe,
A. M. Koekemoer,
Y. Lyu,
R. Lucas,
F. Pacucci,
M. Tarrasse
Abstract:
We have not yet observed the epoch at which disc galaxies emerge in the Universe. While high-$z$ measurements of large-scale features such as bars and spiral arms trace the evolution of disc galaxies, such methods cannot directly quantify featureless discs in the early Universe. Here we identify a substantial population of apparently featureless disc galaxies in the Cosmic Evolution Early Release Science (CEERS) survey by combining quantitative visual morphologies of $\sim 7,000$ galaxies from the Galaxy Zoo JWST CEERS project with a public catalogue of expert visual and parametric morphologies. While the highest-redshift featured disc we identify is at $z_{\rm{phot}}=5.5$, the highest-redshift featureless disc we identify is at $z_{\rm{phot}}=7.4$. The distribution of Sérsic indices for these featureless systems suggests that they truly are dynamically cold: disc-dominated systems have existed since at least $z\sim 7.4$. We place upper limits on the featureless disc fraction as a function of redshift, and show that up to $75\%$ of discs are featureless at $3.0<z<7.4$. This is a conservative limit assuming all galaxies in the sample truly lack features. With further consideration of redshift effects and observational constraints, we find the featureless disc fraction in CEERS imaging at these redshifts is more likely $\sim29-38\%$. We hypothesise that the apparent lack of features in a third of high-redshift discs is due to a higher gas fraction in the early Universe, which allows the discs to be resistant to buckling and instabilities.
Submitted 27 March, 2025;
originally announced March 2025.
-
Euclid: Quick Data Release (Q1) -- A census of dwarf galaxies across a range of distances and environments
Authors:
F. R. Marleau,
R. Habas,
D. Carollo,
C. Tortora,
P. -A. Duc,
E. Sola,
T. Saifollahi,
M. Fügenschuh,
M. Walmsley,
R. Zöller,
A. Ferré-Mateu,
M. Cantiello,
M. Urbano,
E. Saremi,
R. Ragusa,
R. Laureijs,
M. Hilker,
O. Müller,
M. Poulain,
R. F. Peletier,
S. J. Sprenger,
O. Marchal,
N. Aghanim,
B. Altieri,
A. Amara
, et al. (182 additional authors not shown)
Abstract:
The Euclid Q1 fields were selected for calibration purposes in cosmology and are therefore relatively devoid of nearby galaxies. However, this is precisely what makes them interesting fields in which to search for dwarf galaxies in local density environments. We take advantage of the unprecedented depth, spatial resolution, and field of view of the Euclid Quick Release (Q1) to build a census of dwarf galaxies in these regions. We have identified dwarfs in a representative sample of 25 contiguous tiles in the Euclid Deep Field North (EDF-N), covering an area of 14.25 sq. deg. The dwarf candidates were identified using a semi-automatic detection method, based on properties measured by the Euclid pipeline and listed in the MER catalogue. A selection cut in surface brightness and magnitude was used to produce an initial dwarf candidate catalogue, followed by a cut in morphology and colour. This catalogue was visually classified to produce a final sample of dwarf candidates, including their morphology, number of nuclei, globular cluster (GC) richness, and presence of a blue compact centre. We identified 2674 dwarf candidates, corresponding to 188 dwarfs per sq. deg. The visual classification of the dwarfs reveals a slightly uneven morphological mix of 58% ellipticals and 42% irregulars, with very few potentially GC-rich (1.0%) and nucleated (4.0%) candidates but a noticeable fraction (6.9%) of dwarfs with blue compact centres. The distance distribution of 388 (15%) of the dwarfs with spectroscopic redshifts peaks at about 400 Mpc. Their stellar mass distribution confirms that our selection effectively identifies dwarfs while minimising contamination. The most prominent dwarf overdensities are dominated by dEs, while dIs are more evenly distributed. This work highlights Euclid's remarkable ability to detect and characterise dwarf galaxies across diverse masses, distances, and environments.
Submitted 19 March, 2025;
originally announced March 2025.
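The rounded figures quoted in the dwarf-census abstract above can be checked with simple arithmetic. The sketch below uses only numbers stated in that abstract; the function and variable names are illustrative and are not taken from the paper.

```python
# Figures quoted in the abstract above.
N_DWARFS = 2674        # dwarf candidates identified
AREA_SQ_DEG = 14.25    # area covered by the 25 EDF-N tiles
N_WITH_SPEC_Z = 388    # dwarfs with spectroscopic redshifts


def surface_density(count: int, area_sq_deg: float) -> float:
    """Number of objects per square degree."""
    return count / area_sq_deg


def percentage(part: int, whole: int) -> float:
    """Share of `whole` made up by `part`, in percent."""
    return 100.0 * part / whole


density = surface_density(N_DWARFS, AREA_SQ_DEG)
spec_z_pct = percentage(N_WITH_SPEC_Z, N_DWARFS)

print(f"{density:.1f} dwarfs per sq. deg")
print(f"{spec_z_pct:.1f}% of dwarfs have spectroscopic redshifts")
```

Both values round to the figures given in the abstract (188 dwarfs per sq. deg and 15%).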
-
Euclid Quick Data Release (Q1). LEMON -- Lens Modelling with Neural networks. Automated and fast modelling of Euclid gravitational lenses with a singular isothermal ellipsoid mass profile
Authors:
Euclid Collaboration,
V. Busillo,
C. Tortora,
R. B. Metcalf,
J. W. Nightingale,
M. Meneghetti,
F. Gentile,
R. Gavazzi,
F. Zhong,
R. Li,
B. Clément,
G. Covone,
N. R. Napolitano,
F. Courbin,
M. Walmsley,
E. Jullo,
J. Pearson,
D. Scott,
A. M. C. Le Brun,
L. Leuzzi,
N. Aghanim,
B. Altieri,
A. Amara,
S. Andreon,
H. Aussel
, et al. (290 additional authors not shown)
Abstract:
The Euclid mission aims to survey around 14000 deg$^{2}$ of extragalactic sky, providing around $10^{5}$ gravitational lens images. Modelling of gravitational lenses is fundamental to estimate the total mass of the lens galaxy, along with its dark matter content. Traditional modelling of gravitational lenses is computationally intensive and requires manual input. In this paper, we use a Bayesian neural network, LEns MOdelling with Neural networks (LEMON), for modelling Euclid gravitational lenses with a singular isothermal ellipsoid mass profile. Our method estimates key lens mass profile parameters, such as the Einstein radius, while also predicting the light parameters of foreground galaxies and their uncertainties. We validate LEMON's performance on both mock Euclid data sets, real Euclidised lenses observed with Hubble Space Telescope (hereafter HST), and real Euclid lenses found in the Perseus ERO field, demonstrating the ability of LEMON to predict parameters of both simulated and real lenses. Results show promising accuracy and reliability in predicting the Einstein radius, axis ratio, position angle, effective radius, Sérsic index, and lens magnitude for simulated lens galaxies. The application to real data, including the latest Quick Release 1 strong lens candidates, provides encouraging results, particularly for the Einstein radius. We also verified that LEMON has the potential to accelerate traditional modelling methods, by giving to the classical optimiser the LEMON predictions as starting points, resulting in a speed-up of up to 26 times the original time needed to model a sample of gravitational lenses, a result that would be impossible with randomly initialised guesses. This work represents a significant step towards efficient, automated gravitational lens modelling, which is crucial for handling the large data volumes expected from Euclid.
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1). The Strong Lensing Discovery Engine E -- Ensemble classification of strong gravitational lenses: lessons for Data Release 1
Authors:
Euclid Collaboration,
P. Holloway,
A. Verma,
M. Walmsley,
P. J. Marshall,
A. More,
T. E. Collett,
N. E. P. Lines,
L. Leuzzi,
A. Manjón-García,
S. H. Vincken,
J. Wilde,
R. Pearce-Casey,
I. T. Andika,
J. A. Acevedo Barroso,
T. Li,
A. Melo,
R. B. Metcalf,
K. Rojas,
B. Clément,
H. Degaudenzi,
F. Courbin,
G. Despali,
R. Gavazzi,
S. Schuldt
, et al. (321 additional authors not shown)
Abstract:
The Euclid Wide Survey (EWS) is expected to identify of order $100\,000$ galaxy-galaxy strong lenses across $14\,000$ deg$^2$. The Euclid Quick Data Release (Q1) of $63.1$ deg$^2$ Euclid images provides an excellent opportunity to test our lens-finding ability, and to verify the anticipated lens frequency in the EWS. Following the Q1 data release, eight machine learning networks from five teams were applied to approximately one million images. This was followed by a citizen science inspection of a subset of around $100\,000$ images, of which $65\%$ received high network scores, with the remainder randomly selected. The top scoring outputs were inspected by experts to establish confident (grade A), likely (grade B), possible (grade C), and unlikely lenses. In this paper we combine the citizen science and machine learning classifiers into an ensemble, demonstrating that a combined approach can produce a purer and more complete sample than the original individual classifiers. Using the expert-graded subset as ground truth, we find that this ensemble can provide a purity of $52\pm2\%$ (grade A/B lenses) with $50\%$ completeness (for context, due to the rarity of lenses a random classifier would have a purity of $0.05\%$). We discuss future lessons for the first major Euclid data release (DR1), where the big-data challenges will become more significant and will require analysing more than $\sim300$ million galaxies, and thus time investment of both experts and citizens must be carefully managed.
Submitted 19 March, 2025;
originally announced March 2025.
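For readers unfamiliar with the purity and completeness figures quoted in the ensemble-classification abstract above, the following is a minimal sketch of the standard definitions. The function names and the counts are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
def purity(true_pos: int, false_pos: int) -> float:
    """Fraction of selected objects that are genuine lenses."""
    selected = true_pos + false_pos
    if selected == 0:
        raise ValueError("no objects were selected")
    return true_pos / selected


def completeness(true_pos: int, false_neg: int) -> float:
    """Fraction of genuine lenses that were selected."""
    actual = true_pos + false_neg
    if actual == 0:
        raise ValueError("no genuine lenses in the ground truth")
    return true_pos / actual


def random_classifier_purity(n_lenses: int, n_total: int) -> float:
    """Expected purity when objects are picked at random: the base rate."""
    return n_lenses / n_total


# Hypothetical counts, chosen only to illustrate the definitions.
tp, fp, fn = 52, 48, 52
print(f"purity       = {purity(tp, fp):.2f}")
print(f"completeness = {completeness(tp, fn):.2f}")
print(f"random       = {random_classifier_purity(500, 1_000_000):.4%}")
```

The last line illustrates why a random classifier's purity is so low: when lenses are rare, a random selection is only as pure as the fraction of lenses in the parent sample.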
-
Euclid Quick Data Release (Q1). The Strong Lensing Discovery Engine D -- Double-source-plane lens candidates
Authors:
Euclid Collaboration,
T. Li,
T. E. Collett,
M. Walmsley,
N. E. P. Lines,
K. Rojas,
J. W. Nightingale,
W. J. R. Enzi,
L. A. Moustakas,
C. Krawczyk,
R. Gavazzi,
G. Despali,
P. Holloway,
S. Schuldt,
F. Courbin,
R. B. Metcalf,
D. J. Ballard,
A. Verma,
B. Clément,
H. Degaudenzi,
A. Melo,
J. A. Acevedo Barroso,
L. Leuzzi,
A. Manjón-García,
R. Pearce-Casey
, et al. (313 additional authors not shown)
Abstract:
Strong gravitational lensing systems with multiple source planes are powerful tools for probing the density profiles and dark matter substructure of the galaxies. The ratio of Einstein radii is related to the dark energy equation of state through the cosmological scaling factor $β$. However, galaxy-scale double-source-plane lenses (DSPLs) are extremely rare. In this paper, we report the discovery of four new galaxy-scale double-source-plane lens candidates in the Euclid Quick Release 1 (Q1) data. These systems were initially identified through a combination of machine learning lens-finding models and subsequent visual inspection from citizens and experts. We apply the widely-used LensPop lens forecasting model to predict that the full Euclid survey will discover 1700 DSPLs, which scales to $6 \pm 3$ DSPLs in 63 deg$^2$, the area of Q1. The number of discoveries in this work is broadly consistent with this forecast. We present lens models for each DSPL and infer their $β$ values. Our initial Q1 sample demonstrates the promise of Euclid to discover such rare objects.
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1). The Strong Lensing Discovery Engine C -- Finding lenses with machine learning
Authors:
Euclid Collaboration,
N. E. P. Lines,
T. E. Collett,
M. Walmsley,
K. Rojas,
T. Li,
L. Leuzzi,
A. Manjón-García,
S. H. Vincken,
J. Wilde,
P. Holloway,
A. Verma,
R. B. Metcalf,
I. T. Andika,
A. Melo,
M. Melchior,
H. Domínguez Sánchez,
A. Díaz-Sánchez,
J. A. Acevedo Barroso,
B. Clément,
C. Krawczyk,
R. Pearce-Casey,
S. Serjeant,
F. Courbin,
G. Despali
, et al. (328 additional authors not shown)
Abstract:
Strong gravitational lensing has the potential to provide a powerful probe of astrophysics and cosmology, but fewer than 1000 strong lenses have been confirmed previously. With $0.16''$ resolution covering a third of the sky, the Euclid telescope will revolutionise strong lens finding, with 170000 lenses forecasted to be discovered amongst its 1.5 billion galaxies. We present an analysis of the performance of five machine-learning models at finding strong gravitational lenses in the quick release of Euclid data (Q1), covering 63 deg$^{2}$. The models are validated with citizen scientists and expert visual inspection. We focus on the best performing network: a fine-tuned version of the Zoobot pretrained model, originally trained to classify galaxy morphologies in heterogeneous astronomical imaging surveys. Of the one million Q1 objects that Zoobot was tasked to find strong lenses within, the top 1000 ranked objects contained 122 grade A lenses (almost certain lenses), and 41 grade B lenses (probable lenses). A deeper search with the five networks combined with visual inspection discovered 250 (247) grade A (B) lenses, of which 224 (182) are ranked in the top 20000 by Zoobot. When extrapolated to the full Euclid survey, the highest ranked one million images will contain 75000 grade A or B strong gravitational lenses.
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1) The Strong Lensing Discovery Engine B -- Early strong lens candidates from visual inspection of high velocity dispersion galaxies
Authors:
Euclid Collaboration,
K. Rojas,
T. E. Collett,
J. A. Acevedo Barroso,
J. W. Nightingale,
D. Stern,
L. A. Moustakas,
S. Schuldt,
G. Despali,
A. Melo,
M. Walmsley,
D. J. Ballard,
W. J. R. Enzi,
T. Li,
A. Sainz de Murieta,
I. T. Andika,
B. Clément,
F. Courbin,
L. R. Ecker,
R. Gavazzi,
N. Jackson,
A. Kovács,
P. Matavulj,
M. Meneghetti,
S. Serjeant
, et al. (314 additional authors not shown)
Abstract:
We present a search for strong gravitational lenses in Euclid imaging with high stellar velocity dispersion ($σ_ν> 180$ km/s) reported by SDSS and DESI. We performed expert visual inspection and classification of $11\,660$ Euclid images. We discovered 38 grade A and 40 grade B candidate lenses, consistent with an expected sample of $\sim$32. Palomar spectroscopy confirmed 5 lens systems, while DESI spectra confirmed one, provided ambiguous results for another, and helped to discard one. The Euclid automated lens modeler modelled 53 candidates, confirming 38 as lenses, failing to model 9, and ruling out 6 grade B candidates. For the remaining 25 candidates we could not gather additional information. More importantly, our expert-classified non-lenses provide an excellent training set for machine learning lens classifiers. We create high-fidelity simulations of Euclid lenses by painting realistic lensed sources behind the expert tagged (non-lens) luminous red galaxies. This training set is the foundation stone for the Euclid galaxy-galaxy strong lensing discovery engine.
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1): The Strong Lensing Discovery Engine A -- System overview and lens catalogue
Authors:
Euclid Collaboration,
M. Walmsley,
P. Holloway,
N. E. P. Lines,
K. Rojas,
T. E. Collett,
A. Verma,
T. Li,
J. W. Nightingale,
G. Despali,
S. Schuldt,
R. Gavazzi,
A. Melo,
R. B. Metcalf,
I. T. Andika,
L. Leuzzi,
A. Manjón-García,
R. Pearce-Casey,
S. H. Vincken,
J. Wilde,
V. Busillo,
C. Tortora,
J. A. Acevedo Barroso,
H. Dole,
L. R. Ecker
, et al. (350 additional authors not shown)
Abstract:
We present a catalogue of 497 galaxy-galaxy strong lenses in the Euclid Quick Release 1 data (63 deg$^2$). In the initial 0.45% of Euclid's surveys, we double the total number of known lens candidates with space-based imaging. Our catalogue includes 250 grade A candidates, the vast majority of which (243) were previously unpublished. Euclid's resolution reveals rare lens configurations of scientific value including double-source-plane lenses, edge-on lenses, complete Einstein rings, and quadruply-imaged lenses. We resolve lenses with small Einstein radii ($θ_{\rm E} < 1''$) in large numbers for the first time. These lenses are found through an initial sweep by deep learning models, followed by Space Warps citizen scientist inspection, expert vetting, and system-by-system modelling. Our search approach scales straightforwardly to Euclid Data Release 1 and, without changes, would yield approximately 7000 high-confidence (grade A or B) lens candidates by late 2026. Further extrapolating to the complete Euclid Wide Survey implies a likely yield of over 100000 high-confidence candidates, transforming strong lensing science.
Submitted 19 March, 2025;
originally announced March 2025.
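The extrapolation in the lens-catalogue abstract above can be roughly reproduced with a naive linear scaling by sky area. The sketch below assumes a uniform lens density and takes the 14000 deg$^2$ Wide Survey area from the other Euclid abstracts in this listing; the paper's own forecast may rest on a more careful method, so this is only a consistency check.

```python
Q1_AREA_SQ_DEG = 63.0                  # Q1 area quoted in the abstract
WIDE_SURVEY_AREA_SQ_DEG = 14_000.0     # assumed: area quoted in other abstracts here
N_Q1_CANDIDATES = 497                  # catalogue size quoted in the abstract


def area_fraction(part_sq_deg: float, whole_sq_deg: float) -> float:
    """Fraction of the full survey area covered so far."""
    return part_sq_deg / whole_sq_deg


def scale_by_area(count: int, fraction: float) -> float:
    """Naive extrapolation: assumes the same density over the whole survey."""
    return count / fraction


fraction = area_fraction(Q1_AREA_SQ_DEG, WIDE_SURVEY_AREA_SQ_DEG)
extrapolated = scale_by_area(N_Q1_CANDIDATES, fraction)

print(f"Q1 covers {fraction:.2%} of the Wide Survey area")
print(f"Naive full-survey yield: about {extrapolated:,.0f} candidates")
```

Under these assumptions the area fraction matches the 0.45% quoted in the abstract, and the scaled count is consistent with its "over 100000" estimate.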
-
Euclid Quick Data Release (Q1). Active galactic nuclei identification using diffusion-based inpainting of Euclid VIS images
Authors:
Euclid Collaboration,
G. Stevens,
S. Fotopoulou,
M. N. Bremer,
T. Matamoro Zatarain,
K. Jahnke,
B. Margalef-Bentabol,
M. Huertas-Company,
M. J. Smith,
M. Walmsley,
M. Salvato,
M. Mezcua,
A. Paulino-Afonso,
M. Siudek,
M. Talia,
F. Ricci,
W. Roster,
N. Aghanim,
B. Altieri,
S. Andreon,
H. Aussel,
C. Baccigalupi,
M. Baldi,
S. Bardelli,
P. Battaglia
, et al. (249 additional authors not shown)
Abstract:
Light emission from galaxies exhibits diverse brightness profiles, influenced by factors such as galaxy type, structural features and interactions with other galaxies. Elliptical galaxies feature more uniform light distributions, while spiral and irregular galaxies have complex, varied light profiles due to their structural heterogeneity and star-forming activity. In addition, galaxies with an active galactic nucleus (AGN) feature intense, concentrated emission from gas accretion around supermassive black holes, superimposed on regular galactic light, while quasi-stellar objects (QSO) are the extreme case of the AGN emission dominating the galaxy. The challenge of identifying AGN and QSO has been discussed many times in the literature, often requiring multi-wavelength observations. This paper introduces a novel approach to identify AGN and QSO from a single image. Diffusion models have been recently developed in the machine-learning literature to generate realistic-looking images of everyday objects. Utilising the spatial resolving power of the Euclid VIS images, we created a diffusion model trained on one million sources, without using any source pre-selection or labels. The model learns to reconstruct light distributions of normal galaxies, since the population is dominated by them. We condition the prediction of the central light distribution by masking the central few pixels of each source and reconstruct the light according to the diffusion model. We further use this prediction to identify sources that deviate from this profile by examining the reconstruction error of the few central pixels regenerated in each source's core. Our approach, solely using VIS imaging, features high completeness compared to traditional methods of AGN and QSO selection, including optical, near-infrared, mid-infrared, and X-rays. [abridged]
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1), A first look at the fraction of bars in massive galaxies at $z<1$
Authors:
Euclid Collaboration,
M. Huertas-Company,
M. Walmsley,
M. Siudek,
P. Iglesias-Navarro,
J. H. Knapen,
S. Serjeant,
H. J. Dickinson,
L. Fortson,
I. Garland,
T. Géron,
W. Keel,
S. Kruk,
C. J. Lintott,
K. Mantha,
K. Masters,
D. O'Ryan,
J. J. Popp,
H. Roberts,
C. Scarlata,
J. S. Makechemu,
B. Simmons,
R. J. Smethurst,
A. Spindler,
M. Baes
, et al. (314 additional authors not shown)
Abstract:
Stellar bars are key structures in disc galaxies, driving angular momentum redistribution and influencing processes such as bulge growth and star formation. Quantifying the bar fraction as a function of redshift and stellar mass is therefore important for constraining the physical processes that drive disc formation and evolution across the history of the Universe. Leveraging the unprecedented resolution and survey area of the Euclid Q1 data release combined with the Zoobot deep-learning model trained on citizen-science labels, we identify 7711 barred galaxies with $M_* \gtrsim 10^{10}M_\odot$ in a magnitude-selected sample $I_E < 20.5$ spanning $63.1$ deg$^2$. We measure a mean bar fraction of $0.2-0.4$, consistent with prior studies. At fixed redshift, massive galaxies exhibit higher bar fractions, while lower-mass systems show a steeper decline with redshift, suggesting earlier disc assembly in massive galaxies. Comparisons with cosmological simulations (e.g., TNG50, Auriga) reveal a broadly consistent bar fraction, but highlight overpredictions for high-mass systems, pointing to potential over-efficiency in central stellar mass build-up in simulations. These findings demonstrate Euclid's transformative potential for galaxy morphology studies and underscore the importance of refining theoretical models to better reproduce observed trends. Future work will explore finer mass bins, environmental correlations, and additional morphological indicators.
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1): First visual morphology catalogue
Authors:
Euclid Collaboration,
M. Walmsley,
M. Huertas-Company,
L. Quilley,
K. L. Masters,
S. Kruk,
K. A. Remmelgas,
J. J. Popp,
E. Romelli,
D. O'Ryan,
H. J. Dickinson,
C. J. Lintott,
S. Serjeant,
R. J. Smethurst,
B. Simmons,
J. Shingirai Makechemu,
I. L. Garland,
H. Roberts,
K. Mantha,
L. F. Fortson,
T. Géron,
W. Keel,
E. M. Baeten,
C. Macmillan,
J. Bovy
, et al. (330 additional authors not shown)
Abstract:
We present a detailed visual morphology catalogue for Euclid's Quick Release 1 (Q1). Our catalogue includes galaxy features such as bars, spiral arms, and ongoing mergers, for the 378000 bright ($I_E < 20.5$) or extended (area $\geq 700\,$pixels) galaxies in Q1. The catalogue was created by finetuning the Zoobot galaxy foundation models on annotations from an intensive one month campaign by Galaxy Zoo volunteers. Our measurements are fully automated and hence fully scaleable. This catalogue is the first 0.4% of the approximately 100 million galaxies where Euclid will ultimately resolve detailed morphology.
Submitted 19 March, 2025;
originally announced March 2025.
-
Euclid Quick Data Release (Q1): From images to multiwavelength catalogues: the Euclid MERge Processing Function
Authors:
Euclid Collaboration,
E. Romelli,
M. Kümmel,
H. Dole,
J. Gracia-Carpio,
E. Merlin,
S. Galeotta,
Y. Fang,
M. Castellano,
F. Caro,
E. Soubrie,
L. Maurin,
R. Cabanac,
P. Dimauro,
M. Huertas-Company,
M. D. Lepinzan,
T. Vassallo,
M. Walmsley,
I. A. Zinchenko,
A. Boucaud,
A. Calabro,
V. Roscani,
A. Tramacere,
M. Douspis,
A. Fontana
, et al. (323 additional authors not shown)
Abstract:
The Euclid satellite is an ESA mission that was launched in July 2023. Euclid is working in its regular observing mode with the target of observing an area of $14\,000~\text{deg}^2$ with two instruments, the Visible Camera (VIS) and the Near IR Spectrometer and Photometer (NISP) down to $I_{\rm E} = 24.5~\text{mag}$ ($10\, σ$) in the Euclid Wide Survey. Ground-based imaging data in the ugriz bands complement the Euclid data to enable photo-$z$ determination and VIS PSF modeling for weak lensing analysis. Euclid investigates the distance-redshift relation and the evolution of cosmic structures by measuring shapes and redshifts of galaxies and clusters of galaxies out to $z\sim 2$. Generating the multi-wavelength catalogues from Euclid and ground-based data is an essential part of the Euclid data processing system. In the framework of the Euclid Science Ground Segment (SGS), the aim of the MER Processing Function (PF) pipeline is to detect objects in the Euclid imaging data, measure their properties, and MERge them into a single multi-wavelength catalogue. The MER PF pipeline performs source detection on both visible (VIS) and near-infrared (NIR) images and offers four different photometric measurements: Kron total flux, aperture photometry on PSF-matched images, template fitting photometry, and Sérsic fitting photometry. Furthermore, the MER PF pipeline measures a set of ancillary quantities, spanning from morphology to quality flags, to better characterise all detected sources. In this paper, we show how the MER PF pipeline is designed, detailing its main steps, and we show that the pipeline products meet the tight requirements that Euclid aims to achieve on photometric accuracy. We also present the other measurements (e.g. morphology) that are included in the OU-MER output catalogues and we list all output products coming out of the MER PF pipeline.
Submitted 16 May, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
Euclid: Finding strong gravitational lenses in the Early Release Observations using convolutional neural networks
Authors:
B. C. Nagam,
J. A. Acevedo Barroso,
J. Wilde,
I. T. Andika,
A. Manjón-García,
R. Pearce-Casey,
D. Stern,
J. W. Nightingale,
L. A. Moustakas,
K. McCarthy,
E. Moravec,
L. Leuzzi,
K. Rojas,
S. Serjeant,
T. E. Collett,
P. Matavulj,
M. Walmsley,
B. Clément,
C. Tortora,
R. Gavazzi,
R. B. Metcalf,
C. M. O'Riordan,
G. Verdoes Kleijn,
L. V. E. Koopmans,
E. A. Valentijn
, et al. (170 additional authors not shown)
Abstract:
The Early Release Observations (ERO) from Euclid have detected several new galaxy-galaxy strong gravitational lenses, with the all-sky survey expected to find 170,000 new systems, greatly enhancing studies of dark matter, dark energy, and constraints on the cosmological parameters. As a first step, visual inspection of all galaxies in one of the ERO fields (Perseus) was carried out to identify candidate strong lensing systems and compared to the predictions from Convolutional Neural Networks (CNNs). However, the entire ERO data set is too large for expert visual inspection. In this paper, we therefore extend the CNN analysis to the whole ERO data set, using different CNN architectures and methodologies. Using five CNN architectures, we identified 8,469 strong gravitational lens candidates from $I_\mathrm{E}$-band cutouts of 13 Euclid ERO fields, narrowing them to 97 through visual inspection, including 14 grade A and 31 grade B candidates. We present the spectroscopic confirmation of a strong gravitational lensing candidate, EUCLJ081705.61+702348.8. The foreground lensing galaxy, an early-type system at redshift z = 0.335, and the background source, a star-forming galaxy at redshift z = 1.475 with [O II] emission, are both identified. Lens modeling using the Euclid strong lens modeling pipeline reveals two distinct arcs in a lensing configuration, with an Einstein radius of $1.18 \pm 0.03$ arcseconds, confirming the lensing nature of the system. These findings highlight the importance of a broad CNN search to efficiently reduce candidates, followed by visual inspection to eliminate false positives and achieve a high-purity sample of strong lenses in Euclid.
Submitted 13 February, 2025;
originally announced February 2025.
-
Euclid: A complete Einstein ring in NGC 6505
Authors:
C. M. O'Riordan,
L. J. Oldham,
A. Nersesian,
T. Li,
T. E. Collett,
D. Sluse,
B. Altieri,
B. Clément,
K. Vasan G. C.,
S. Rhoades,
Y. Chen,
T. Jones,
C. Adami,
R. Gavazzi,
S. Vegetti,
D. M. Powell,
J. A. Acevedo Barroso,
I. T. Andika,
R. Bhatawdekar,
A. R. Cooray,
G. Despali,
J. M. Diego,
L. R. Ecker,
A. Galan,
P. Gómez-Alvarez
, et al. (173 additional authors not shown)
Abstract:
We report the discovery of a complete Einstein ring around the elliptical galaxy NGC 6505, at $z=0.042$. This is the first strong gravitational lens discovered in Euclid and the first in an NGC object from any survey. The combination of the low redshift of the lens galaxy, the brightness of the source galaxy ($I_\mathrm{E}=18.1$ lensed, $I_\mathrm{E}=21.3$ unlensed), and the completeness of the ring make this an exceptionally rare strong lens, unidentified until its observation by Euclid. We present deep imaging data of the lens from the Euclid Visible Camera (VIS) and Near-Infrared Spectrometer and Photometer (NISP) instruments, as well as resolved spectroscopy from the Keck Cosmic Web Imager (KCWI). The Euclid imaging in particular presents one of the highest signal-to-noise ratio optical/near-infrared observations of a strong gravitational lens to date. From the KCWI data we measure a source redshift of $z=0.406$. Using data from the Dark Energy Spectroscopic Instrument (DESI) we measure a velocity dispersion for the lens galaxy of $σ_\star=303\pm15\,\mathrm{km\,s}^{-1}$. We model the lens galaxy light in detail, revealing angular structure that varies inside the Einstein ring. After subtracting this light model from the VIS observation, we model the strongly lensed images, finding an Einstein radius of 2.5 arcsec, corresponding to $2.1\,\mathrm{kpc}$ at the redshift of the lens. This is small compared to the effective radius of the galaxy, $R_\mathrm{eff}\sim 12.3\,\mathrm{arcsec}$. Combining the strong lensing measurements with analysis of the spectroscopic data we estimate a dark matter fraction inside the Einstein radius of $f_\mathrm{DM} = (11.1_{-3.5}^{+5.4})\%$ and a stellar initial mass-function (IMF) mismatch parameter of $α_\mathrm{IMF} = 1.26_{-0.08}^{+0.05}$, indicating a heavier-than-Chabrier IMF in the centre of the galaxy.
Submitted 10 February, 2025;
originally announced February 2025.
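The conversion quoted in the abstract above, from a 2.5 arcsec Einstein radius to about 2.1 kpc at $z=0.042$, can be checked with a small-angle calculation. The sketch below is only illustrative: the abstract does not state which cosmology was used, so a flat ΛCDM model with H0 = 70 km/s/Mpc and Ωm = 0.3 is assumed, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # ASSUMED Hubble constant in km/s/Mpc (not from the abstract)
OMEGA_M = 0.3         # ASSUMED matter density for a flat LCDM model


def angular_diameter_distance_mpc(z, steps=1000):
    """Angular-diameter distance in Mpc, integrating 1/E(z) with the trapezoid rule."""
    def inv_e(zp):
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + zp) ** 3 + (1.0 - OMEGA_M))

    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        total += inv_e(i * dz)
    comoving_mpc = (C_KM_S / H0) * total * dz
    return comoving_mpc / (1.0 + z)


def arcsec_to_kpc(theta_arcsec, z):
    """Physical size in kpc subtended by a small angle at redshift z."""
    theta_rad = math.radians(theta_arcsec / 3600.0)
    return theta_rad * angular_diameter_distance_mpc(z) * 1000.0


einstein_radius_kpc = arcsec_to_kpc(2.5, 0.042)
print(f"Einstein radius: {einstein_radius_kpc:.2f} kpc")
```

Under these assumed parameters the result lands close to the quoted 2.1 kpc; a different choice of H0 shifts it by a few per cent.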
-
Classifying merger stages with adaptive deep learning and cosmological hydrodynamical simulations
Authors:
Rosa de Graaff,
Berta Margalef-Bentabol,
Lingyu Wang,
Antonio La Marca,
William J. Pearson,
Vicente Rodriguez-Gomez,
Mike Walmsley
Abstract:
Hierarchical merging of galaxies plays an important role in galaxy formation and evolution. Mergers could trigger key evolutionary phases such as starburst activities and active accretion periods onto supermassive black holes at the centres of galaxies. We aim to detect mergers and merger stages (pre- and post-mergers) across cosmic history and test whether it is better to detect mergers and their merger stages simultaneously or hierarchically. In addition, we want to test the impact of merger time relative to the coalescence of merging galaxies. First, we generated realistic mock JWST images of simulated galaxies selected from the IllustrisTNG cosmological hydrodynamical simulations. Then we trained deep learning (DL) models in the Zoobot Python package to classify galaxies into merging/non-merging galaxies and their merger stages. We used two different set-ups: (i) two-stage, in which we classify galaxies into mergers and non-mergers and then classify the mergers into pre-mergers and post-mergers, and (ii) one-stage, in which merger/non-merger and merger stages are classified simultaneously. We found that the one-stage classification set-up moderately outperforms the two-stage set-up, offering better overall accuracy and precision, particularly for the non-merger class. Pre-mergers can be classified with the highest precision in both set-ups, possibly due to the more recognisable merging features and the presence of merging companions. The image signal-to-noise ratio affects the performance of the DL classifiers, but not much after a certain threshold is crossed. Both precision and recall of the classifiers depend strongly on merger time: true mergers observed further from coalescence are more difficult to identify. For pre-mergers, we recommend selecting mergers which will merge in the next 0.4 Gyr, to achieve a good balance between precision and recall.
Submitted 10 February, 2025;
originally announced February 2025.
-
COSMOS-Web: The emergence of the Hubble Sequence
Authors:
M. Huertas-Company,
M. Shuntov,
Y. Dong,
M. Walmsley,
O. Ilbert,
H. J. McCracken,
H. B. Akins,
N. Allen,
C. M. Casey,
L. Costantin,
E. Daddi,
A. Dekel,
M. Franco,
I. L. Garland,
T. Géron,
G. Gozaliasl,
M. Hirschmann,
J. S. Kartaltepe,
A. M. Koekemoer,
C. Lintott,
D. Liu,
R. Lucas,
K. Masters,
F. Pacucci,
L. Paquereau
, et al. (7 additional authors not shown)
Abstract:
Leveraging the wide area coverage of the COSMOS-Web survey, we quantify the abundance of different morphological types from $z\sim 7$ with unprecedented statistics and establish robust constraints on the epoch of emergence of the Hubble sequence. We measure the global (spheroids, disk-dominated, bulge-dominated, peculiar) and resolved (stellar bars) morphologies for about 400,000 galaxies down to F150W=27 using deep learning, representing a two-orders-of-magnitude increase over previous studies. We then provide reference Stellar Mass Functions (SMFs) of different morphologies between $z\sim 0.2$ and $z\sim 7$ and best-fit parameters to inform models of galaxy formation. All catalogs and data are made publicly available. (a) At redshift z > 4.5, the massive galaxy population ($\log M_*/M_\odot>10$) is dominated by disturbed morphologies (~70%) -- even in the optical rest frame -- and very compact objects (~30%) with effective radii smaller than ~500pc. This confirms that a significant fraction of the star formation at cosmic dawn occurs in very dense regions, although the stellar mass for these systems could be overestimated. (b) Galaxies with Hubble-type morphologies -- including bulge and disk-dominated galaxies -- arose rapidly around $z\sim 4$ and dominate the morphological diversity of massive galaxies as early as $z\sim 3$. (c) Using stellar bars as a proxy, we speculate that stellar disks in massive galaxies might have been common (>50%) among the star-forming population since cosmic noon ($z\sim2$-2.5) and formed as early as $z\sim 7$. (d) Massive quenched galaxies are predominantly bulge-dominated from z~4 onward, suggesting that morphological transformations briefly precede or are simultaneous to quenching mechanisms at the high-mass end. (e) Low-mass ($\log M_*/M_\odot<10$) quenched galaxies are typically disk-dominated, pointing to different quenching routes in the two ends of the stellar mass spectrum from cosmic dawn.
Submitted 5 February, 2025;
originally announced February 2025.
-
SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks
Authors:
Sneh Pandya,
Purvik Patel,
Brian D. Nord,
Mike Walmsley,
Aleksandra Ćiprijanović
Abstract:
Modern neural networks (NNs) often do not generalize well in the presence of a "covariate shift"; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels remains unchanged. In such cases, NN generalization can be reduced to a problem of learning more domain-invariant features. Domain adaptation (DA) methods include a range of techniques aimed at achieving this; however, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs. In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, that can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observations. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs). We find that SIDDA enhances the generalization capabilities of NNs, achieving up to a $\approx40\%$ improvement in classification accuracy on unlabeled target data. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group $D_N$, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA enhances model calibration on both source and target data--achieving over an order of magnitude improvement in the ECE and Brier score. SIDDA's versatility, combined with its automated approach to domain alignment, has the potential to advance multi-dataset studies by enabling the development of highly generalizable models.
Submitted 23 January, 2025;
originally announced January 2025.
-
The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data
Authors:
The Multimodal Universe Collaboration,
Jeroen Audenaert,
Micah Bowles,
Benjamin M. Boyd,
David Chemaly,
Brian Cherinka,
Ioana Ciucă,
Miles Cranmer,
Aaron Do,
Matthew Grayling,
Erin E. Hayes,
Tom Hehir,
Shirley Ho,
Marc Huertas-Company,
Kartheik G. Iyer,
Maja Jablonska,
Francois Lanusse,
Henry W. Leung,
Kaisey Mandel,
Juan Rafael Martínez-Galarza,
Peter Melchior,
Lucas Meyer,
Liam H. Parker,
Helen Qu,
Jeff Shen
, et al. (4 additional authors not shown)
Abstract:
We present the MULTIMODAL UNIVERSE, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, the MULTIMODAL UNIVERSE contains hundreds of millions of astronomical observations, constituting 100 TB of multi-channel and hyper-spectral images, spectra, multivariate time series, as well as a wide variety of associated scientific measurements and "metadata". In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics. This massive dataset will enable the development of large multi-modal models specifically targeted towards scientific applications. All codes used to compile the MULTIMODAL UNIVERSE and a description of how to access the data are available at https://github.com/MultimodalUniverse/MultimodalUniverse
Submitted 3 December, 2024;
originally announced December 2024.
-
Euclid: Searches for strong gravitational lenses using convolutional neural nets in Early Release Observations of the Perseus field
Authors:
R. Pearce-Casey,
B. C. Nagam,
J. Wilde,
V. Busillo,
L. Ulivi,
I. T. Andika,
A. Manjón-García,
L. Leuzzi,
P. Matavulj,
S. Serjeant,
M. Walmsley,
J. A. Acevedo Barroso,
C. M. O'Riordan,
B. Clément,
C. Tortora,
T. E. Collett,
F. Courbin,
R. Gavazzi,
R. B. Metcalf,
R. Cabanac,
H. M. Courtois,
J. Crook-Mansour,
L. Delchambre,
G. Despali,
L. R. Ecker
, et al. (182 additional authors not shown)
Abstract:
The Euclid Wide Survey (EWS) is predicted to find approximately 170 000 galaxy-galaxy strong lenses from its lifetime observation of $14\,000\,\mathrm{deg}^2$ of the sky. Detecting this many lenses by visual inspection with professional astronomers and citizen scientists alone is infeasible. Machine learning algorithms, particularly convolutional neural networks (CNNs), have been used as an automated method of detecting strong lenses, and have proven fruitful in finding galaxy-galaxy strong lens candidates. We identify the major challenge to be the automatic detection of galaxy-galaxy strong lenses while simultaneously maintaining a low false positive rate. One aim of this research is to have a quantified starting point on the achieved purity and completeness with our current version of CNN-based detection pipelines for the VIS images of EWS. We select all sources with VIS $I_\mathrm{E} < 23$ mag from the Euclid Early Release Observation imaging of the Perseus field. We apply a range of CNN architectures to detect strong lenses in these cutouts. All our networks perform extremely well on simulated data sets and their respective validation sets. However, when applied to real Euclid imaging, the highest lens purity is just 11%. Among all our networks, the false positives are typically identifiable by human volunteers as, for example, spiral galaxies, multiple sources, and artefacts, implying that improvements are still possible, perhaps via a second, more interpretable lens selection filtering stage. There is currently no alternative to human classification of CNN-selected lens candidates. Given the expected $10^5$ lensing systems in Euclid, this implies $10^6$ objects for human classification, which while very large is not in principle intractable and not without precedent.
Submitted 25 November, 2024;
originally announced November 2024.
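The step from roughly 10^5 expected lenses to roughly 10^6 objects for human classification follows from the 11% purity quoted in the abstract above. A minimal sketch of that arithmetic, assuming (as a simplification not stated in the abstract) that every true lens is recovered and that the 11% purity holds across the whole survey:

```python
expected_lenses = 1e5   # order-of-magnitude lens count quoted in the abstract
purity = 0.11           # highest purity reported on real Euclid imaging

# Purity = true lenses / all CNN-selected candidates, so the number of
# candidates needing inspection is the lens count divided by the purity.
# ASSUMPTION: completeness of 1 and a survey-wide purity of 11%.
candidates_to_inspect = expected_lenses / purity
false_positives = candidates_to_inspect - expected_lenses

print(f"candidates to inspect: {candidates_to_inspect:.2e}")
print(f"of which false positives: {false_positives:.2e}")
```

This gives about nine hundred thousand candidates, consistent with the order-of-magnitude figure in the abstract.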
-
The Galaxy Zoo Catalogs for the Galaxy And Mass Assembly (GAMA) Survey
Authors:
Benne W. Holwerda,
Clayton Robertson,
Kyle Cook,
Kevin A. Pimbblet,
Sarah Casura,
Anne E. Sansom,
Divya Patel,
Trevor Butrum,
David H. W. Glass,
Lee Kelvin,
Ivan K. Baldry,
Roberto De Propris,
Steven Bamford,
Karen Masters,
Maria Stone,
Tim Hardin,
Mike Walmsley,
Jochen Liske,
S M Rafee Adnan
Abstract:
Galaxy Zoo is an online project to classify morphological features in extra-galactic imaging surveys with public voting. In this paper, we compare the classifications made for two different surveys, the Dark Energy Spectroscopic Instrument (DESI) imaging survey and a part of the Kilo-Degree Survey (KiDS), in the equatorial fields of the Galaxy And Mass Assembly (GAMA) survey. Our aim is to cross-validate and compare the classifications based on different imaging quality and depth.
We find that generally the voting agrees globally but with substantial scatter, i.e. substantial differences for individual galaxies. There is a notably higher voting fraction in favor of "smooth" galaxies in the DESI+zoobot classifications, most likely due to the difference in imaging depth. DESI imaging is shallower and slightly lower resolution than KiDS, and the Galaxy Zoo images do not reveal details such as disk features, which are thus missed in the zoobot training sample. We check against expert visual classifications and find good agreement with KiDS-based Galaxy Zoo voting.
We reproduce the results from Porter-Temple+ (2022), on the dependence of stellar mass, star-formation, and specific star-formation on the number of spiral arms. This shows that once corrected for redshift, the DESI Galaxy Zoo and KiDS Galaxy Zoo classifications agree well on population properties. The zoobot cross-validation increases confidence in its ability to complement Galaxy Zoo classifications and its ability for transfer learning across surveys.
Submitted 25 October, 2024;
originally announced October 2024.
-
Galaxy Zoo: Morphologies based on UKIDSS NIR Imaging for 71,052 Galaxies
Authors:
Karen L. Masters,
Melanie Galloway,
Lucy Fortson,
Chris Lintott,
Mike Read,
Claudia Scarlata,
Brooke Simmons,
Mike Walmsley,
Kyle Willett
Abstract:
We present morphological classifications based on Galaxy Zoo analysis of 71,052 galaxies with imaging from the United Kingdom Infrared Telescope Infrared Deep Sky Survey (UKIDSS). Galaxies were selected out of the Galaxy Zoo 2 (GZ2) sample, so also have gri imaging from the Sloan Digital Sky Survey. An identical classification tree, and vote weighting/aggregation was applied to both UKIDSS and GZ2 classifications enabling direct comparisons. With this Research Note we provide a public release of the GZ:UKIDSS morphologies and discuss some initial comparisons with GZ2.
Submitted 19 August, 2024;
originally announced August 2024.
-
Euclid: The Early Release Observations Lens Search Experiment
Authors:
J. A. Acevedo Barroso,
C. M. O'Riordan,
B. Clément,
C. Tortora,
T. E. Collett,
F. Courbin,
R. Gavazzi,
R. B. Metcalf,
V. Busillo,
I. T. Andika,
R. Cabanac,
H. M. Courtois,
J. Crook-Mansour,
L. Delchambre,
G. Despali,
L. R. Ecker,
A. Franco,
P. Holloway,
N. Jackson,
K. Jahnke,
G. Mahler,
L. Marchetti,
P. Matavulj,
A. Melo,
M. Meneghetti
, et al. (184 additional authors not shown)
Abstract:
We investigated the ability of the Euclid telescope to detect galaxy-scale gravitational lenses. To do so, we performed a systematic visual inspection of the $0.7\,\rm{deg}^2$ Euclid Early Release Observations data towards the Perseus cluster using both the high-resolution $I_{\scriptscriptstyle\rm E}$ band and the lower-resolution $Y_{\scriptscriptstyle\rm E}$, $J_{\scriptscriptstyle\rm E}$, $H_{\scriptscriptstyle\rm E}$ bands. Each extended source brighter than magnitude 23 in $I_{\scriptscriptstyle\rm E}$ was inspected by 41 expert human classifiers. This amounts to $12\,086$ stamps of $10^{\prime\prime}\,\times\,10^{\prime\prime}$. We found $3$ grade A and $13$ grade B candidates. We assessed the validity of these $16$ candidates by modelling them and checking that they are consistent with a single source lensed by a plausible mass distribution. Five of the candidates pass this check, five others are rejected by the modelling, and six are inconclusive. Extrapolating from the five successfully modelled candidates, we infer that the full $14\,000\,{\rm deg}^2$ of the Euclid Wide Survey should contain $100\,000^{+70\,000}_{-30\,000}$ galaxy-galaxy lenses that are both discoverable through visual inspection and have valid lens models. This is consistent with theoretical forecasts of $170\,000$ discoverable galaxy-galaxy lenses in Euclid. Our five modelled lenses have Einstein radii in the range $0.\!\!^{\prime\prime}68\,<\,θ_\mathrm{E}\,<1.\!\!^{\prime\prime}24$, but their Einstein radius distribution is on the higher side when compared to theoretical forecasts. This suggests that our methodology is likely missing small-Einstein-radius systems. Whilst it is implausible to visually inspect the full Euclid dataset, our results corroborate the promise that Euclid will ultimately deliver a sample of around $10^5$ galaxy-scale lenses.
Submitted 2 May, 2025; v1 submitted 12 August, 2024;
originally announced August 2024.
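The central value of the extrapolation in the abstract above (five modelled lenses in 0.7 deg^2 scaled to the 14 000 deg^2 Euclid Wide Survey) is a simple surface-density scaling. The sketch below reproduces only that central value; it assumes the lens density of the Perseus field is representative of the whole survey and does not attempt to reproduce the quoted asymmetric uncertainties.

```python
modelled_lenses = 5            # candidates that passed the lens-modelling check
ero_area_deg2 = 0.7            # area inspected in the Perseus ERO field
wide_survey_area_deg2 = 14_000 # planned Euclid Wide Survey area

# ASSUMPTION: the lens surface density in the ERO field is representative
# of the full survey footprint.
lens_density_per_deg2 = modelled_lenses / ero_area_deg2
expected_wide_survey_lenses = lens_density_per_deg2 * wide_survey_area_deg2

print(f"{lens_density_per_deg2:.1f} lenses per square degree")
print(f"{expected_wide_survey_lenses:.0f} lenses in the Wide Survey")
```

With only five objects, the counting uncertainty is large, which is why the abstract quotes a broad interval around this value.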
-
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy
Authors:
Kartheik G. Iyer,
Mikaeel Yunus,
Charles O'Neill,
Christine Ye,
Alina Hyk,
Kiera McCormick,
Ioana Ciuca,
John F. Wu,
Alberto Accomazzi,
Simone Astarita,
Rishabh Chakrabarty,
Jesse Cranney,
Anjalie Field,
Tirthankar Ghosal,
Michele Ginolfi,
Marc Huertas-Company,
Maja Jablonska,
Sandor Kruk,
Huiling Liu,
Gabriel Marchidan,
Rohit Mistry,
J. P. Naiman,
J. E. G. Peek,
Mugdha Polimera,
Sergio J. Rodriguez
, et al. (5 additional authors not shown)
Abstract:
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords. Utilizing state-of-the-art large language models (LLMs) and a corpus of 350,000 peer-reviewed papers from the Astrophysics Data System (ADS), Pathfinder offers an innovative approach to scientific inquiry and literature exploration. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context as a complement to currently existing methods that use keywords or citation graphs. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes. We demonstrate the tool's versatility through case studies, showcasing its application in various research scenarios. The system's performance is evaluated using custom benchmarks, including single-paper and multi-paper tasks. Beyond literature review, Pathfinder offers unique capabilities for reformatting answers in ways that are accessible to various audiences (e.g. in a different language or as simplified text), visualizing research landscapes, and tracking the impact of observatories and methodologies. This tool represents a significant advancement in applying AI to astronomical research, aiding researchers at all career stages in navigating modern astronomy literature.
Submitted 2 August, 2024;
originally announced August 2024.
-
Galaxy Zoo DESI: large-scale bars as a secular mechanism for triggering AGN
Authors:
Izzy L. Garland,
Mike Walmsley,
Maddie S. Silcock,
Leah M. Potts,
Josh Smith,
Brooke D. Simmons,
Chris J. Lintott,
Rebecca J. Smethurst,
James M. Dawson,
William C. Keel,
Sandor Kruk,
Kameswara Bharadwaj Mantha,
Karen L. Masters,
David O'Ryan,
Jürgen J. Popp,
Matthew R. Thorne
Abstract:
Despite the evidence that supermassive black holes (SMBHs) co-evolve with their host galaxy, and that most of the growth of these SMBHs occurs via merger-free processes, the underlying mechanisms which drive this secular co-evolution are poorly understood. We investigate the role that both strong and weak large-scale galactic bars play in mediating this relationship. Using 72,940 disc galaxies in a volume-limited sample from Galaxy Zoo DESI, we analyse the active galactic nucleus (AGN) fraction in strongly barred, weakly barred, and unbarred galaxies up to z = 0.1 over a range of stellar masses and colours. After controlling for stellar mass and colour, we find that the optically selected AGN fraction is 31.6 +/- 0.9 per cent in strongly barred galaxies, 23.3 +/- 0.8 per cent in weakly barred galaxies, and 14.2 +/- 0.6 per cent in unbarred disc galaxies. These are highly statistically robust results, strengthening the tantalising results in earlier works. Strongly barred galaxies have a higher fraction of AGNs than weakly barred galaxies, which in turn have a higher fraction than unbarred galaxies. Thus, while bars are not required in order to grow a SMBH in a disc galaxy, large-scale galactic bars appear to facilitate AGN fuelling, and the presence of a strong bar makes a disc galaxy more than twice as likely to host an AGN as an unbarred galaxy at all galaxy stellar masses and colours.
Submitted 28 June, 2024;
originally announced June 2024.
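The "more than twice as likely" statement in the abstract above can be checked directly from the quoted AGN fractions. The sketch below computes the ratios and a first-order uncertainty, under the assumption (not stated in the abstract) that the quoted errors are independent and roughly Gaussian; the helper name is hypothetical.

```python
import math

# AGN fractions in per cent, with quoted uncertainties, from the abstract
strong_bar = (31.6, 0.9)
weak_bar = (23.3, 0.8)
unbarred = (14.2, 0.6)


def ratio_with_error(numerator, denominator):
    """Ratio of two measurements with first-order error propagation.

    ASSUMPTION: the two uncertainties are independent and Gaussian.
    """
    (a, sigma_a), (b, sigma_b) = numerator, denominator
    ratio = a / b
    relative_error = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return ratio, ratio * relative_error


strong_ratio, strong_ratio_err = ratio_with_error(strong_bar, unbarred)
weak_ratio, weak_ratio_err = ratio_with_error(weak_bar, unbarred)

print(f"strong / unbarred: {strong_ratio:.2f} +/- {strong_ratio_err:.2f}")
print(f"weak / unbarred:   {weak_ratio:.2f} +/- {weak_ratio_err:.2f}")
```

Under this assumption the strong-bar ratio stays above two even after subtracting its estimated uncertainty.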
-
Euclid. I. Overview of the Euclid mission
Authors:
Euclid Collaboration,
Y. Mellier,
Abdurro'uf,
J. A. Acevedo Barroso,
A. Achúcarro,
J. Adamek,
R. Adam,
G. E. Addison,
N. Aghanim,
M. Aguena,
V. Ajani,
Y. Akrami,
A. Al-Bahlawan,
A. Alavi,
I. S. Albuquerque,
G. Alestas,
G. Alguero,
A. Allaoui,
S. W. Allen,
V. Allevato,
A. V. Alonso-Tetilla,
B. Altieri,
A. Alvarez-Candal,
S. Alvi,
A. Amara
, et al. (1115 additional authors not shown)
Abstract:
The current standard model of cosmology successfully describes a variety of measurements, but the nature of its main ingredients, dark matter and dark energy, remains unknown. Euclid is a medium-class mission in the Cosmic Vision 2015-2025 programme of the European Space Agency (ESA) that will provide high-resolution optical imaging, as well as near-infrared imaging and spectroscopy, over about 14,000 deg^2 of extragalactic sky. In addition to accurate weak lensing and clustering measurements that probe structure formation over half of the age of the Universe, its primary probes for cosmology, these exquisite data will enable a wide range of science. This paper provides a high-level overview of the mission, summarising the survey characteristics, the various data-processing steps, and data products. We also highlight the main science objectives and expected performance.
Submitted 24 September, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Scaling Laws for Galaxy Images
Authors:
Mike Walmsley,
Micah Bowles,
Anna M. M. Scaife,
Jason Shingirai Makechemu,
Alexander J. Gordon,
Annette M. N. Ferguson,
Robert G. Mann,
James Pearson,
Jürgen J. Popp,
Jo Bovy,
Josh Speagle,
Hugh Dickinson,
Lucy Fortson,
Tobias Géron,
Sandor Kruk,
Chris J. Lintott,
Kameswara Mantha,
Devina Mohan,
David O'Ryan,
Inigo V. Slijepevic
Abstract:
We present the first systematic investigation of supervised scaling laws outside of an ImageNet-like context - on images of galaxies. We use 840k galaxy images and over 100M annotations by Galaxy Zoo volunteers, comparable in scale to ImageNet-1K. We find that adding annotated galaxy images provides a power law improvement in performance across all architectures and all tasks, while adding trainable parameters is effective only for some (typically more subjectively challenging) tasks. We then compare the downstream performance of finetuned models pretrained on ImageNet-12k alone vs. additionally pretrained on our galaxy images. We achieve an average relative error rate reduction of 31% across 5 downstream tasks of scientific interest. Our finetuned models are more label-efficient and, unlike their ImageNet-12k-pretrained equivalents, often achieve linear transfer performance equal to that of end-to-end finetuning. We find relatively modest additional downstream benefits from scaling model size, implying that scaling alone is not sufficient to address our domain gap, and suggest that practitioners with qualitatively different images might benefit more from in-domain adaptation followed by targeted downstream labelling.
Submitted 3 April, 2024;
originally announced April 2024.
-
Galaxy merger challenge: A comparison study between machine learning-based detection methods
Authors:
B. Margalef-Bentabol,
L. Wang,
A. La Marca,
C. Blanco-Prieto,
D. Chudy,
H. Domínguez-Sánchez,
A. D. Goulding,
A. Guzmán-Ortega,
M. Huertas-Company,
G. Martin,
W. J. Pearson,
V. Rodriguez-Gomez,
M. Walmsley,
R. W. Bickley,
C. Bottrell,
C. Conselice,
D. O'Ryan
Abstract:
Various galaxy merger detection methods have been applied to diverse datasets. However, it is difficult to understand how they compare. We aim to benchmark the relative performance of machine learning (ML) merger detection methods. We explore six leading ML methods using three main datasets. The first one (the training data) consists of mock observations from the IllustrisTNG simulations and allows us to quantify the performance metrics of the detection methods. The second one consists of mock observations from the Horizon-AGN simulations, introduced to evaluate the performance of classifiers trained on different, but comparable data. The third one consists of real observations from the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) survey. For the binary classification task (mergers vs. non-mergers), all methods perform reasonably well in the domain of the training data. At $0.1<z<0.3$, precision and recall range between $\sim$70% and 80%, both of which decrease with increasing $z$ as expected (by $\sim$5% for precision and $\sim$10% for recall at $0.76<z<1.0$). When transferred to a different domain, the precision of all classifiers is only slightly reduced, but the recall is significantly worse (by $\sim$20-40% depending on the method). Zoobot offers the best overall performance in terms of precision and F1 score. When applied to real HSC observations, all methods agree well with visual labels of clear mergers but can differ by more than an order of magnitude in predicting the overall fraction of major mergers. For the multi-class classification task to distinguish pre-, post- and non-mergers, none of the methods performs well, which could be partly due to limitations in resolution and depth of the data. With the advent of better quality data (e.g. JWST and Euclid), it is important to improve our ability to detect mergers and distinguish between merger stages.
Submitted 15 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Euclid preparation. XLIII. Measuring detailed galaxy morphologies for Euclid with machine learning
Authors:
Euclid Collaboration,
B. Aussel,
S. Kruk,
M. Walmsley,
M. Huertas-Company,
M. Castellano,
C. J. Conselice,
M. Delli Veneri,
H. Domínguez Sánchez,
P. -A. Duc,
U. Kuchner,
A. La Marca,
B. Margalef-Bentabol,
F. R. Marleau,
G. Stevens,
Y. Toba,
C. Tortora,
L. Wang,
N. Aghanim,
B. Altieri,
A. Amara,
S. Andreon,
N. Auricchio,
M. Baldi,
S. Bardelli
, et al. (233 additional authors not shown)
Abstract:
The Euclid mission is expected to image millions of galaxies with high resolution, providing an extensive dataset to study galaxy evolution. We investigate the application of deep learning to predict the detailed morphologies of galaxies in Euclid using Zoobot, a convolutional neural network pretrained with 450000 galaxies from the Galaxy Zoo project. We adapted Zoobot for emulated Euclid images, generated based on Hubble Space Telescope COSMOS images, and with labels provided by volunteers in the Galaxy Zoo: Hubble project. We demonstrate that the trained Zoobot model successfully measures detailed morphology for emulated Euclid images. It effectively predicts whether a galaxy has features and identifies and characterises various features such as spiral arms, clumps, bars, disks, and central bulges. When compared to volunteer classifications, Zoobot achieves mean vote fraction deviations of less than 12% and an accuracy above 91% for the confident volunteer classifications across most morphology types. However, the performance varies depending on the specific morphological class. For the global classes such as disk or smooth galaxies, the mean deviations are less than 10%, with only 1000 training galaxies necessary to reach this performance. For more detailed structures and complex tasks like detecting and counting spiral arms or clumps, the deviations are slightly higher, around 12% with 60000 galaxies used for training. In order to enhance the performance on complex morphologies, we anticipate that a larger pool of labelled galaxies is needed, which could be obtained using crowdsourcing. Finally, our findings imply that the model can be effectively adapted to new morphological labels. We demonstrate this adaptability by applying Zoobot to peculiar galaxies. In summary, our trained Zoobot CNN can readily predict morphological catalogues for Euclid images.
Submitted 20 September, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Transfer learning for galaxy feature detection: Finding Giant Star-forming Clumps in low redshift galaxies using Faster R-CNN
Authors:
Jürgen Popp,
Hugh Dickinson,
Stephen Serjeant,
Mike Walmsley,
Dominic Adams,
Lucy Fortson,
Kameswara Mantha,
Vihang Mehta,
James M. Dawson,
Sandor Kruk,
Brooke Simmons
Abstract:
Giant Star-forming Clumps (GSFCs) are areas of intensive star-formation that are commonly observed in high-redshift (z>1) galaxies, but their formation and role in galaxy evolution remain unclear. High-resolution observations of low-redshift clumpy galaxy analogues are rare and restricted to a limited set of galaxies, but the increasing availability of wide-field galaxy survey data makes the detection of large clumpy galaxy samples increasingly feasible. Deep Learning (DL), and in particular CNNs, has been successfully applied to image classification tasks in astrophysical data analysis. However, one application of DL that remains relatively unexplored is that of automatically identifying and localising specific objects or features in astrophysical imaging data. In this paper we demonstrate the feasibility of using Deep Learning-based object detection models to localise GSFCs in astrophysical imaging data. We apply the Faster R-CNN object detection framework (FRCNN) to identify GSFCs in low redshift (z<0.3) galaxies. Unlike other studies, we train different FRCNN models not on simulated images with known labels but on real observational data that was collected by the Sloan Digital Sky Survey Legacy Survey and labelled by volunteers from the citizen science project `Galaxy Zoo: Clump Scout'. The FRCNN model relies on a CNN component as a `backbone' feature extractor. We show that CNNs that have been pre-trained for image classification using astrophysical images outperform those that have been pre-trained on terrestrial images. In particular, we compare a domain-specific CNN - `Zoobot' - with a generic classification backbone and find that Zoobot achieves higher detection performance and also requires smaller training data sets to do so. Our final model is capable of producing GSFC detections with a completeness and purity of >=0.8 while only being trained on ~5,000 galaxy images.
Submitted 1 April, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Rare Galaxy Classes Identified In Foundation Model Representations
Authors:
Mike Walmsley,
Anna M. M. Scaife
Abstract:
We identify rare and visually distinctive galaxy populations by searching for structure within the learned representations of pretrained models. We show that these representations arrange galaxies by appearance in patterns beyond those needed to predict the pretraining labels. We design a clustering approach to isolate specific local patterns, revealing groups of galaxies with rare and scientifically interesting morphologies.
Submitted 5 December, 2023;
originally announced December 2023.
-
Deep Learning Segmentation of Spiral Arms and Bars
Authors:
Mike Walmsley,
Ashley Spindler
Abstract:
We present the first deep learning model for segmenting galactic spiral arms and bars. In a blinded assessment by expert astronomers, our predicted spiral arm masks are preferred over both current automated methods (99% of evaluations) and our original volunteer labels (79% of evaluations). Experts rated our spiral arm masks as `mostly good' to `perfect' in 89% of evaluations. Bar lengths trivially derived from our predicted bar masks are in excellent agreement with a dedicated crowdsourcing project. The pixelwise precision of our masks, previously impossible at scale, will underpin new research into how spiral arms and bars evolve.
Submitted 5 December, 2023;
originally announced December 2023.
-
Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers
Authors:
D. Huppenkothen,
M. Ntampaka,
M. Ho,
M. Fouesneau,
B. Nord,
J. E. G. Peek,
M. Walmsley,
J. F. Wu,
C. Avestruz,
T. Buck,
M. Brescia,
D. P. Finkbeiner,
A. D. Goulding,
T. Kacprzak,
P. Melchior,
M. Pasquato,
N. Ramachandra,
Y. -S. Ting,
G. van de Ven,
S. Villar,
V. A. Villar,
E. Zinger
Abstract:
Machine learning has rapidly become a tool of choice for the astronomical community. It is being applied across a wide range of wavelengths and problems, from the classification of transients to neural network emulators of cosmological simulations, and is shifting paradigms about how we generate and report scientific results. At the same time, this class of method comes with its own set of best practices, challenges, and drawbacks, which, at present, are often reported on incompletely in the astrophysical literature. With this paper, we aim to provide a primer to the astronomical community, including authors, reviewers, and editors, on how to implement machine learning models and report their results in a way that ensures the accuracy of the results, reproducibility of the findings, and usefulness of the method.
Submitted 19 October, 2023;
originally announced October 2023.
-
Galaxy mergers in Subaru HSC-SSP: a deep representation learning approach for identification and the role of environment on merger incidence
Authors:
Kiyoaki Christopher Omori,
Connor Bottrell,
Mike Walmsley,
Hassen M. Yesuf,
Andy D. Goulding,
Xuheng Ding,
Gergö Popping,
John D. Silverman,
Tsutomu T. Takeuchi,
Yoshiki Toba
Abstract:
We take a deep learning-based approach for galaxy merger identification in Subaru HSC-SSP, specifically through the use of deep representation learning and fine-tuning, with the aim of creating a pure and complete merger sample within the HSC-SSP survey. We can use this merger sample to conduct studies on how mergers affect galaxy evolution. We use Zoobot, a deep representation learning model pre-trained on citizen science votes on Galaxy Zoo DeCALS images. We fine-tune Zoobot for the purpose of merger classification of images of SDSS and GAMA galaxies in HSC-SSP PDR 3. Fine-tuning is done using 1200 synthetic HSC-SSP images of galaxies from the TNG simulation. We then find merger probabilities on observed HSC images using the fine-tuned model. Using our merger probabilities, we examine the relationship between merger activity and environment. We find that our fine-tuned model returns an accuracy on the synthetic validation data of 76%. This number is comparable to those of previous studies where convolutional neural networks were trained with simulation images, but our work requires a far smaller number of training samples. For our synthetic data, our model is able to achieve completeness and precision values of 80%. In addition, our model is able to correctly classify both mergers and non-mergers of diverse morphologies and structures, including those at various stages and mass ratios, while distinguishing between projections and merger pairs. For the relation between galaxy mergers and environment, we find two distinct trends. Using stellar mass overdensity estimates for TNG simulations and observations using SDSS and GAMA, we find that galaxies with higher merger scores favor lower density environments on scales of 0.5 to 8 h^-1 Mpc. However, below these scales in the simulations, we find that galaxies with higher merger scores favor higher density environments.
Submitted 27 September, 2023;
originally announced September 2023.
-
Galaxy Zoo DESI: Detailed Morphology Measurements for 8.7M Galaxies in the DESI Legacy Imaging Surveys
Authors:
Mike Walmsley,
Tobias Géron,
Sandor Kruk,
Anna M. M. Scaife,
Chris Lintott,
Karen L. Masters,
James M. Dawson,
Hugh Dickinson,
Lucy Fortson,
Izzy L. Garland,
Kameswara Mantha,
David O'Ryan,
Jürgen Popp,
Brooke Simmons,
Elisabeth M. Baeten,
Christine Macmillan
Abstract:
We present detailed morphology measurements for 8.67 million galaxies in the DESI Legacy Imaging Surveys (DECaLS, MzLS, and BASS, plus DES). These are automated measurements made by deep learning models trained on Galaxy Zoo volunteer votes. Our models typically predict the fraction of volunteers selecting each answer to within 5-10% for every answer to every GZ question. The models are trained on newly-collected votes for DESI-LS DR8 images as well as historical votes from GZ DECaLS. We also release the newly-collected votes. Extending our morphology measurements outside of the previously-released DECaLS/SDSS intersection increases our sky coverage by a factor of 4 (5,000 to 19,000 deg$^2$) and allows for full overlap with complementary surveys including ALFALFA and MaNGA.
Submitted 20 September, 2023;
originally announced September 2023.
-
Astronomaly at scale: searching for anomalies amongst 4 million galaxies
Authors:
Verlon Etsebeth,
Michelle Lochner,
Mike Walmsley,
Margherita Grespan
Abstract:
Modern astronomical surveys are producing datasets of unprecedented size and richness, increasing the potential for high-impact scientific discovery. This possibility, coupled with the challenge of exploring a large number of sources, has led to the development of novel machine-learning-based anomaly detection approaches, such as Astronomaly. For the first time, we test the scalability of Astronomaly by applying it to almost 4 million images of galaxies from the Dark Energy Camera Legacy Survey. We use a trained deep learning algorithm to learn useful representations of the images and pass these to the anomaly detection algorithm isolation forest, coupled with Astronomaly's active learning method, to discover interesting sources. We find that data selection criteria have a significant impact on the trade-off between finding rare sources such as strong lenses and introducing artefacts into the dataset. We demonstrate that active learning is required to identify the most interesting sources and reduce artefacts, while anomaly detection methods alone are insufficient. Using Astronomaly, we find 1635 anomalies among the top 2000 sources in the dataset after applying active learning, including eight strong gravitational lens candidates, 1609 galaxy merger candidates, and 18 previously unidentified sources exhibiting highly unusual morphology. Our results show that by leveraging the human-machine interface, Astronomaly is able to rapidly identify sources of scientific interest even in large datasets.
Submitted 29 March, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Radio Galaxy Zoo: Towards building the first multi-purpose foundation model for radio astronomy with self-supervised learning
Authors:
Inigo V. Slijepcevic,
Anna M. M. Scaife,
Mike Walmsley,
Micah Bowles,
O. Ivy Wong,
Stanislav S. Shabala,
Sarah V. White
Abstract:
In this work, we apply self-supervised learning with instance differentiation to learn a robust, multi-purpose representation for image analysis of resolved extragalactic continuum images. We train a multi-use model which compresses our unlabelled data into a structured, low dimensional representation which can be used for a variety of downstream tasks (e.g. classification, similarity search). We exceed baseline supervised Fanaroff-Riley classification performance by a statistically significant margin, with our model reducing the test set error by up to half. Our model is also able to maintain high classification accuracy with very few labels, with only 7.79% error when only using 145 labels. We further demonstrate that by using our foundation model, users can efficiently trade off compute, human labelling cost and test set accuracy according to their respective budgets, allowing for efficient classification in a wide variety of scenarios. We highlight the generalizability of our model by showing that it enables accurate classification in a label scarce regime with data from the new MIGHTEE survey without any hyper-parameter tuning, where it improves upon the baseline by ~8%. Visualizations of our labelled and un-labelled data show that our model's representation space is structured with respect to physical properties of the sources, such as angular source extent. We show that the learned representation is scientifically useful even if no labels are available by performing a similarity search, finding hybrid sources in the RGZ DR1 data-set without any labels. We show that good augmentation design and hyper-parameter choice can help achieve peak performance, while emphasising that optimal hyper-parameters are not required to obtain benefits from self-supervised pre-training.
Submitted 18 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Radio Galaxy Zoo EMU: Towards a Semantic Radio Galaxy Morphology Taxonomy
Authors:
Micah Bowles,
Hongming Tang,
Eleni Vardoulaki,
Emma L. Alexander,
Yan Luo,
Lawrence Rudnick,
Mike Walmsley,
Fiona Porter,
Anna M. M. Scaife,
Inigo Val Slijepcevic,
Elizabeth A. K. Adams,
Alexander Drabent,
Thomas Dugdale,
Gülay Gürkan,
Andrew M. Hopkins,
Eric F. Jimenez-Andrade,
Denis A. Leahy,
Ray P. Norris,
Syed Faisal ur Rahman,
Xichang Ouyang,
Gary Segal,
Stanislav S. Shabala,
O. Ivy Wong
Abstract:
We present a novel natural language processing (NLP) approach to deriving plain English descriptors for science cases otherwise restricted by obfuscating technical terminology. We address the limitations of common radio galaxy morphology classifications by applying this approach. We experimentally derive a set of semantic tags for the Radio Galaxy Zoo EMU (Evolutionary Map of the Universe) project and the wider astronomical community. We collect 8,486 plain English annotations of radio galaxy morphology, from which we derive a taxonomy of tags. The tags are plain English. The result is an extensible framework which is more flexible, more easily communicated, and more sensitive to rare feature combinations which are indescribable using the current framework of radio astronomy classifications.
Submitted 14 April, 2023;
originally announced April 2023.
-
Harnessing the Hubble Space Telescope Archives: A Catalogue of 21,926 Interacting Galaxies
Authors:
David O'Ryan,
Bruno Merín,
Brooke D. Simmons,
Antónia Vojteková,
Anna Anku,
Mike Walmsley,
Izzy L. Garland,
Tobias Géron,
William Keel,
Sandor Kruk,
Chris J. Lintott,
Kameswara Bharadwaj Mantha,
Karen L. Masters,
Jan Reerink,
Rebecca J. Smethurst,
Matthew R. Thorne
Abstract:
Mergers play a complex role in galaxy formation and evolution. Continuing to improve our understanding of these systems requires ever larger samples, which can be difficult (even impossible) to select from individual surveys. We use the new platform ESA Datalabs to assemble a catalogue of interacting galaxies from the Hubble Space Telescope science archives; this catalogue is larger than previously published catalogues by nearly an order of magnitude. In particular, we apply the Zoobot convolutional neural network directly to the entire public archive of HST $F814W$ images and make probabilistic interaction predictions for 126 million sources from the Hubble Source Catalogue. We employ a combination of automated visual representation and visual analysis to identify a clean sample of 21,926 interacting galaxy systems, mostly with $z < 1$. Sixty-five percent of these systems have no previous references in either the NASA Extragalactic Database or Simbad. In the process of removing contamination, we also discover many other objects of interest, such as gravitational lenses, edge-on protoplanetary disks, and `backlit' overlapping galaxies. We briefly investigate the basic properties of this sample, and we make our catalogue publicly available for use by the community. In addition to providing a new catalogue of scientifically interesting objects imaged by HST, this work also demonstrates the power of the ESA Datalabs tool to facilitate substantial archival analysis without placing a high computational or storage burden on the end user.
Submitted 1 March, 2023;
originally announced March 2023.
-
Galaxy Zoo: Kinematics of strongly and weakly barred galaxies
Authors:
Tobias Géron,
Rebecca J. Smethurst,
Chris Lintott,
Sandor Kruk,
Karen L. Masters,
Brooke Simmons,
Kameswara Bharadwaj Mantha,
Mike Walmsley,
L. Garma-Oehmichen,
Niv Drory,
Richard R. Lane
Abstract:
We study the bar pattern speeds and corotation radii of 225 barred galaxies, using IFU data from MaNGA and the Tremaine-Weinberg method. Our sample, which is divided between strongly and weakly barred galaxies identified via Galaxy Zoo, is the largest that this method has been applied to. We find lower pattern speeds for strongly barred galaxies than for weakly barred galaxies. As simulations show that the pattern speed decreases as the bar exchanges angular momentum with its host, these results suggest that strong bars are more evolved than weak bars. Interestingly, the corotation radius is not different between weakly and strongly barred galaxies, despite being proportional to bar length. We also find that the corotation radius is significantly different between quenching and star forming galaxies. Additionally, we find that strongly barred galaxies have significantly lower values for R, the ratio between the corotation radius and the bar radius, than weakly barred galaxies, despite a big overlap in both distributions. This ratio classifies bars into ultrafast bars (R < 1.0; 11% of our sample), fast bars (1.0 < R < 1.4; 27%) and slow bars (R > 1.4; 62%). Simulations show that R is correlated with the bar formation mechanism, so our results suggest that strong bars are more likely to be formed by different mechanisms than weak bars. Finally, we find a lower fraction of ultrafast bars than most other studies, which decreases the recently claimed tension with ΛCDM. However, the median value of R is still lower than what is predicted by simulations.
Submitted 10 February, 2023;
originally announced February 2023.
-
Galaxy And Mass Assembly: Galaxy Morphology in the Green Valley, Prominent rings and looser Spiral Arms
Authors:
Dominic Smith,
Lutz Haberzettl,
L. E. Porter,
Ren Porter-Temple,
Christopher P. A. Henry,
Benne Holwerda,
A. R. Lopez-Sanchez,
Steven Phillipps,
Alister W. Graham,
Sarah Brough,
Kevin A. Pimbblet,
Jochen Liske,
Lee S. Kelvin,
Clayton D. Robertson,
Wade Roemer,
Michael Walmsley,
David O'Ryan,
Tobias Geron
Abstract:
Galaxies broadly fall into two categories: star-forming (blue) galaxies and quiescent (red) galaxies. In between, one finds the less populated ``green valley''. Some of these galaxies are suspected to be in the process of ceasing their star formation through a gradual exhaustion of their gas supply, while others may already be dead and experiencing a rejuvenation of star formation through fuel injection. We use the Galaxy And Mass Assembly database and the Galaxy Zoo citizen science morphological estimates to compare the morphology of galaxies in the green valley against those in the red sequence and blue cloud.
Our goal is to examine the structural differences within galaxies that fall in the green valley, and what brings them there. Previous results found disc features such as rings and lenses are more prominently represented in the green valley population. We revisit this with a similar sized data set of galaxies with morphology labels provided by the Galaxy Zoo for the GAMA fields based on new KiDS images. Our aim is to compare qualitatively the results from expert classification to that of citizen science.
We observe that ring structures are indeed found more commonly in green valley galaxies compared to their red and blue counterparts. We suggest that ring structures are a consequence of disc galaxies in the green valley actively exhibiting characteristics of fading discs and evolving disc morphology of galaxies. We note that the progression from blue to red correlates with loosening spiral arm structure.
Submitted 15 November, 2022;
originally announced November 2022.
-
A New Task: Deriving Semantic Class Targets for the Physical Sciences
Authors:
Micah Bowles,
Hongming Tang,
Eleni Vardoulaki,
Emma L. Alexander,
Yan Luo,
Lawrence Rudnick,
Mike Walmsley,
Fiona Porter,
Anna M. M. Scaife,
Inigo Val Slijepcevic,
Gary Segal
Abstract:
We define deriving semantic class targets as a novel multi-modal task. By doing so, we aim to improve classification schemes in the physical sciences which can be severely abstracted and obfuscating. We address this task for upcoming radio astronomy surveys and present the derived semantic radio galaxy morphology class targets.
Submitted 27 October, 2022; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Galaxy Zoo: Clump Scout -- Design and first application of a two-dimensional aggregation tool for citizen science
Authors:
Hugh Dickinson,
Dominic Adams,
Vihang Mehta,
Claudia Scarlata,
Lucy Fortson,
Stephen Serjeant,
Coleman Krawczyk,
Sandor Kruk,
Chris Lintott,
Kameswara Mantha,
Brooke D. Simmons,
Mike Walmsley
Abstract:
Galaxy Zoo: Clump Scout is a web-based citizen science project designed to identify and spatially locate giant star forming clumps in galaxies that were imaged by the Sloan Digital Sky Survey Legacy Survey. We present a statistically driven software framework that is designed to aggregate two-dimensional annotations of clump locations provided by multiple independent Galaxy Zoo: Clump Scout volunteers and generate a consensus label that identifies the locations of probable clumps within each galaxy. The statistical model our framework is based on allows us to assign false-positive probabilities to each of the clumps we identify, to estimate the skill levels of each of the volunteers who contribute to Galaxy Zoo: Clump Scout and also to quantitatively assess the reliability of the consensus labels that are derived for each subject. We apply our framework to a dataset containing 3,561,454 two-dimensional points, which constitute 1,739,259 annotations of 85,286 distinct subjects provided by 20,999 volunteers. Using this dataset, we identify 128,100 potential clumps distributed among 44,126 galaxies. This dataset can be used to study the prevalence and demographics of giant star forming clumps in low-redshift galaxies. The code for our aggregation software framework is publicly available at: https://github.com/ou-astrophysics/BoxAggregator
Submitted 7 October, 2022;
originally announced October 2022.
-
Learning useful representations for radio astronomy "in the wild" with contrastive learning
Authors:
Inigo Val Slijepcevic,
Anna M. M. Scaife,
Mike Walmsley,
Micah Bowles
Abstract:
Unknown class distributions in unlabelled astrophysical training data have previously been shown to detrimentally affect model performance due to dataset shift between training and validation sets. For radio galaxy classification, we demonstrate in this work that removing low angular extent sources from the unlabelled data before training produces qualitatively different training dynamics for a contrastive model. By applying the model on an unlabelled data-set with unknown class balance and sub-population distribution to generate a representation space of radio galaxies, we show that with an appropriate cut threshold we can find a representation with FRI/FRII class separation approaching that of a supervised baseline explicitly trained to separate radio galaxies into these two classes. Furthermore we show that an excessively conservative cut threshold blocks any increase in validation accuracy. We then use the learned representation for the downstream task of performing a similarity search on rare hybrid sources, finding that the contrastive model can reliably return semantically similar samples, with the added bonus of finding duplicates which remain after pre-processing.
Submitted 18 July, 2022;
originally announced July 2022.
-
Towards Galaxy Foundation Models with Hybrid Contrastive Learning
Authors:
Mike Walmsley,
Inigo Val Slijepcevic,
Micah Bowles,
Anna M. M. Scaife
Abstract:
New astronomical tasks are often related to earlier tasks for which labels have already been collected. We adapt the contrastive framework BYOL to leverage those labels as a pretraining task while also enforcing augmentation invariance. For large-scale pretraining, we introduce GZ-Evo v0.1, a set of 96.5M volunteer responses for 552k galaxy images plus a further 1.34M comparable unlabelled galaxies. Most of the 206 GZ-Evo answers are unknown for any given galaxy, and so our pretraining task uses a Dirichlet loss that naturally handles unknown answers. GZ-Evo pretraining, with or without hybrid learning, improves on direct training even with plentiful downstream labels (+4% accuracy with 44k labels). Our hybrid pretraining/contrastive method further improves downstream accuracy vs. pretraining or contrastive learning, especially in the low-label transfer regime (+6% accuracy with 750 labels).
Submitted 23 June, 2022;
originally announced June 2022.
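The abstract above states that the Dirichlet loss naturally handles unknown answers. A minimal sketch of why that can hold, assuming the loss takes the Dirichlet-multinomial form (an assumption; the abstract does not give the formula): when a question received no votes, the log-likelihood is exactly zero, so that question contributes nothing to the loss. The function name and the toy vote counts are hypothetical.

```python
from math import lgamma


def dirichlet_multinomial_nll(votes, concentrations):
    """Negative log-likelihood of observed vote counts under a
    Dirichlet-multinomial with the given concentration parameters.

    Assumed form of the loss, used for illustration only.
    """
    n = sum(votes)
    a = sum(concentrations)
    log_p = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for k, alpha in zip(votes, concentrations):
        log_p += lgamma(k + alpha) - lgamma(alpha) - lgamma(k + 1)
    return -log_p


# Hypothetical two-answer question with predicted concentrations (4, 1).
answered = dirichlet_multinomial_nll([8, 2], [4.0, 1.0])
unanswered = dirichlet_multinomial_nll([0, 0], [4.0, 1.0])
print(f"loss with votes: {answered:.3f}; loss with no votes: {abs(unanswered):.3f}")
```

Under this assumed form, a question that no volunteer was asked needs no masking or imputation: its term in the summed loss vanishes on its own.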
-
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift
Authors:
Inigo V. Slijepcevic,
Anna M. M. Scaife,
Mike Walmsley,
Micah Bowles,
Ivy Wong,
Stanislav S. Shabala,
Hongming Tang
Abstract:
In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data-set shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop, and that using the Frechet Distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.
Submitted 4 May, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Quantifying Uncertainty in Deep Learning Approaches to Radio Galaxy Classification
Authors:
Devina Mohan,
Anna M. M. Scaife,
Fiona Porter,
Mike Walmsley,
Micah Bowles
Abstract:
In this work we use variational inference to quantify the degree of uncertainty in deep learning model predictions of radio galaxy classification. We show that the level of model posterior variance for individual test samples is correlated with human uncertainty when labelling radio galaxies. We explore the model performance and uncertainty calibration for different weight priors and suggest that a sparse prior produces more well-calibrated uncertainty estimates. Using the posterior distributions for individual weights, we demonstrate that we can prune 30% of the fully-connected layer weights without significant loss of performance by removing the weights with the lowest signal-to-noise ratio. A larger degree of pruning can be achieved using a Fisher information based ranking, but both pruning methods affect the uncertainty calibration for Fanaroff-Riley type I and type II radio galaxies differently. Like other work in this field, we experience a cold posterior effect, whereby the posterior must be down-weighted to achieve good predictive performance. We examine whether adapting the cost function to accommodate model misspecification can compensate for this effect, but find that it does not make a significant difference. We also examine the effect of principled data augmentation and find that this improves upon the baseline but also does not compensate for the observed effect. We interpret this as the cold posterior effect being due to the overly effective curation of our training sample leading to likelihood misspecification, and raise this as a potential issue for Bayesian deep learning approaches to radio galaxy classification in future.
Submitted 24 January, 2022; v1 submitted 4 January, 2022;
originally announced January 2022.
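The signal-to-noise pruning described in the abstract above can be sketched as follows. This is an illustration only: it assumes each weight has a Gaussian posterior and takes the signal-to-noise ratio to be |mean| / standard deviation, neither of which the abstract spells out. The function name and the toy weights are hypothetical.

```python
def prune_by_snr(means, stds, fraction):
    """Zero out the given fraction of weights with the lowest |mean| / std.

    `means` and `stds` are the per-weight posterior means and standard
    deviations (assumed Gaussian posteriors, for illustration only).
    """
    snr = [abs(m) / s for m, s in zip(means, stds)]
    n_prune = round(fraction * len(means))
    # Indices sorted from lowest to highest signal-to-noise ratio.
    order = sorted(range(len(means)), key=lambda i: snr[i])
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else m for i, m in enumerate(means)]


# Hypothetical posterior parameters for ten weights.
means = [0.9, -0.05, 0.4, 0.02, -1.2, 0.3, 0.01, -0.6, 0.08, 0.5]
stds = [0.1, 0.5, 0.2, 0.4, 0.3, 0.1, 0.2, 0.2, 0.4, 0.25]
kept = prune_by_snr(means, stds, fraction=0.3)
print(kept)
```

With `fraction=0.3`, the three weights whose posterior mean is smallest relative to its uncertainty are set to zero, mirroring the 30% pruning level quoted in the abstract.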
-
Quantifying the Poor Purity and Completeness of Morphological Samples Selected by Galaxy Colour
Authors:
Rebecca J. Smethurst,
Karen L. Masters,
Brooke D. Simmons,
Izzy L. Garland,
Tobias Géron,
Boris Häußler,
Sandor Kruk,
Chris J. Lintott,
David O'Ryan,
Mike Walmsley
Abstract:
The galaxy population is strongly bimodal in both colour and morphology, and the two measures correlate strongly, with most blue galaxies being late-types (spirals) and most early-types, typically ellipticals, being red. This observation has led to the use of colour as a convenient selection criterion to make samples which are then labelled by morphology. Such use of colour as a proxy for morphology results in necessarily impure and incomplete samples. In this paper, we make use of the morphological labels produced by Galaxy Zoo to measure how incomplete and impure such samples are, considering optical (ugriz), NUV and NIR (JHK) bands. The best single colour optical selection is found using a threshold of g-r = 0.742, but this still results in a sample where only 56% of red galaxies are smooth and 56% of smooth galaxies are red. Use of the NUV gives some improvement over purely optical bands, particularly for late-types, but still results in low purity/completeness for early-types. No significant improvement is found by adding NIR bands. With any two bands, including NUV, a sample of early-types with greater than two-thirds purity cannot be constructed. Advances in quantitative galaxy morphologies have made colour-morphology proxy selections largely unnecessary going forward; where such assumptions are still required, we recommend studies carefully consider the implications of sample incompleteness/impurity.
Submitted 8 December, 2021;
originally announced December 2021.
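The two percentages quoted in the abstract above correspond to purity (the fraction of red galaxies that are smooth) and completeness (the fraction of smooth galaxies that are red). A minimal sketch of how these are computed for a single colour cut, using the g-r = 0.742 threshold from the abstract; the function name and the toy colours and labels are hypothetical and are not drawn from the paper's data.

```python
def purity_and_completeness(colours, is_smooth, threshold):
    """Select 'red' galaxies as colour > threshold and score the selection
    against morphological labels.

    purity       = smooth galaxies among the red selection
    completeness = red galaxies among all smooth galaxies
    """
    selected = [c > threshold for c in colours]
    n_selected = sum(selected)
    n_smooth = sum(is_smooth)
    n_both = sum(1 for s, m in zip(selected, is_smooth) if s and m)
    purity = n_both / n_selected if n_selected else float("nan")
    completeness = n_both / n_smooth if n_smooth else float("nan")
    return purity, completeness


# Hypothetical toy sample: g-r colours and smooth (True) / featured (False) labels.
colours = [0.55, 0.60, 0.70, 0.75, 0.78, 0.80, 0.85, 0.90]
is_smooth = [False, True, False, True, False, True, True, False]
purity, completeness = purity_and_completeness(colours, is_smooth, threshold=0.742)
print(f"purity={purity:.2f}, completeness={completeness:.2f}")
```

In this toy sample five galaxies pass the cut and three of them are smooth, while four galaxies are smooth overall, so the selection is neither pure nor complete, which is the same kind of shortfall the abstract reports for the real data.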
-
Practical Galaxy Morphology Tools from Deep Supervised Representation Learning
Authors:
Mike Walmsley,
Anna M. M. Scaife,
Chris Lintott,
Michelle Lochner,
Verlon Etsebeth,
Tobias Géron,
Hugh Dickinson,
Lucy Fortson,
Sandor Kruk,
Karen L. Masters,
Kameswara Bharadwaj Mantha,
Brooke D. Simmons
Abstract:
Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free text tag by humans (e.g. "#diffuse"), we can find galaxies matching that tag for most tags. The second task is identifying the most interesting anomalies to a particular researcher. Our approach is 100% accurate at identifying the most interesting 100 anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly-labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels; either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled datasets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code Zoobot. Zoobot is accessible to researchers with no prior experience in deep learning.
Submitted 8 June, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies
Authors:
Mike Walmsley,
Chris Lintott,
Tobias Geron,
Sandor Kruk,
Coleman Krawczyk,
Kyle W. Willett,
Steven Bamford,
Lee S. Kelvin,
Lucy Fortson,
Yarin Gal,
William Keel,
Karen L. Masters,
Vihang Mehta,
Brooke D. Simmons,
Rebecca Smethurst,
Lewis Smith,
Elisabeth M. Baeten,
Christine Macmillan
Abstract:
We present Galaxy Zoo DECaLS: detailed visual morphological classifications for Dark Energy Camera Legacy Survey images of galaxies within the SDSS DR8 footprint. Deeper DECaLS images (r=23.6 vs. r=22.2 from SDSS) reveal spiral arms, weak bars, and tidal features not previously visible in SDSS imaging. To best exploit the greater depth of DECaLS images, volunteers select from a new set of answers designed to improve our sensitivity to mergers and bars. Galaxy Zoo volunteers provide 7.5 million individual classifications over 314,000 galaxies. 140,000 galaxies receive at least 30 classifications, sufficient to accurately measure detailed morphology like bars, and the remainder receive approximately 5. All classifications are used to train an ensemble of Bayesian convolutional neural networks (a state-of-the-art deep learning method) to predict posteriors for the detailed morphology of all 314,000 galaxies. When measured against confident volunteer classifications, the networks are approximately 99% accurate on every question. Morphology is a fundamental feature of every galaxy; our human and machine classifications are an accurate and detailed resource for understanding how galaxies evolve.
Submitted 3 January, 2022; v1 submitted 16 February, 2021;
originally announced February 2021.