-
A Machine Learning Pipeline for Molecular Property Prediction using ChemXploreML
Authors:
Aravindh Nivas Marimuthu,
Brett A. McGuire
Abstract:
We present ChemXploreML, a modular desktop application designed for machine learning-based molecular property prediction. The framework's flexible architecture allows integration of any molecular embedding technique with modern machine learning algorithms, enabling researchers to customize their prediction pipelines without extensive programming expertise. To demonstrate the framework's capabilities, we implement and evaluate two molecular embedding approaches - Mol2Vec and VICGAE (Variance-Invariance-Covariance regularized GRU Auto-Encoder) - combined with state-of-the-art tree-based ensemble methods (Gradient Boosting Regression, XGBoost, CatBoost, and LightGBM). Using five fundamental molecular properties as test cases - melting point (MP), boiling point (BP), vapor pressure (VP), critical temperature (CT), and critical pressure (CP) - we validate our framework on a dataset from the CRC Handbook of Chemistry and Physics. The models achieve excellent performance for well-distributed properties, with R$^2$ values up to 0.93 for critical temperature predictions. Notably, while Mol2Vec embeddings (300 dimensions) delivered slightly higher accuracy, VICGAE embeddings (32 dimensions) exhibited comparable performance yet offered significantly improved computational efficiency. ChemXploreML's modular design facilitates easy integration of new embedding techniques and machine learning algorithms, providing a flexible platform for customized property prediction tasks. The application automates chemical data preprocessing (including UMAP-based exploration of molecular space), model optimization, and performance analysis through an intuitive interface, making sophisticated machine learning techniques accessible while maintaining extensibility for advanced cheminformatics users.
Submitted 13 May, 2025;
originally announced May 2025.
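The abstract above reports model quality as R$^2$ values. As a minimal sketch of what that metric measures (not the ChemXploreML implementation; the function name and the toy numbers are illustrative assumptions), the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only; not data from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
score = r_squared(y_true, y_pred)
```

A value of 1 means the predictions match the reference values exactly, while a value near 0 means the model does no better than always predicting the mean.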
-
Laboratory rotational spectra of cyanocyclohexane and its siblings (1- and 4-cyanocyclohexene) using a compact CP-FTMW spectrometer for interstellar detection
Authors:
Gabi Wenzel,
Martin S. Holdren,
D. Archie Stewart,
Hannah Toru Shay,
Alex N. Byrne,
Ci Xue,
Brett A. McGuire
Abstract:
Chirped-pulse Fourier transform microwave (CP-FTMW) spectroscopy is a versatile technique to record broadband gas-phase rotational spectra, enabling detailed investigations of molecular structure, dynamics, and hyperfine interactions. Here, we present the development and application of a CP-FTMW spectrometer operating in the 6.5-18 GHz frequency range, studying cyanocyclohexane, 1-cyanocyclohexene, and 4-cyanocyclohexene using a heated pulsed supersonic expansion source. The dynamic range, experimental resolution, and high sensitivity enable observation of multiple conformers, precise measurements of hyperfine splitting arising from nuclear quadrupole coupling due to the nitrogen atom in the cyano group, as well as the observation of singly $^{13}$C- and $^{15}$N-substituted isotopic isomers in natural abundance. Using the latter, precise structures for the molecules are derived. The accurate rotational spectra enabled a search for these species toward the dark, cold molecular cloud TMC-1; no signals are found, and we discuss the implications of derived upper limits on the interstellar chemistry of the cyanocyclohexane family.
Submitted 18 April, 2025;
originally announced April 2025.
-
Discovery of the 7-ring PAH Cyanocoronene (C$_{24}$H$_{11}$CN) in GOTHAM Observations of TMC-1
Authors:
Gabi Wenzel,
Siyuan Gong,
Ci Xue,
P. Bryan Changala,
Martin S. Holdren,
Thomas H. Speak,
D. Archie Stewart,
Zachary T. P. Fried,
Reace H. J. Willis,
Edwin A. Bergin,
Andrew M. Burkhardt,
Alex N. Byrne,
Steven B. Charnley,
Andrew Lipnicky,
Ryan A. Loomis,
Christopher N. Shingledecker,
Ilsa R. Cooke,
Anthony J. Remijan,
Michael C. McCarthy,
Alison E. Wendlandt,
Brett A. McGuire
Abstract:
We present the synthesis and laboratory rotational spectroscopy of the 7-ring polycyclic aromatic hydrocarbon (PAH) cyanocoronene (C$_{24}$H$_{11}$CN) using a laser-ablation assisted cavity-enhanced Fourier transform microwave spectrometer. A total of 71 transitions were measured and assigned between 6.8--10.6\,GHz. Using these assignments, we searched for emission from cyanocoronene in the GBT Observations of TMC-1: Hunting Aromatic Molecules (GOTHAM) project observations of the cold dark molecular cloud TMC-1 using the 100\,m Green Bank Telescope (GBT). We detect a number of individually resolved transitions in ultrasensitive X-band observations and perform a Markov Chain Monte Carlo analysis to derive best-fit parameters, including a total column density of $N(\mathrm{C}_{24}\mathrm{H}_{11}\mathrm{CN}) = 2.69^{+0.26}_{-0.23} \times 10^{12}\,\mathrm{cm}^{-2}$ at a temperature of $6.05^{+0.38}_{-0.37}\,$K. A spectral stacking and matched filtering analysis provides a robust 17.3$\,σ$ significance to the overall detection. The derived column density is comparable to that of cyano-substituted naphthalene, acenaphthylene, and pyrene, defying the trend of decreasing abundance with increasing molecular size and complexity found for carbon chains. We discuss the implications of the detection for our understanding of interstellar PAH chemistry and highlight major open questions and next steps.
Submitted 7 April, 2025;
originally announced April 2025.
-
CoCCoA: Complex Chemistry in hot Cores with ALMA. The chemical evolution of acetone from ice to gas
Authors:
Y. Chen,
R. T. Garrod,
M. Rachid,
E. F. van Dishoeck,
C. L. Brogan,
R. Loomis,
A. Lipnicky,
B. A. McGuire
Abstract:
Acetone (CH3COCH3) is one of the most abundant three-carbon oxygen-bearing complex organic molecules (O-COMs) that have been detected in space. Recently, acetone ice has been reported as (tentatively) detected toward B1-c, which enables the gas-to-ice comparison of its abundances. The detection of acetone ice warrants a more systematic study of its gaseous abundances, which is currently lacking. Therefore, we conducted systematic measurements of acetone gas in a dozen hot cores observed by the CoCCoA survey and investigated the chemical evolution from ice to gas of acetone in protostellar systems. We fit the ALMA spectra to determine the column density, excitation temperature, and line width of acetone, along with propanal (C2H5CHO), ketene (CH2CO), and propyne (CH3CCH), which might be chemically linked with acetone. We found that the observed gas abundances of acetone are surprisingly high compared to those of two-carbon O-COMs, while aldehydes are overall less abundant than other O-COMs (e.g., alcohols, ethers, and esters). This may suggest specific formation or destruction mechanisms that favor the production of ethers, esters, and ketones over aldehydes. The derived physical properties suggest that acetone, propanal, and ketene have the same origin from hot cores as other O-COMs, while propyne tends to trace the more extended outflows. The acetone-to-methanol ratios are higher in ice than in gas by one order of magnitude, hinting at gas-phase reprocessing after sublimation. There are several suggested formation pathways of acetone (in both ice and gas) from acetaldehyde (CH3CHO), ketene, and propylene (C3H6). The observed ratios between acetone and the relevant species are rather constant across the sample, and can be well reproduced by astrochemical simulations, but more investigations are needed to draw solid conclusions.
Submitted 17 March, 2025;
originally announced March 2025.
-
Machine Learning of Interstellar Chemical Inventories
Authors:
Kin Long Kelvin Lee,
Jacqueline Patterson,
Andrew M. Burkhardt,
Vivek Vankayalapati,
Michael C. McCarthy,
Brett A. McGuire
Abstract:
The characterization of interstellar chemical inventories provides valuable insight into the chemical and physical processes in astrophysical sources. The discovery of new interstellar molecules becomes increasingly difficult as the number of viable species grows combinatorially, even when considering only the most thermodynamically stable. In this work, we present a novel approach for understanding and modeling interstellar chemical inventories by combining methodologies from cheminformatics and machine learning. Using multidimensional vector representations of molecules obtained through unsupervised machine learning, we show that identification of candidates for astrochemical study can be achieved through quantitative measures of chemical similarity in this vector space, highlighting molecules that are most similar to those already known in the interstellar medium. Furthermore, we show that simple, supervised learning regressors are capable of reproducing the abundances of entire chemical inventories, and predict the abundance of not yet seen molecules. As a proof-of-concept, we have developed and applied this discovery pipeline to the chemical inventory of a well-known dark molecular cloud, the Taurus Molecular Cloud 1 (TMC-1); one of the most chemically rich regions of space known to date. In this paper, we discuss the implications and new insights machine learning explorations of chemical space can provide in astrochemistry.
Submitted 30 July, 2021;
originally announced July 2021.
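The abstract above describes ranking candidate molecules by chemical similarity in an embedding vector space. The sketch below illustrates that general idea with cosine similarity; the choice of cosine similarity, the function names, and the toy three-dimensional vectors are all assumptions made for illustration and are not taken from the paper:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(known, candidates):
    """Rank candidates by their highest similarity to any known molecule."""
    scores = {
        name: max(cosine_similarity(vec, k) for k in known.values())
        for name, vec in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical embeddings; real ones would come from a trained model.
known = {"mol_A": [1.0, 0.0, 0.0], "mol_B": [0.0, 1.0, 0.0]}
candidates = {"cand_X": [0.9, 0.1, 0.0], "cand_Y": [0.0, 0.0, 1.0]}
ranking = rank_candidates(known, candidates)
```

Here `cand_X` ranks first because its vector points almost in the same direction as `mol_A`, while `cand_Y` is orthogonal to both known molecules.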
-
Detection of Interstellar H$_2$CCCHC$_3$N
Authors:
C. N. Shingledecker,
K. L. K. Lee,
J. T. Wandishin,
N. Balucani,
A. M. Burkhardt,
S. B. Charnley,
R. Loomis,
M. Schreffler,
M. Siebert,
M. C. McCarthy,
B. A. McGuire
Abstract:
The chemical pathways linking the small organic molecules commonly observed in molecular clouds to the large, complex, polycyclic species long-suspected to be carriers of the ubiquitous unidentified infrared emission bands remain unclear. To investigate whether the mono- and poly-cyclic molecules observed in cold cores could form via the bottom-up reaction of ubiquitous carbon-chain species with, e.g., atomic hydrogen, a search is made for possible intermediates in data taken as part of the GOTHAM (GBT Observations of TMC-1 Hunting for Aromatic Molecules) project. Markov-Chain Monte Carlo (MCMC) Source Models were run to obtain column densities and excitation temperatures. Astrochemical models were run to examine possible formation routes, including a novel grain-surface pathway involving the hydrogenation of C$_6$N and HC$_6$N, purely gas-phase reactions between C$_3$N and both propyne (CH$_3$CCH) and allene (CH$_2$CCH$_2$), and the reaction CN + H$_2$CCCHCCH. We report the first detection of cyanoacetyleneallene (H$_2$CCCHC$_3$N) in space toward the TMC-1 cold cloud using the Robert C. Byrd 100 m Green Bank Telescope (GBT). Cyanoacetyleneallene may represent an intermediate between less-saturated carbon-chains, such as the cyanopolyynes, that are characteristic of cold cores and the more recently-discovered cyclic species like cyanocyclopentadiene. Results from our models show that the gas-phase allene-based formation route in particular produces abundances of H$_2$CCCHC$_3$N that match the column density of $2\times10^{11}$ cm$^{-2}$ obtained from the MCMC Source Model, and that the grain-surface route yields large abundances on ices that could potentially be important as precursors for cyclic molecules.
Submitted 7 May, 2021;
originally announced May 2021.
-
Gas phase detection and rotational spectroscopy of ethynethiol, HCCSH
Authors:
Kin Long Kelvin Lee,
Marie-Aline Martin-Drumel,
Valerio Lattanzi,
Brett A. McGuire,
Paola Caselli,
Michael McCarthy
Abstract:
We report the gas-phase detection and spectroscopic characterization of ethynethiol ($\mathrm{HCCSH}$), a metastable isomer of thioketene ($\mathrm{H_2C_2S}$) using a combination of Fourier-transform microwave and submillimeter-wave spectroscopies. Several $a$-type transitions of the normal species were initially detected below 40 GHz using a supersonic expansion-electrical discharge source, and subsequent measurement of higher-frequency, $b$-type lines using double resonance provided accurate predictions in the submillimeter region. With these, searches using a millimeter-wave absorption spectrometer equipped with a radio frequency discharge source were conducted in the range 280 - 660 GHz, ultimately yielding nearly 100 transitions up to $^rR_0(36)$ and $^rQ_0(68)$. From the combined data set, all three rotational constants and centrifugal distortion terms up to the sextic order were determined to high accuracy, providing a reliable set of frequency predictions to the lower end of the THz band. Isotopic substitution has enabled both a determination of the molecular structure of $\mathrm{HCCSH}$ and, by inference, its formation pathway in our nozzle discharge source via the bimolecular radical-radical recombination reaction $\mathrm{SH + C_2H}$, which is calculated to be highly exothermic (-477 kJ/mol) using the HEAT345(Q) thermochemical scheme.
Submitted 30 November, 2018;
originally announced November 2018.
-
arXiv:1811.06157
physics.atom-ph
astro-ph.IM
physics.app-ph
physics.chem-ph
physics.comp-ph
Perspectives on Astrophysics Based on Atomic, Molecular, and Optical (AMO) Techniques
Authors:
Daniel Wolf Savin,
James F. Babb,
Paul M. Bellan,
Crystal Brogan,
Jan Cami,
Paola Caselli,
Lia Corrales,
Gerardo Dominguez,
Steven R. Federman,
Chris J. Fontes,
Richard Freedman,
Brad Gibson,
Leon Golub,
Thomas W. Gorczyca,
Michael Hahn,
Sarah M. Hörst,
Reggie L. Hudson,
Jeffrey Kuhn,
James E. Lawler,
Maurice A. Leutenegger,
Joan P. Marler,
Michael C. McCarthy,
Brett A. McGuire,
Stefanie N. Milam,
Nicholas A. Murphy
, et al. (13 additional authors not shown)
Abstract:
About two generations ago, a large part of AMO science was dominated by experimental high energy collision studies and perturbative theoretical methods. Since then, AMO science has undergone a transition and is now dominated by quantum, ultracold, and ultrafast studies. But in the process, the field has passed over the complexity that lies between these two extremes. Most of the Universe resides in this intermediate region. We put forward that the next frontier for AMO science is to explore the AMO complexity that describes most of the Cosmos.
Submitted 14 November, 2018;
originally announced November 2018.
-
Vibrational Satellites of C$_2$S, C$_3$S, and C$_4$S: Microwave Spectral Taxonomy as a Stepping Stone to the Millimeter-Wave Band
Authors:
Brett A. McGuire,
Marie-Aline Martin-Drumel,
Kin Long Kelvin Lee,
John F. Stanton,
Carl A. Gottlieb,
Michael C. McCarthy
Abstract:
We present a microwave spectral taxonomy study of several hydrocarbon/CS$_2$ discharge mixtures in which more than 60 distinct chemical species, their more abundant isotopic species, and/or their vibrationally excited states were detected using chirped-pulse and cavity Fourier-transform microwave spectroscopies. Taken together, in excess of 85 unique variants were detected, including several new isotopic species and more than 25 new vibrationally excited states of C$_2$S, C$_3$S, and C$_4$S, which have been assigned on the basis of published vibration-rotation interaction constants for C$_3$S, or newly calculated ones for C$_2$S and C$_4$S. On the basis of these precise, low-frequency measurements, several vibrationally excited states of C$_2$S and C$_3$S were subsequently identified in archival millimeter-wave data in the 253--280 GHz frequency range, ultimately providing highly accurate catalogs for astronomical searches. As part of this work, formation pathways of the two smaller carbon-sulfur chains were investigated using $^{13}$C isotopic spectroscopy, as was their vibrational excitation. The present study illustrates the utility of microwave spectral taxonomy as a tool for complex mixture analysis, and as a powerful and convenient `stepping stone' to higher frequency measurements in the millimeter and submillimeter bands.
Submitted 11 April, 2018;
originally announced April 2018.
-
Molecular Polymorphism: Microwave Spectra, Equilibrium Structures, and an Astronomical Investigation of the HNCS Isomeric Family
Authors:
Brett A. McGuire,
Marie-Aline Martin-Drumel,
Sven Thorwirth,
Sandra Brünken,
Valerio Lattanzi,
Justin L. Neill,
Silvia Spezzano,
Zhenhong Yu,
Daniel P. Zaleski,
Anthony J. Remijan,
Brooks H. Pate,
Michael C. McCarthy
Abstract:
The rotational spectra of thioisocyanic acid (HNCS), and its three energetic isomers (HSCN, HCNS, and HSNC) have been observed at high spectral resolution by a combination of chirped-pulse and Fabry-Pérot Fourier-transform microwave spectroscopy between 6 and 40~GHz in a pulsed-jet discharge expansion. Two isomers, thiofulminic acid (HCNS) and isothiofulminic acid (HSNC), calculated here to be 35-37~kcal/mol less stable than the ground state isomer HNCS, have been detected for the first time. Precise rotational, centrifugal distortion, and nitrogen hyperfine coupling constants have been determined for the normal and rare isotopic species of both molecules; all are in good agreement with theoretical predictions obtained at the coupled cluster level of theory. On the basis of isotopic spectroscopy, precise molecular structures have been derived for all four isomers by correcting experimental rotational constants for the effects of rotation-vibration calculated theoretically. Formation and isomerization pathways have also been investigated; the high abundance of HSCN relative to ground state HNCS, and the detection of strong lines of SH using CH$_3$CN and H$_2$S, suggest that HSCN is preferentially produced by the radical-radical reaction HS + CN. A radio astronomical search for HSCN and its isomers has been undertaken toward the high-mass star-forming region Sgr B2(N) in the Galactic Center with the 100 m Green Bank Telescope. While we find clear evidence for HSCN, only a tentative detection of HNCS is proposed, and there is no indication of HCNS or HSNC at the same rms noise level. HSCN, and tentatively HNCS, displays clear deviations from a single-excitation temperature model, suggesting weak masing may be occurring in some transitions in this source.
Submitted 13 July, 2016;
originally announced July 2016.