-
Analytical Swarm Chemistry: Characterization and Analysis of Emergent Swarm Behaviors
Authors:
Ricardo Vega,
Connor Mattson,
Kevin Zhu,
Daniel S. Brown,
Cameron Nowzari
Abstract:
Swarm robotics has potential for a wide variety of applications, but real-world deployments remain rare due to the difficulty of predicting emergent behaviors arising from simple local interactions. Traditional engineering approaches design controllers to achieve desired macroscopic outcomes under idealized conditions, while agent-based and artificial life studies explore emergent phenomena in a bottom-up, exploratory manner. In this work, we introduce Analytical Swarm Chemistry, a framework that integrates concepts from engineering, agent-based and artificial life research, and chemistry. This framework combines macrostate definitions with phase diagram analysis to systematically explore how swarm parameters influence emergent behavior. Inspired by concepts from chemistry, the framework treats parameters like thermodynamic variables, enabling visualization of regions in parameter space that give rise to specific behaviors. Applying this framework to agents with minimally viable capabilities, we identify sufficient conditions for behaviors such as milling and diffusion and uncover regions of the parameter space that reliably produce these behaviors. Preliminary validation on real robots demonstrates that these regions correspond to observable behaviors in practice. By providing a principled, interpretable approach, this framework lays the groundwork for predictable and reliable emergent behavior in real-world swarm systems.
Submitted 26 October, 2025;
originally announced October 2025.
-
R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations
Authors:
Connor Mattson,
Varun Raveendra,
Ellen Novoseller,
Nicholas Waytowich,
Vernon J. Lawhern,
Daniel S. Brown
Abstract:
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
Submitted 20 October, 2025;
originally announced October 2025.
-
Directional Search for Persistent Gravitational Waves: Results from the First Part of LIGO-Virgo-KAGRA's Fourth Observing Run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1743 additional authors not shown)
Abstract:
The angular distribution of gravitational-wave power from persistent sources may exhibit anisotropies arising from the large-scale structure of the Universe. This motivates directional searches for astrophysical and cosmological gravitational-wave backgrounds, as well as continuous-wave emitters. We present results of such a search using data from the first observing run through the first portion of the fourth observing run of the LIGO-Virgo-KAGRA Collaborations. We apply gravitational-wave radiometer techniques to generate skymaps and search for both narrowband and broadband persistent gravitational-wave sources. Additionally, we use spherical harmonic decomposition to probe spatially extended sources. No evidence of persistent gravitational-wave signals is found, and we set the most stringent constraints to date on such emissions. For narrowband point sources, our sensitivity estimate to effective strain amplitude lies in the range $(0.03 - 8.4) \times 10^{-24}$ across all sky and frequency range $(20 - 160)$ Hz. For targeted sources -- Scorpius X-1, SN 1987A, the Galactic Center, Terzan 5, and NGC 6397 -- we constrain the strain amplitude with best limits ranging from $\sim 1.1 \times 10^{-25}$ to $6.5 \times 10^{-24}$. For persistent broadband sources, we constrain the gravitational-wave flux $F_{α, \hat{n}}^{95\%, \mathrm{UL}}(25\, \mathrm{Hz}) < (0.008 - 5.5) \times 10^{-8}\, \mathrm{erg\, cm^{-2}\, s^{-1}\, Hz^{-1}}$, depending on the sky direction $\hat{n}$ and spectral index $α=0,\,2/3,\,3$. Finally, for extended sources, we place upper limits on the strain angular power spectrum $C_\ell^{1/2} < (0.63 - 17) \times 10^{-10} \,\mathrm{sr}^{-1}$.
Submitted 20 October, 2025;
originally announced October 2025.
-
Resolving star spots on WASP-85 A using high-resolution transit spectroscopy
Authors:
Vedad Kunovac,
Heather Cegla,
Hritam Chakraborty,
Cis Lagae,
David J. A. Brown,
Alix Freckelton,
Samuel Gill,
Mercedes López-Morales,
James McCormac,
Annelies Mortier,
Mathilde Timmermans,
Thomas G. Wilson,
Romain Allart,
Edward M. Bryant,
Matthew R. Burleigh,
Lauren Doyle,
Edward Gillen,
James S. Jenkins,
Marina Lafarga,
Monika Lendl,
Mahmoud Oshagh,
Vatsal Panwar,
Peter P. Pedersen,
Amaury Triaud,
Richard G. West
, et al. (1 additional author not shown)
Abstract:
Stellar surface inhomogeneities such as spots and faculae introduce Doppler variations that challenge exoplanet detection via the radial velocity method. While their impact on disc-integrated spectra is well established, detailed studies of the underlying local line profiles have so far been limited to the Sun. We present an observational campaign targeting the active star WASP-85 A during transits of its hot Jupiter companion. The transits span two stellar rotation periods, allowing us to probe the evolution of active regions. From ground-based photometry we identify seven active regions, six containing dark spots. Using simultaneous ESPRESSO transit spectroscopy, we spatially resolve these regions on the stellar surface by using the planet as a probe. We detect significant bisector shape changes, line broadening, and net redshifts during spot occultations, with velocity shifts of 108-333 m/s (mean uncertainty 50 m/s). The observed broadening is consistent with the Zeeman effect, implying magnetic field strengths (Stokes $I$) $B$ = 2.7-4.4 kG (mean uncertainty 0.6 kG), comparable to solar umbrae. Combined with our photometric spot model, this yields lower limits to the disc-integrated field $Bf = 16 \pm 3$ G and $61 \pm 9$ G for the two hemispheres probed -- at least three times higher than Sun-as-a-star values. We also measure centre-to-limb variations in FWHM, line depth, equivalent width, and convective blueshift, which broadly agree with solar observations and 3D MHD models. This work demonstrates a new way to characterise the surfaces of exoplanet host stars, paving the way for future analyses incorporating synthetic line profiles from 3D MHD simulations.
Submitted 19 October, 2025;
originally announced October 2025.
-
Autonomous Soft Robotic Guidewire Navigation via Imitation Learning
Authors:
Noah Barnes,
Ji Woong Kim,
Lingyun Di,
Hannah Qu,
Anuruddha Bhattacharjee,
Miroslaw Janowski,
Dheeraj Gandhi,
Bailey Felix,
Shaopeng Jiang,
Olivia Young,
Mark Fuge,
Ryan D. Sochol,
Jeremy D. Brown,
Axel Krieger
Abstract:
In endovascular surgery, endovascular interventionists push a thin tube called a catheter, guided by a thin wire, to a treatment site inside the patient's blood vessels to treat various conditions such as blood clots, aneurysms, and malformations. Guidewires with robotic tips can enhance maneuverability, but they present challenges in modeling and control. Automation of soft robotic guidewire navigation has the potential to overcome these challenges, increasing the precision and safety of endovascular navigation. In other surgical domains, end-to-end imitation learning has shown promising results. Thus, we develop a transformer-based imitation learning framework with goal conditioning, relative action outputs, and automatic contrast dye injections to enable generalizable soft robotic guidewire navigation in an aneurysm targeting task. We train the model on 36 different modular bifurcated geometries, generating 647 total demonstrations under simulated fluoroscopy, and evaluate it on three previously unseen vascular geometries. The model can autonomously drive the tip of the robot to the aneurysm location with a success rate of 83% on the unseen geometries, outperforming several baselines. In addition, we present ablation and baseline studies to evaluate the effectiveness of each design and data collection choice. Project website: https://softrobotnavigation.github.io/
Submitted 10 October, 2025;
originally announced October 2025.
-
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
Authors:
Sagnik Anupam,
Davis Brown,
Shuo Li,
Eric Wong,
Hamed Hassani,
Osbert Bastani
Abstract:
LLM web agents now browse and take actions on the open web, yet current agent evaluations are constrained to sandboxed environments or artificial tasks. We introduce BrowserArena, a live open-web agent evaluation platform that collects user-submitted tasks, runs Arena-style head-to-head comparisons, and uses step-level human feedback to surface failure modes. Collecting and analyzing step-level annotations on the agent traces, we identify three consistent failure modes: captcha resolution, pop-up banner removal, and direct navigation to URLs. By constructing targeted datasets to further study these tasks, we discover variations in how different language models navigate these failure modes. We find, for example, that o4-mini deploys a wider variety of strategies to circumvent captcha resolution than other models and DeepSeek-R1 consistently misleads users about pop-up banner closure. Our findings surface both the diversity and brittleness of current web agents. More broadly, our benchmarking methodology provides an approach to evaluating and understanding web agent failure modes at scale.
Submitted 7 October, 2025; v1 submitted 2 October, 2025;
originally announced October 2025.
-
Infrared Synchrotron Emission in the Soft State of GX 339-4 and the Mid-Infrared/X-ray Luminosity Plane of Black Hole X-ray Binaries
Authors:
P. Gandhi,
D. M. Russell,
M. C. Baglio,
Y. Bhargava,
R. Duncan,
A. Gúrpide,
C. O. Heinke,
C. Knigge,
K. S. Long,
T. J. Maccarone,
G. Mastroserio,
T. D. Russell,
A. W. Shaw,
A. J. Tetarenko,
F. M. Vincentelli,
E. S. Borowski,
D. A. H. Buckley,
P. Casella,
C. Dashwood Brown,
G. C. Dewangan,
R. I. Hynes,
S. Markoff,
J. A. Tomsick,
K. Alabarta,
F. Carotenuto
, et al. (11 additional authors not shown)
Abstract:
Progress in understanding the growth of accreting black holes remains hampered by a lack of sensitive coordinated multiwavelength observations. In particular, the mid-infrared (MIR) regime remains ill-explored except for jet-dominant states. Here, we present comprehensive follow-up of the black hole X-ray binary GX 339-4 during a disc-dominated state in its 2023/24 outburst as part of a multi-wavelength campaign coordinated around JWST/MIRI. The X-ray properties are fairly typical of soft accretion states, with a high-energy Comptonised tail. The source is significantly detected between 5-10$μ$m, albeit at a faint flux level requiring MIR compact jet emission to be quenched by a factor of $\sim$300 or more relative to previous hard-state detections. The MIRI spectrum can be described as a simple power-law with slope $α$ = +0.39$\pm$0.07 ($F_ν$ $\propto$ $ν^α$), but surprisingly matches neither the radio/sub-mm nor the optical broadband slopes. Significant MIR stochastic variability is detected. Synchrotron radiation from the same medium responsible for high-energy Comptonisation can self-consistently account for the observed MIRI spectral-timing behaviour, offering new constraints on the physical conditions in the soft-state accretion disc atmosphere/corona. Alternative explanations, including a circumbinary disc or emission from a warm wind, fail to cleanly explain either the spectral properties or the variability. Multiwavelength timing cross-correlations show a puzzlingly long MIR lag relative to the optical, though at limited significance. We compile archival MIR and X-ray luminosities of transient black hole systems, including previously unreported detections of GX 339-4. These trace the evolution of the MIR-to-X-ray flux ratio with accretion state, and also reveal high MIR luminosities for GX 339-4 across all states. (abridged)
Submitted 1 October, 2025;
originally announced October 2025.
-
Localized Uncertainty Quantification in Random Forests via Proximities
Authors:
Jake S. Rhodes,
Scott D. Brown,
J. Riley Wilkinson
Abstract:
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. This paper addresses localized uncertainty quantification in random forests. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of OOB errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility that reflects the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances the accuracy of the model and provides higher accuracy-rejection AUC scores than competing methods.
Submitted 26 September, 2025;
originally announced September 2025.
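The localized-interval idea in the abstract above can be illustrated with a minimal sketch. It assumes the random-forest proximities between one query point and every training point, and the out-of-bag (OOB) residuals of the training points (observed value minus OOB prediction), have already been computed elsewhere. The function names `quantile` and `local_interval` and the parameter `k` are hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
from typing import Sequence, Tuple


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted, non-empty sequence."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def local_interval(
    prediction: float,
    proximities: Sequence[float],
    oob_residuals: Sequence[float],
    k: int,
    coverage: float = 0.9,
) -> Tuple[float, float]:
    """Interval built from the OOB residuals of the k most proximal training points.

    Assumption: oob_residuals[i] = y_i - oob_prediction_i for training point i,
    and proximities[i] is its forest proximity to the query point.
    """
    if len(proximities) != len(oob_residuals):
        raise ValueError("proximities and residuals must have equal length")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank training points by proximity to the query point, highest first.
    ranked = sorted(range(len(proximities)),
                    key=lambda i: proximities[i], reverse=True)
    neighbours = sorted(oob_residuals[i] for i in ranked[:k])
    alpha = (1.0 - coverage) / 2.0
    return (prediction + quantile(neighbours, alpha),
            prediction + quantile(neighbours, 1.0 - alpha))
```

Increasing `k` pools residuals from less similar points, which tends to move the interval toward a global one; a small `k` keeps it local to the query point. That trade-off is the knob the abstract describes as "varying the number of nearby points".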
-
Towards Autonomous Robotic Electrosurgery via Thermal Imaging
Authors:
Naveed D. Riaziat,
Joseph Chen,
Axel Krieger,
Jeremy D. Brown
Abstract:
Electrosurgery is a surgical technique that can improve tissue cutting by reducing cutting force and bleeding. However, electrosurgery adds a risk of thermal injury to surrounding tissue. Expert surgeons estimate desirable cutting velocities based on experience but have no quantifiable reference to indicate if a particular velocity is optimal. Furthermore, prior demonstrations of autonomous electrosurgery have primarily used constant tool velocity, which is not robust to changes in electrosurgical tissue characteristics, power settings, or tool type. Thermal imaging feedback provides information that can be used to reduce thermal injury while balancing cutting force by controlling tool velocity. We introduce Thermography for Electrosurgical Rate Modulation via Optimization (ThERMO) to autonomously reduce thermal injury while balancing cutting force by intelligently controlling tool velocity. We demonstrate ThERMO in tissue phantoms and compare its performance to the constant velocity approach. Overall, ThERMO improves cut success rate by a factor of three and can reduce peak cutting force by a factor of two. ThERMO responds to varying environmental disturbances, reduces damage to tissue, and completes cutting tasks that would otherwise result in catastrophic failure for the constant velocity approach.
Submitted 23 September, 2025;
originally announced September 2025.
-
Further evidence for natal kick segregation by spectral type in high-mass X-ray binaries
Authors:
Pornisara Nuchvanichakul,
Poshak Gandhi,
Christian Knigge,
Yue Zhao,
Puji Irawati,
Suwicha Wanawichian,
Cordelia Dashwood Brown
Abstract:
High-mass X-ray binaries (HMXBs) are systems in which a neutron star or black hole accretes material from a massive companion. HMXBs are expected to have experienced a supernova in their evolution. The impulsive kick associated with this event should affect the space velocity of the system in a way that depends on the nature and state of the progenitor binary. Here, we test whether the different evolutionary histories of HMXBs have left a detectable imprint on their peculiar velocities ($V_{\rm pec}$). Using data from Gaia Data Release 3 (Gaia DR3), we first calculate the $V_{\rm pec}$ values for 63 well-known HMXBs hosting a black hole or neutron star and estimate the associated uncertainties via Monte Carlo re-sampling. We then analyse their distribution and check for differences between classes. Overall, $V_{\rm pec}$ estimates extend up to 100 km s$^{-1}$, but with Be/X-ray binaries (BeXRBs) favouring $V_{\rm pec}$ $\lesssim 40$ km s$^{-1}$ and supergiant X-ray binaries (SgXRBs) favouring $V_{\rm pec}$ $\gtrsim 40$ km s$^{-1}$. Based on a Kolmogorov-Smirnov (K-S) test, the null hypothesis that the peculiar velocities of both classes are drawn from the same parent distribution can be robustly rejected, irrespective of the background stellar velocity dispersion. Tests with binary population synthesis demonstrate that SgXRBs typically have shorter orbital periods and higher fractional mass loss than BeXRBs at supernova. We argue that the magnitude of $V_{\rm pec}$ could be used as a complementary feature to distinguish between Be and supergiant systems. These findings extend previous inferences based on two-dimensional kinematics from Hipparcos, and may be explained by the differing nature of the respective progenitor systems between the source classes at the instant of supernova.
Submitted 12 September, 2025;
originally announced September 2025.
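For readers unfamiliar with the two-sample Kolmogorov-Smirnov test mentioned in the abstract above, the sketch below computes its test statistic: the largest vertical gap between the empirical cumulative distribution functions of two samples. The function name `ks_statistic` is hypothetical, and the sketch covers only the statistic, not the p-value or the Monte Carlo treatment of measurement uncertainties described in the abstract.

```python
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample K-S statistic: maximum gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Step both samples past every value equal to v so ties are handled
        # consistently before the CDF gap is measured.
        while i < len(xs) and xs[i] == v:
            i += 1
        while j < len(ys) and ys[j] == v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d
```

A value near 0 means the two samples have similar distributions; a value near 1 means they barely overlap. In practice a library routine such as `scipy.stats.ks_2samp` returns this statistic together with a p-value.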
-
Examining Vision Language Models through Multi-dimensional Experiments with Vision and Text Features
Authors:
Saurav Sengupta,
Nazanin Moradinasab,
Jiebei Liu,
Donald E. Brown
Abstract:
Recent research on Vision Language Models (VLMs) suggests that they rely on inherent biases learned during training to respond to questions about visual properties of an image. These biases are exacerbated when VLMs are asked highly specific questions that require focusing on specific areas of the image. For example, a VLM tasked with counting stars on a modified American flag (e.g., with more than 50 stars) will often disregard the visual evidence and fail to answer accurately. We build upon this research and develop a multi-dimensional examination framework to systematically determine which characteristics of the input data, including both the image and the accompanying prompt, lead to such differences in performance. Using open-source VLMs, we further examine how attention values fluctuate with varying input parameters (e.g., image size, number of objects in the image, background color, prompt specificity). This research aims to learn how the behavior of vision language models changes and to explore methods for characterizing such changes. Our results suggest, among other things, that even minor modifications in image characteristics and prompt specificity can lead to large changes in how a VLM formulates its answer and, subsequently, its overall performance.
Submitted 9 September, 2025;
originally announced September 2025.
-
GW250114: testing Hawking's area law and the Kerr nature of black holes
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1763 additional authors not shown)
Abstract:
The gravitational-wave signal GW250114 was observed by the two LIGO detectors with a network matched-filter signal-to-noise ratio of 80. The signal was emitted by the coalescence of two black holes with near-equal masses $m_1 = 33.6^{+1.2}_{-0.8}\,M_\odot$ and $m_2 = 32.2^{+0.8}_{-1.3}\,M_\odot$, and small spins $χ_{1,2} \leq 0.26$ (90% credibility) and negligible eccentricity $e \leq 0.03$. Post-merger data excluding the peak region are consistent with the dominant quadrupolar $(\ell = |m| = 2)$ mode of a Kerr black hole and its first overtone. We constrain the modes' frequencies to $\pm 30\%$ of the Kerr spectrum, providing a test of the remnant's Kerr nature. We also examine Hawking's area law, also known as the second law of black hole mechanics, which states that the total area of the black hole event horizons cannot decrease with time. A range of analyses that exclude up to 5 of the strongest merger cycles confirm that the remnant area is larger than the sum of the initial areas to high credibility.
Submitted 9 September, 2025;
originally announced September 2025.
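The area-law statement in the abstract above can be made concrete with the standard Kerr horizon-area formula, $A = 8\pi M^2 \left(1 + \sqrt{1 - \chi^2}\right)$ in geometric units ($G = c = 1$). The sketch below is a plain arithmetic check of that inequality, not the collaboration's analysis, which works with posterior distributions from the data. The function names are hypothetical.

```python
import math


def kerr_horizon_area(mass: float, chi: float) -> float:
    """Horizon area of a Kerr black hole in geometric units (G = c = 1).

    A = 8 * pi * M^2 * (1 + sqrt(1 - chi^2)), with dimensionless spin |chi| <= 1.
    """
    if abs(chi) > 1.0:
        raise ValueError("dimensionless spin must satisfy |chi| <= 1")
    return 8.0 * math.pi * mass ** 2 * (1.0 + math.sqrt(1.0 - chi ** 2))


def satisfies_area_law(m1: float, chi1: float, m2: float, chi2: float,
                       m_final: float, chi_final: float) -> bool:
    """True if the remnant horizon area is at least the sum of the initial areas."""
    initial = kerr_horizon_area(m1, chi1) + kerr_horizon_area(m2, chi2)
    return kerr_horizon_area(m_final, chi_final) >= initial


# Median component masses (solar masses) are taken from the abstract; the
# initial spins are set to zero, which maximizes the initial area for fixed
# masses. The remnant mass and spin are ILLUSTRATIVE PLACEHOLDERS, not values
# reported in the abstract.
example_ok = satisfies_area_law(33.6, 0.0, 32.2, 0.0, m_final=63.0, chi_final=0.7)
```

Because only the comparison of areas matters, the masses can stay in solar masses without converting to SI units.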
-
Directed searches for gravitational waves from ultralight vector boson clouds around merger remnant and galactic black holes during the first part of the fourth LIGO-Virgo-KAGRA observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1747 additional authors not shown)
Abstract:
We present the first directed searches for long-transient and continuous gravitational waves from ultralight vector boson clouds around known black holes (BHs). We use LIGO data from the first part of the fourth LIGO-Virgo-KAGRA observing run. The searches target two distinct types of BHs and use two new semicoherent methods: hidden Markov model (HMM) tracking for the remnant BHs of the mergers GW230814_230901 and GW231123_135430 (referred to as GW230814 and GW231123 in this study), and a dedicated method using the Band Sampled Data (BSD) framework for the galactic BH in the Cygnus X-1 binary system. Without finding evidence of a signal from vector bosons in the data, we estimate the mass range that can be constrained. For the HMM searches targeting the remnants from GW231123 and GW230814, we disfavor vector boson masses in the ranges $[0.94, 1.08]$ and $[2.75, 3.28] \times 10^{-13}$ eV, respectively, at 30% confidence, assuming a 1% false alarm probability. Although these searches are only marginally sensitive to signals from merger remnants at relatively large distances, future observations are expected to yield more stringent constraints with high confidence. For the BSD search targeting the BH in Cygnus X-1, we exclude vector boson masses in the range $[0.85, 1.59] \times 10^{-13}$ eV at 95% confidence, assuming an initial BH spin larger than 0.5.
Submitted 14 September, 2025; v1 submitted 8 September, 2025;
originally announced September 2025.
-
GWTC-4.0: Constraints on the Cosmic Expansion Rate and Modified Gravitational-wave Propagation
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1750 additional authors not shown)
Abstract:
We analyze data from 142 of the 218 gravitational-wave (GW) sources in the fourth LIGO-Virgo-KAGRA Collaboration (LVK) Gravitational-Wave Transient Catalog (GWTC-4.0) to estimate the Hubble constant $H_0$ jointly with the population properties of merging compact binaries. We measure the luminosity distance and redshifted masses of GW sources directly; in contrast, we infer GW source redshifts statistically through i) location of features in the compact object mass spectrum and merger rate evolution, and ii) identifying potential host galaxies in the GW localization volume. Probing the relationship between source luminosity distances and redshifts obtained in this way yields constraints on cosmological parameters. We also constrain parameterized deviations from general relativity which affect GW propagation, specifically those modifying the dependence of a GW signal on the source luminosity distance. Assuming our fiducial model for the source-frame mass distribution and using GW candidates detected up to the end of the fourth observing run (O4a), together with the GLADE+ all-sky galaxy catalog, we estimate $H_0 = 76.6^{+13.0}_{-9.5} (76.6^{+25.2}_{-14.0})$ km s$^{-1}$ Mpc$^{-1}$. This value is reported as a median with 68.3% (90%) symmetric credible interval, and includes combination with the $H_0$ measurement from GW170817 and its electromagnetic counterpart. Using a parametrization of modified GW propagation in terms of the magnitude parameter $Ξ_0$, we estimate $Ξ_0 = 1.2^{+0.8}_{-0.4} (1.2^{+2.4}_{-0.5})$, where $Ξ_0 = 1$ recovers the behavior of general relativity.
Submitted 7 October, 2025; v1 submitted 4 September, 2025;
originally announced September 2025.
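The abstract above states only that $Ξ_0 = 1$ recovers general relativity. A parametrization commonly used in the modified-propagation literature writes the ratio of gravitational-wave to electromagnetic luminosity distance as $Ξ_0 + (1 - Ξ_0)/(1+z)^n$. The sketch below assumes that form; the index `n` and the function name are assumptions not taken from the abstract, and the paper's exact conventions may differ.

```python
def gw_to_em_distance_ratio(z: float, xi0: float, n: float) -> float:
    """Assumed parametrization: d_L^GW / d_L^EM = xi0 + (1 - xi0) / (1 + z)**n.

    xi0 = 1 gives a ratio of 1 at every redshift (general relativity);
    for n > 0 the ratio tends to xi0 at high redshift and to 1 as z -> 0.
    """
    if z < 0.0:
        raise ValueError("redshift must be non-negative")
    return xi0 + (1.0 - xi0) / (1.0 + z) ** n
```

Under this form the deviation is invisible for nearby sources and grows with distance, which is one reason a large catalog spanning a range of redshifts helps constrain $Ξ_0$.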
-
A matter of perspective: how nanoscale optical defects limit cosmic-scale gravitational wave observations
Authors:
Anna C. Green,
Antonella Bianchi,
Daniel D. Brown,
Felice Feldmann,
Jeremie Gobeil,
Miron van der Kolk,
Riccardo Maggiore,
Jonathan W. Perry,
Emma Prins,
Mischa Salle,
Alina Soflau,
Enzo Tapia,
Andreas Freise
Abstract:
Ground-based gravitational-wave (GW) detectors, such as LIGO, Virgo, and KAGRA, have revolutionised astronomy. Now, future detectors like the Einstein Telescope and Cosmic Explorer aim to achieve even greater sensitivity. Advanced optical simulations are crucial to overcoming the challenges faced by these complex interferometers. Finesse, the leading interferometer simulation tool in the GW community, supports the design and commissioning of these detectors by modeling optical, quantum, and mechanical effects. A key focus is understanding optical defects that distort the shape of the laser light and limit detector performance. This work explores how nanoscale defects affect GW observations and presents recent advancements in modeling their effects to guide the development of next-generation detector optics.
Submitted 1 September, 2025;
originally announced September 2025.
-
Upper Limits on the Isotropic Gravitational-Wave Background from the first part of LIGO, Virgo, and KAGRA's fourth Observing Run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1751 additional authors not shown)
Abstract:
We present results from the search for an isotropic gravitational-wave background using Advanced LIGO and Advanced Virgo data from O1 through O4a, the first part of the fourth observing run. This background is the accumulated signal from unresolved sources throughout cosmic history and encodes information about the merger history of compact binaries throughout the Universe, as well as exotic physics and potentially primordial processes from the early cosmos. Our cross-correlation analysis reveals no statistically significant background signal, enabling us to constrain several theoretical scenarios. For compact binary coalescences which approximately follow a 2/3 power-law spectrum, we constrain the fractional energy density to $Ω_{\rm GW}(25{\rm Hz})\leq 2.0\times 10^{-9}$ (95% cred.), a factor of 1.7 improvement over previous results. Scale-invariant backgrounds are constrained to $Ω_{\rm GW}(25{\rm Hz})\leq 2.8\times 10^{-9}$, representing a 2.1x sensitivity gain. We also place new limits on gravity theories predicting non-standard polarization modes and confirm that terrestrial magnetic noise sources remain below detection threshold. Combining these spectral limits with population models for GWTC-4, the latest gravitational-wave event catalog, we find our constraints remain above predicted merger backgrounds but are approaching detectability. The joint analysis combining the background limits shown here with the GWTC-4 catalog enables improved inference of the binary black hole merger rate evolution across cosmic time. Employing GWTC-4 inference results and standard modeling choices, we estimate that the total background arising from compact binary coalescences is $Ω_{\rm CBC}(25{\rm Hz})={0.9^{+1.1}_{-0.5}\times 10^{-9}}$ at 90% confidence, where the largest contribution is due to binary black holes only, $Ω_{\rm BBH}(25{\rm Hz})=0.8^{+1.1}_{-0.5}\times 10^{-9}$.
Submitted 28 August, 2025;
originally announced August 2025.
-
GWTC-4.0: Population Properties of Merging Compact Binaries
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
S. Ahmadzadeh,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi
, et al. (1783 additional authors not shown)
Abstract:
We detail the population properties of merging compact objects using 158 mergers from the cumulative Gravitational-Wave Transient Catalog 4.0, which includes three types of binary mergers: binary neutron star, neutron star-black hole binary, and binary black hole mergers. We resolve multiple over- and under-densities in the black hole mass distribution: features persist at primary masses of $10\,M_\odot$ and $35\,M_\odot$ with a possible third feature at $\sim 20\,M_\odot$. These are departures from an otherwise power-law-like continuum that steepens above $35\,M_\odot$. Binary black holes with primary masses near $10\,M_\odot$ are more likely to have less massive secondaries, with a mass ratio distribution peaking at $q = 0.74^{+0.13}_{-0.13}$, potentially a signature of stable mass transfer during binary evolution. Black hole spins are inferred to be non-extremal, with 90% of black holes having $χ< 0.57$, and preferentially aligned with binary orbits, implying many merging binaries form in isolation. However, we find a significant fraction, 0.24-0.42, of binaries have negative effective inspiral spins, suggesting many could be formed dynamically in gas-free environments. We find evidence for correlation between effective inspiral spin and mass ratio, though it is unclear if this is driven by variation in the mode of the distribution or the width. (Abridged)
Submitted 17 September, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
GWTC-4.0: Updating the Gravitational-Wave Transient Catalog with Observations from the First Part of the Fourth LIGO-Virgo-KAGRA Observing Run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1748 additional authors not shown)
Abstract:
Version 4.0 of the Gravitational-Wave Transient Catalog (GWTC-4.0) adds new candidates detected by the LIGO, Virgo, and KAGRA observatories through the first part of the fourth observing run (O4a: 2023 May 24 15:00:00 to 2024 January 16 16:00:00 UTC) and a preceding engineering run. In this new data, we find 128 new compact binary coalescence candidates that are identified by at least one of our search algorithms with a probability of astrophysical origin $p_{\rm astro} \geq 0.5$ and that are not vetoed during event validation. We also provide detailed source property measurements for 86 of these that have a false alarm rate $< 1 \rm{yr}^{-1}$. Based on the inferred component masses, these new candidates are consistent with signals from binary black holes and neutron star-black hole binaries (GW230518_125908 and GW230529_181500). Median inferred component masses of binary black holes in the catalog now range from $5.79\,M_\odot$ (GW230627_015337) to $137\,M_\odot$ (GW231123_135430), while GW231123_135430 was probably produced by the most massive binary observed in the catalog. For the first time we have discovered binary black hole signals with network signal-to-noise ratio exceeding 30, GW230814_230901 and GW231226_01520, enabling high-fidelity studies of the waveforms and astrophysical properties of these systems. Combined with the 90 candidates included in GWTC-3.0, the catalog now contains 218 candidates with $p_{\rm astro} \geq 0.5$ and not otherwise vetoed, doubling the size of the catalog and further opening our view of the gravitational-wave Universe.
Submitted 8 September, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
GWTC-4.0: Methods for Identifying and Characterizing Gravitational-wave Transients
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
S. Ahmadzadeh,
L. Aiello,
A. Ain,
P. Ajith,
S. Akcay,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi
, et al. (1787 additional authors not shown)
Abstract:
The Gravitational-Wave Transient Catalog (GWTC) is a collection of candidate gravitational-wave transient signals identified and characterized by the LIGO-Virgo-KAGRA Collaboration. Producing the contents of the GWTC from detector data requires complex analysis methods. These comprise techniques to model the signal; identify the transients in the data; evaluate the quality of the data and mitigate possible instrumental issues; infer the parameters of each transient; compare the data with the waveform models for compact binary coalescences; and handle the large amount of results associated with all these different analyses. In this paper, we describe the methods employed to produce the catalog's fourth release, GWTC-4.0, focusing on the analysis of the first part of the fourth observing run of Advanced LIGO, Advanced Virgo and KAGRA.
Submitted 25 August, 2025;
originally announced August 2025.
-
GWTC-4.0: An Introduction to Version 4.0 of the Gravitational-Wave Transient Catalog
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
S. Ahmadzadeh,
L. Aiello,
A. Ain,
P. Ajith,
S. Akcay,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi
, et al. (1786 additional authors not shown)
Abstract:
The Gravitational-Wave Transient Catalog (GWTC) is a collection of short-duration (transient) gravitational wave signals identified by the LIGO-Virgo-KAGRA Collaboration in gravitational-wave data produced by the eponymous detectors. The catalog provides information about the identified candidates, such as the arrival time and amplitude of the signal and properties of the signal's source as inferred from the observational data. GWTC is the data release of this dataset and version 4.0 extends the catalog to include observations made during the first part of the fourth LIGO-Virgo-KAGRA observing run up until 2024 January 31. This paper marks an introduction to a collection of articles related to this version of the catalog, GWTC-4.0. The collection of articles accompanying the catalog provides documentation of the methods used to analyze the data, summaries of the catalog of events, observational measurements drawn from the population, and detailed discussions of selected candidates.
Submitted 23 September, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
Open Data from LIGO, Virgo, and KAGRA through the First Part of the Fourth Observing Run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1746 additional authors not shown)
Abstract:
LIGO, Virgo, and KAGRA form a network of gravitational-wave observatories. Data and analysis results from this network are made publicly available through the Gravitational Wave Open Science Center. This paper describes open data from this network, including the addition of data from the first part of the fourth observing run (O4a) and selected periods from the preceding engineering run, collected from May 2023 to January 2024. The public data set includes calibrated strain time series for each instrument, data from additional channels used for noise subtraction and detector characterization, and analysis data products from version 4.0 of the Gravitational-Wave Transient Catalog.
Submitted 3 September, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
Large Language Models as Visualization Agents for Immersive Binary Reverse Engineering
Authors:
Dennis Brown,
Samuel Mulder
Abstract:
Immersive virtual reality (VR) offers affordances that may reduce cognitive complexity in binary reverse engineering (RE), enabling embodied and external cognition to augment the RE process through enhancing memory, hypothesis testing, and visual organization. In prior work, we applied a cognitive systems engineering approach to identify an initial set of affordances and implemented a VR environment to support RE through spatial persistence and interactivity. In this work, we extend that platform with an integrated large language model (LLM) agent capable of querying binary analysis tools, answering technical questions, and dynamically generating immersive 3D visualizations in alignment with analyst tasks. We describe the system architecture and our evaluation process and results. Our pilot study shows that while LLMs can generate meaningful 3D call graphs (for small programs) that align with design principles, output quality varies widely. This work raises open questions about the potential for LLMs to function as visualization agents, constructing 3D representations that reflect cognitive design principles without explicit training.
Submitted 18 August, 2025;
originally announced August 2025.
-
Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models
Authors:
Max Peeperkorn,
Tom Kouwenhoven,
Dan Brown,
Anna Jordanous
Abstract:
Instruction-tuning large language models (LLMs) reduces the diversity of their outputs, which has implications for many tasks, particularly for creative tasks. This paper investigates the "diversity gap" for a writing prompt narrative generation task. This gap emerges as measured by current diversity metrics for various open-weight and open-source LLMs. The results show significant decreases in diversity due to instruction-tuning. We explore the diversity loss at each fine-tuning stage for the OLMo and OLMo 2 models to further understand how output diversity is affected. The results indicate that DPO has the most substantial impact on diversity. Motivated by these findings, we present a new decoding strategy, conformative decoding, which guides an instruct model using its more diverse base model to reintroduce output diversity. We show that conformative decoding typically increases diversity and even maintains or improves quality.
Submitted 28 July, 2025;
originally announced July 2025.
-
Towards a fictitious magnetic field trap for both ground and Rydberg state $^{87}$Rb atoms via the evanescent field of an optical nanofibre
Authors:
Alexey Vylegzhanin,
Dylan J. Brown,
Danil F. Kornovan,
Etienne Brion,
Síle Nic Chormaic
Abstract:
Cold Rydberg atoms, known for their long lifetimes and strong dipole-dipole interactions that lead to the Rydberg blockade phenomenon, are among the most promising platforms for quantum simulations, quantum computation and quantum networks. However, a major limitation to the performance of Rydberg atom-based platforms is dephasing, which can be caused by atomic motion within the trap. Here, we propose a trap for $^{87}$Rb cold atoms that confines both the electronic ground state and a Rydberg state, engineered to minimize the differential light shifts between the two states. This is achieved by combining a fictitious magnetic field induced by optical nanofibre guided light and an external bias magnetic field. We calculate trap potentials for the cases of one- and two-guided modes with quasi-linear and quasi-circular polarisations, and calculate trap depths and trap frequencies for different values of laser power and bias fields. Moreover, we discuss the impact of the quadrupole polarisability of the Rydberg atoms on the trap potential and demonstrate how the size of a Rydberg atom influences the ponderomotive potential generated by the nanofibre-guided light field. This work expands on the idea of light-induced fictitious magnetic field traps and presents a practical approach for creating quantum networks using Rydberg atoms integrated with optical nanofibres to generate 1D atom arrays.
Submitted 17 July, 2025;
originally announced July 2025.
-
All-sky search for long-duration gravitational-wave transients in the first part of the fourth LIGO-Virgo-KAGRA Observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1750 additional authors not shown)
Abstract:
We present an all-sky search for long-duration gravitational waves (GWs) from the first part of the LIGO-Virgo-KAGRA fourth observing run (O4), called O4a and comprising data taken between 24 May 2023 and 16 January 2024. The GW signals targeted by this search are the so-called "long-duration" (> 1 s) transients expected from a variety of astrophysical processes, including non-axisymmetric deformations in magnetars or eccentric binary coalescences. We make minimal assumptions on the emitted GW waveforms in terms of morphologies and durations. Overall, our search targets signals with durations ~1-1000 s and frequency content in the range 16-2048 Hz. In the absence of significant detections, we report the sensitivity limits of our search in terms of root-sum-square signal amplitude (hrss) of reference waveforms. These limits improve upon the results from the third LIGO-Virgo-KAGRA observing run (O3) by about 30% on average. Moreover, this analysis demonstrates substantial progress in our ability to search for long-duration GW signals owing to enhancements in pipeline detection efficiencies. As detector sensitivities continue to advance and observational runs grow longer, unmodeled long-duration searches will increasingly be able to explore a range of compelling astrophysical scenarios involving neutron stars and black holes.
Submitted 23 July, 2025; v1 submitted 16 July, 2025;
originally announced July 2025.
-
Practical Crystallography with a Transmission Electron Microscope
Authors:
Benjamin L. Weare,
Kayleigh L. Y. Fung,
Ian Cardillo-Zallo,
William J. Cull,
Michael W. Fay,
Stephen P. Argent,
Paul D. Brown
Abstract:
Three-dimensional electron diffraction (3DED) is a powerful technique providing for crystal structure solutions of sub-micron sized crystals too small for structure determination via X-ray techniques. The entry requirement, however, of a transmission electron microscope (TEM) adapted with bespoke software for coordinated sample stage rotation and continuous electron diffraction data acquisition has generally inhibited the wider uptake of 3DED. To address this limitation, we present novel software GiveMeED appropriate for controlled 3DED data acquisition. The collection of useable reflections beyond 0.8 Å makes 3DED crystallographic processing effectively routine, using standard software and workflows derived from single-crystal X-ray diffraction (SCXRD) techniques. A full experimental workflow for 3DED on a conventional TEM is described in practical terms, in combination with direct imaging, and energy dispersive X-ray spectroscopy (EDS) and electron energy loss spectroscopy (EELS), for the return of comprehensive correlative descriptions of crystal morphologies and sample compositions, with due regard for the quantification of electron flux at each stage of the characterisation process. The accuracy and effectiveness of GiveMeED is demonstrated through structure solutions for case study paracetamol, copper(II) phthalocyanine, and perchlorocoronene samples, characterised in their near-native states under controlled low dose conditions at either room or cryogenic temperatures, with determined unit cell parameters and atomic connectivity matching accepted literature X-ray structures for these compounds. To promote the wider adoption of 3DED, we make GiveMeED freely available for use and modification, in support of greater uptake and utilisation of structure solution procedures via electron diffraction.
Submitted 14 July, 2025;
originally announced July 2025.
-
GW231123: a Binary Black Hole Merger with Total Mass 190-265 $M_{\odot}$
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1763 additional authors not shown)
Abstract:
On 2023 November 23 the two LIGO observatories both detected GW231123, a gravitational-wave signal consistent with the merger of two black holes with masses $137^{+22}_{-17}\, M_\odot$ and $103^{+20}_{-52}\, M_\odot$ (90% credible intervals), at luminosity distance 0.7-4.1 Gpc and redshift of $0.39^{+0.27}_{-0.24}$, and a network signal-to-noise ratio of $\sim$22.5. Both black holes exhibit high spins, $0.9^{+0.10}_{-0.19}$ and $0.80^{+0.20}_{-0.51}$ respectively. A massive black hole remnant is supported by an independent ringdown analysis. Some properties of GW231123 are subject to large systematic uncertainties, as indicated by differences in inferred parameters between signal models. The primary black hole lies within or above the theorized mass gap where black holes between 60-130 $M_\odot$ should be rare due to pair instability mechanisms, while the secondary spans the gap. The observation of GW231123 therefore suggests the formation of black holes from channels beyond standard stellar collapse, and that intermediate-mass black holes of mass $\sim$200 $M_\odot$ form through gravitational-wave driven mergers.
Submitted 11 August, 2025; v1 submitted 10 July, 2025;
originally announced July 2025.
-
Benchmarking Misuse Mitigation Against Covert Adversaries
Authors:
Davis Brown,
Mahdi Sabbaghi,
Luze Sun,
Alexander Robey,
George J. Pappas,
Eric Wong,
Hamed Hassani
Abstract:
Existing language model safety evaluations focus on overt attacks and low-stakes tasks. Realistic attackers can subvert current safeguards by requesting help on small, benign-seeming tasks across many independent queries. Because individual queries do not appear harmful, the attack is hard to detect. However, when combined, these fragments uplift misuse by helping the attacker complete hard and dangerous tasks. Toward identifying defenses against such strategies, we develop Benchmarks for Stateful Defenses (BSD), a data generation pipeline that automates evaluations of covert attacks and corresponding defenses. Using this pipeline, we curate two new datasets that are consistently refused by frontier models and are too difficult for weaker open-weight models. Our evaluations indicate that decomposition attacks are effective misuse enablers, and highlight stateful defenses as a countermeasure.
Submitted 6 June, 2025;
originally announced June 2025.
-
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class
Authors:
James V. Roggeveen,
Erik Y. Wang,
Will Flintoft,
Peter Donets,
Lucy S. Nathwani,
Nickholas Gutierrez,
David Ettel,
Anton Marius Graf,
Siddharth Dandavate,
Arjun Nageswaran,
Raglan Ward,
Ava Williamson,
Anne Mykland,
Kacper K. Migacz,
Yijun Wang,
Egemen Bostan,
Duy Thuc Nguyen,
Zhe He,
Marc L. Descoteaux,
Felix Yeung,
Shida Liu,
Jorge García Ponce,
Luke Zhu,
Yuyang Chen,
Ekaterina S. Ivshina
, et al. (20 additional authors not shown)
Abstract:
Large language models (LLMs) have shown remarkable progress in mathematical problem-solving, but evaluation has largely focused on problems that have exact analytical solutions or involve formal proofs, often overlooking approximation-based problems ubiquitous in applied science and engineering. To fill this gap, we build on prior work and present HARDMath2, a dataset of 211 original problems covering the core topics in an introductory graduate applied math class, including boundary-layer analysis, WKB methods, asymptotic solutions of nonlinear partial differential equations, and the asymptotics of oscillatory integrals. This dataset was designed and verified by the students and instructors of a core graduate applied mathematics course at Harvard. We build the dataset through a novel collaborative environment that challenges students to write and refine difficult problems consistent with the class syllabus, peer-validate solutions, test different models, and automatically check LLM-generated solutions against their own answers and numerical ground truths. Evaluation results show that leading frontier models still struggle with many of the problems in the dataset, highlighting a gap in the mathematical reasoning skills of current LLMs. Importantly, students identified strategies to create increasingly difficult problems by interacting with the models and exploiting common failure modes. This back-and-forth with the models not only resulted in a richer and more challenging benchmark but also led to qualitative improvements in the students' understanding of the course material, which is increasingly important as we enter an age where state-of-the-art language models can solve many challenging problems across a wide domain of fields.
Submitted 16 May, 2025;
originally announced May 2025.
-
Iterative Recommendations based on Monte Carlo Sampling and Trust Estimation in Multi-Stage Vehicular Traffic Routing Games
Authors:
Doris E. M. Brown,
Venkata Sriram Siddhardh Nadendla,
Sajal K. Das
Abstract:
The shortest-time route recommendations offered by modern navigation systems fuel selfish routing in urban vehicular traffic networks and are therefore one of the main reasons for the growth of congestion. In contrast, intelligent transportation systems (ITS) prefer to steer driver-vehicle systems (DVS) toward system-optimal route recommendations, which are primarily designed to mitigate network congestion. However, due to the misalignment in motives, drivers exhibit a lack of trust in the ITS. This paper models the interaction between a DVS and an ITS as a novel, multi-stage routing game where the DVS exhibits dynamics in its trust towards the recommendations of ITS based on counterfactual and observed game outcomes. Specifically, DVS and ITS are respectively modeled as a travel-time minimizer and network congestion minimizer, each having nonidentical prior beliefs about the network state. A novel approximate algorithm to compute the Bayesian Nash equilibrium, called ROSTER (Recommendation Outcome Sampling with Trust Estimation and Re-evaluation), is proposed based on Monte Carlo sampling with trust belief updating to determine the best response route recommendations of the ITS at each stage of the game. Simulation results demonstrate that the trust prediction error in the proposed algorithm converges to zero with a growing number of multi-stage DVS-ITS interactions and that the algorithm effectively both mitigates congestion and reduces driver travel times when compared to alternative route recommendation strategies.
Submitted 14 April, 2025;
originally announced April 2025.
-
Strain Engineering of Magnetoresistance and Magnetic Anisotropy in CrSBr
Authors:
Eudomar Henríquez-Guerra,
Alberto M. Ruiz,
Marta Galbiati,
Alvaro Cortes-Flores,
Daniel Brown,
Esteban Zamora-Amo,
Lisa Almonte,
Andrei Shumilin,
Juan Salvador-Sánchez,
Ana Pérez-Rodríguez,
Iñaki Orue,
Andrés Cantarero,
Andres Castellanos-Gomez,
Federico Mompeán,
Mar Garcia-Hernandez,
Efrén Navarro-Moratalla,
Enrique Díez,
Mario Amado,
José J. Baldoví,
M. Reyes Calvo
Abstract:
Tailoring magnetoresistance and magnetic anisotropy in van der Waals magnetic materials is essential for advancing their integration into technological applications. In this regard, strain engineering has emerged as a powerful and versatile strategy to control magnetism at the two-dimensional (2D) limit. Here, we demonstrate that compressive biaxial strain significantly enhances the magnetoresistance and magnetic anisotropy of few-layer CrSBr flakes. Strain is efficiently transferred to the flakes from the thermal compression of a polymeric substrate upon cooling, as confirmed by temperature-dependent Raman spectroscopy. This strain induces a remarkable increase in the magnetoresistance ratio and in the saturation fields required to align the magnetization of CrSBr along each of its three crystallographic directions, reaching a twofold enhancement along the magnetic easy axis. This enhancement is accompanied by a subtle reduction of the Néel temperature by ~10 K. Our experimental results are fully supported by first-principles calculations, which link the observed effects to a strain-driven modification in interlayer exchange coupling and magnetic anisotropy energy. These findings establish strain engineering as a key tool for fine-tuning magnetotransport properties in 2D magnetic semiconductors, paving the way for implementation in spintronics and information storage devices.
Submitted 31 July, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
United States Muon Collider Community White Paper for the European Strategy for Particle Physics Update
Authors:
A. Abdelhamid,
D. Acosta,
P. Affleck,
G. Agarwal,
K. Agashe,
P. Agrawal,
R. Alharthy,
B. Allmond,
D. Ally,
G. Ambrosio,
O. Amram,
A. Apresyan,
A. Apyan,
C. Aruta,
C. Arzate,
P. Asadi,
J. Ashley,
A. Avasthi,
J. Backus,
R. Bartek,
A. Batz,
L. Bauerdick,
C. Bell,
S. Belomestnykh,
J. S. Berg
, et al. (280 additional authors not shown)
Abstract:
This document is being submitted to the 2024-2026 European Strategy for Particle Physics Update (ESPPU) process on behalf of the US Muon Collider community, with its preparation coordinated by the interim US Muon Collider Coordination Group. The US Muon Collider Community comprises a few hundred American scientists. The purpose of the document is to inform ESPPU about the US plans for Muon Collider research and development (R&D), explain how these efforts align with the broader international R&D initiatives, and present the US community vision for the future realization of this transformative project.
Submitted 15 April, 2025; v1 submitted 30 March, 2025;
originally announced March 2025.
-
Assessment of AI-Generated Pediatric Rehabilitation SOAP-Note Quality
Authors:
Solomon Amenyo,
Maura R. Grossman,
Daniel G. Brown,
Brendan Wylie-Toal
Abstract:
This study explores the integration of artificial intelligence (AI) or large language models (LLMs) into pediatric rehabilitation clinical documentation, focusing on the generation of SOAP (Subjective, Objective, Assessment, Plan) notes, which are essential for patient care. Creating complex documentation is time-consuming in pediatric settings. We evaluate the effectiveness of two AI tools, Copilot (a commercial LLM) and KAUWbot (a fine-tuned LLM developed for KidsAbility Centre for Child Development, an Ontario pediatric rehabilitation facility), in simplifying and automating this process. We focus on two key questions: (i) How does the quality of AI-generated SOAP notes based on short clinician summaries compare to human-authored notes, and (ii) To what extent is human editing necessary for improving AI-generated SOAP notes? We found no evidence of prior work assessing the quality of AI-generated clinical notes in pediatric rehabilitation.
We used a sample of 432 SOAP notes, evenly divided among human-authored, Copilot-generated, and KAUWbot-generated notes. We employ a blind evaluation by experienced clinicians based on a custom rubric. Statistical analysis is conducted to assess the quality of the notes and the impact of human editing. The results suggest that AI tools such as KAUWbot and Copilot can generate SOAP notes with quality comparable to those authored by humans. We highlight the potential for combining AI with human expertise to enhance clinical documentation and offer insights for the future integration of AI into pediatric rehabilitation practice and other settings for the management of clinical conditions.
Submitted 3 February, 2025;
originally announced March 2025.
-
Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
Authors:
Nazanin Moradinasab,
Saurav Sengupta,
Jiebei Liu,
Sana Syed,
Donald E. Brown
Abstract:
Healthcare relies on multiple types of data, such as medical images, genetic information, and clinical records, to improve diagnosis and treatment. However, missing data is a common challenge due to privacy restrictions, cost, and technical issues, making many existing multimodal models unreliable. To address this, we propose a new multimodal model called Mixture of Experts, Symmetric Aligning, and Reconstruction (MoSARe), a deep learning framework that handles incomplete multimodal data while maintaining high accuracy. MoSARe integrates expert selection, cross-modal attention, and contrastive learning to improve feature representation and decision-making. Our results show that MoSARe outperforms existing models when the data is complete. Furthermore, it provides reliable predictions even when some data are missing. This makes it especially useful in real-world healthcare settings, including resource-limited environments. Our code is publicly available at https://github.com/NazaninMn/MoSARe.
Submitted 12 March, 2025;
originally announced March 2025.
-
Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics
Authors:
Herman Chau,
Helen Jenne,
Davis Brown,
Jesse He,
Mark Raugas,
Sara Billey,
Henry Kvinge
Abstract:
With recent dramatic increases in AI system capabilities, there has been growing interest in utilizing machine learning for reasoning-heavy, quantitative tasks, particularly mathematics. While there are many resources capturing mathematics at the high-school, undergraduate, and graduate level, there are far fewer resources available that align with the level of difficulty and open-endedness encountered by professional mathematicians working on open problems. To address this, we introduce a new collection of datasets, the Algebraic Combinatorics Dataset Repository (ACD Repo), representing either foundational results or open problems in algebraic combinatorics, a subfield of mathematics that studies discrete structures arising from abstract algebra. Further differentiating our dataset collection is the fact that it aims at the conjecturing process. Each dataset includes an open-ended research-level question and a large collection of examples (up to 10M in some cases) from which conjectures should be generated. We describe all nine datasets and the different ways machine learning models can be applied to them (e.g., training with narrow models followed by interpretability analysis or program synthesis with LLMs), and discuss some of the challenges involved in designing datasets like these.
Submitted 8 March, 2025;
originally announced March 2025.
-
Adaptively profiling models with task elicitation
Authors:
Davis Brown,
Prithvi Balehannina,
Helen Jin,
Shreya Havaldar,
Hamed Hassani,
Eric Wong
Abstract:
Language model evaluations often fail to characterize consequential failure modes, forcing experts to inspect outputs and build new benchmarks. We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. Task elicitation finds hundreds of natural-language tasks -- an order of magnitude more than prior work -- where frontier models exhibit systematic failures, in domains ranging from forecasting to online harassment. For example, we find that Sonnet 3.5 over-associates quantum computing and AGI and that o3-mini is prone to hallucination when fabrications are repeated in-context.
Submitted 25 September, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique
Authors:
Piotr Sawicki,
Marek Grześ,
Dan Brown,
Fabrício Góes
Abstract:
This study adapts the Consensual Assessment Technique (CAT) for Large Language Models (LLMs), introducing a novel methodology for poetry evaluation. Using a 90-poem dataset with a ground truth based on publication venue, we demonstrate that this approach allows LLMs to significantly surpass the performance of non-expert human judges. Our method, which leverages forced-choice ranking within small, randomized batches, enabled Claude-3-Opus to achieve a Spearman's Rank Correlation of 0.87 with the ground truth, dramatically outperforming the best human non-expert evaluation (SRC = 0.38). The LLM assessments also exhibited high inter-rater reliability, underscoring the methodology's robustness. These findings establish that LLMs, when guided by a comparative framework, can be effective and reliable tools for assessing poetry, paving the way for their broader application in other creative domains.
Submitted 4 October, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows
Authors:
Frederic Gmeiner,
Nicolai Marquardt,
Michael Bentley,
Hugo Romat,
Michel Pahud,
David Brown,
Asta Roseway,
Nikolas Martelaro,
Kenneth Holstein,
Ken Hinckley,
Nathalie Riche
Abstract:
Despite Generative AI (GenAI) systems' potential for enhancing content creation, users often struggle to effectively integrate GenAI into their creative workflows. Core challenges include misalignment of AI-generated content with user intentions (intent elicitation and alignment), user uncertainty around how to best communicate their intents to the AI system (prompt formulation), and insufficient flexibility of AI systems to support diverse creative workflows (workflow flexibility). Motivated by these challenges, we created IntentTagger: a system for slide creation based on the notion of Intent Tags - small, atomic conceptual units that encapsulate user intent - for exploring granular and non-linear micro-prompting interactions for Human-GenAI co-creation workflows. Our user study with 12 participants provides insights into the value of flexibly expressing intent across varying levels of ambiguity, meta-intent elicitation, and the benefits and challenges of intent tag-driven workflows. We conclude by discussing the broader implications of our findings and design considerations for GenAI-supported content creation workflows.
Submitted 25 February, 2025;
originally announced February 2025.
-
AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools
Authors:
Nathalie Riche,
Anna Offenwanger,
Frederic Gmeiner,
David Brown,
Hugo Romat,
Michel Pahud,
Nicolai Marquardt,
Kori Inkpen,
Ken Hinckley
Abstract:
Chat-based prompts respond with verbose linear-sequential texts, making it difficult to explore and refine ambiguous intents, back up and reinterpret, or shift directions in creative AI-assisted design work. AI-Instruments instead embody "prompts" as interface objects via three key principles: (1) Reification of user-intent as reusable direct-manipulation instruments; (2) Reflection of multiple interpretations of ambiguous user-intents (Reflection-in-intent) as well as the range of AI-model responses (Reflection-in-response) to inform design "moves" towards a desired result; and (3) Grounding to instantiate an instrument from an example, result, or extrapolation directly from another instrument. Further, AI-Instruments leverage LLMs to suggest, vary, and refine new instruments, enabling a system that goes beyond hard-coded functionality by generating its own instrumental controls from content. We demonstrate four technology probes, applied to image generation, and qualitative insights from twelve participants, showing how AI-Instruments address challenges of intent formulation, steering via direct manipulation, and non-linear iterative workflows to reflect and resolve ambiguous intents.
Submitted 25 February, 2025;
originally announced February 2025.
-
Discovery and Deployment of Emergent Robot Swarm Behaviors via Representation Learning and Real2Sim2Real Transfer
Authors:
Connor Mattson,
Varun Raveendra,
Ricardo Vega,
Cameron Nowzari,
Daniel S. Drew,
Daniel S. Brown
Abstract:
Given a swarm of limited-capability robots, we seek to automatically discover the set of possible emergent behaviors. Prior approaches to behavior discovery rely on human feedback or hand-crafted behavior metrics to represent and evolve behaviors and only discover behaviors in simulation, without testing or considering the deployment of these new behaviors on real robot swarms. In this work, we present Real2Sim2Real Behavior Discovery via Self-Supervised Representation Learning, which combines representation learning and novelty search to discover possible emergent behaviors automatically in simulation and enable direct controller transfer to real robots. First, we evaluate our method in simulation and show that our proposed self-supervised representation learning approach outperforms previous hand-crafted metrics by more accurately representing the space of possible emergent behaviors. Then, we address the reality gap by incorporating recent work in sim2real transfer for swarms into our lightweight simulator design, enabling direct robot deployment of all behaviors discovered in simulation on an open-source and low-cost robot platform.
Submitted 21 February, 2025;
originally announced February 2025.
-
How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies
Authors:
Akansha Kalra,
Basavasagar Patil,
Guanhong Tao,
Daniel S. Brown
Abstract:
Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to offline universal perturbation attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantized Behavior Transformer (VQ-BET). We study the vulnerability of these methods to universal adversarial perturbations. Our experiments on several simulated robotic manipulation tasks reveal that most of the current methods are highly vulnerable to adversarial perturbations. We also show that these attacks are often transferable across algorithms, architectures, and tasks, raising security concerns about black-box attacks. To the best of our knowledge, we are the first to present a systematic study of the vulnerabilities of different LfD algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern BC algorithms, paving the way for future work in addressing such limitations.
Submitted 13 October, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Iterative Refinement of Arbitrary Micro-Optical Surfaces
Authors:
Meagan Plummer,
Stephen Taylor,
Matthew Marshall,
David Brown,
Robert Leonard,
Seth Hyra,
Spencer Olson
Abstract:
We introduce an adaptive optical refinement method enabling ultra-precise micro-milling of arbitrary surfaces. Through repeated iteration, our method reduces surface error without requiring significant specific surface engineering. This remediates the long sample preparation times and lack of refinement capability that previously reported methods suffer from. The iterative refinement milling method was used to produce spherical mirrors with small radii of curvature and low surface roughness for use in micro Fabry-Perot cavities. We demonstrate the use of this adaptive process to produce a variety of arbitrary surface geometries on both optical fiber tips and optical flats. We additionally discuss our capability to apply iterative refinement milling adaptively to various materials, including to construct GRIN lenses.
Submitted 28 January, 2025;
originally announced January 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1087 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 25 September, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Early Failure Detection in Autonomous Surgical Soft-Tissue Manipulation via Uncertainty Quantification
Authors:
Jordan Thompson,
Ronald Koe,
Anthony Le,
Gabriella Goodman,
Daniel S. Brown,
Alan Kuntz
Abstract:
Autonomous surgical robots are a promising solution to the increasing demand for surgery amid a shortage of surgeons. Recent work has proposed learning-based approaches for the autonomous manipulation of soft tissue. However, due to variability in tissue geometries and stiffnesses, these methods do not always perform optimally, especially in out-of-distribution settings. We propose, develop, and test the first application of uncertainty quantification to learned surgical soft-tissue manipulation policies as an early identification system for task failures. We analyze two different methods of uncertainty quantification, deep ensembles and Monte Carlo dropout, and find that deep ensembles provide a stronger signal of future task success or failure. We validate our approach using the physical daVinci Research Kit (dVRK) surgical robot to perform physical soft-tissue manipulation. We show that we are able to successfully detect out-of-distribution states leading to task failure and request human intervention when necessary while still enabling autonomous manipulation when possible. Our learned tissue manipulation policy with uncertainty-based early failure detection achieves a zero-shot sim2real performance improvement of 47.5% over the prior state of the art in learned soft-tissue manipulation. We also show that our method generalizes well to new types of tissue as well as to a bimanual soft-tissue manipulation task.
Submitted 25 August, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
-
Toward Zero-Shot User Intent Recognition in Shared Autonomy
Authors:
Atharv Belsare,
Zohre Karimi,
Connor Mattson,
Daniel S. Brown
Abstract:
A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being able to assist the user. We propose and study a zero-shot, vision-only shared autonomy (VOSA) framework designed to allow robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks with unknown and dynamically changing object locations. To demonstrate the effectiveness of our VOSA framework, we instantiate a simple version of VOSA on a Kinova Gen3 manipulator and evaluate our system by conducting a user study on three tabletop manipulation tasks. The performance of VOSA matches that of an oracle baseline model that receives privileged knowledge of possible human intents while also requiring significantly less effort than unassisted teleoperation. In more realistic settings, where the set of possible human intents is fully or partially unknown, we demonstrate that VOSA requires less human effort and time than baseline approaches while being preferred by a majority of the participants. Our results demonstrate the efficacy and efficiency of using off-the-shelf vision algorithms to enable flexible and beneficial shared control of a robot manipulator. Code and videos available here: https://sites.google.com/view/zeroshot-sharedautonomy/home.
Submitted 14 January, 2025;
originally announced January 2025.
-
TreeMatch: A Fully Unsupervised WSD System Using Dependency Knowledge on a Specific Domain
Authors:
Andrew Tran,
Chris Bowes,
David Brown,
Ping Chen,
Max Choly,
Wei Ding
Abstract:
Word sense disambiguation (WSD) is one of the main challenges in Computational Linguistics. TreeMatch is a WSD system originally developed using data from SemEval 2007 Task 7 (Coarse-grained English All-words Task) that has been adapted for use in SemEval 2010 Task 17 (All-words Word Sense Disambiguation on a Specific Domain). The system is based on a fully unsupervised method using dependency knowledge drawn from a domain-specific knowledge base that was built for this task. When evaluated on the task, the system's precision exceeds the Most Frequent Selection baseline.
Submitted 5 January, 2025;
originally announced January 2025.
-
Search for continuous gravitational waves from known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi,
A. Al-Jodah,
C. Alléné
, et al. (1794 additional authors not shown)
Abstract:
Continuous gravitational wave (CW) emission from neutron stars carries information about their internal structure and equation of state, and it can provide tests of General Relativity. We present a search for CWs from a set of 45 known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run, known as O4a. We conducted a targeted search for each pulsar using three independent analysis methods considering the single-harmonic and the dual-harmonic emission models. We find no evidence of a CW signal in O4a data for either model and set upper limits on the signal amplitude and on the ellipticity, which quantifies the asymmetry in the neutron star mass distribution. For the single-harmonic emission model, 29 targets have an upper limit on the amplitude below the theoretical spin-down limit. The lowest upper limit on the amplitude is $6.4\!\times\!10^{-27}$ for the young energetic pulsar J0537-6910, while the lowest constraint on the ellipticity is $8.8\!\times\!10^{-9}$ for the bright nearby millisecond pulsar J0437-4715. Additionally, for a subset of 16 targets we performed a narrowband search that is more robust regarding the emission model, with no evidence of a signal. We also found no evidence of non-standard polarizations as predicted by the Brans-Dicke theory.
Submitted 26 September, 2025; v1 submitted 2 January, 2025;
originally announced January 2025.
-
White Paper on Software Infrastructure for Advanced Nuclear Physics Computing
Authors:
P. M. Jacobs,
A. Boehnlein,
B. Sawatzky,
J. Carlson,
I. Cloet,
M. Diefenthaler,
R. G. Edwards,
K. Godbey,
W. R. Hix,
K. Orginos,
T. Papenbrock,
M. Ploskon,
C. Ratti,
R. Soltz,
T. Wenaus,
L. Andreoli,
J. Brodsky,
D. Brown,
A. Bulgac,
G. D. Chung,
S. J. Coleman,
J. Detwiler,
A. Dubey,
R. Ehlers,
S. Gandolfi
, et al. (27 additional authors not shown)
Abstract:
This White Paper documents the discussion and consensus conclusions of the workshop "Software Infrastructure for Advanced Nuclear Physics Computing" (SANPC 24), which was held at Jefferson Lab on June 20-22, 2024. The workshop brought together members of the US Nuclear Physics community with data scientists and funding agency representatives, to discuss the challenges and opportunities in advanced computing for Nuclear Physics in the coming decade. Opportunities for sustainable support and growth are identified, within the context of existing and currently planned DOE and NSF programs.
Submitted 21 April, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
Interactive Classification Metrics: A graphical application to build robust intuition for classification model evaluation
Authors:
David H. Brown,
Davide Chicco
Abstract:
Machine learning continues to grow in popularity in academia and industry, and is increasingly used in other fields. However, most of the common metrics used to evaluate even simple binary classification models have shortcomings that are neither immediately obvious nor consistently taught to practitioners. Here we present Interactive Classification Metrics (ICM), an application to visualize and explore the relationships between different evaluation metrics. The user changes the distribution statistics and explores corresponding changes across a suite of evaluation metrics. The interactive, graphical nature of this tool emphasizes the tradeoffs of each metric without the overhead of data wrangling and model training. The goals of this application are: (1) to help practitioners in the ever-expanding machine learning field choose the most appropriate evaluation metrics for their classification problem; (2) to promote the careful attention to interpretation that is required even in the simplest scenarios like binary classification. Our application is publicly available for free under the MIT license as a Python package on PyPI at https://pypi.org/project/interactive-classification-metrics and on GitHub at https://github.com/davhbrown/interactive_classification_metrics.
Submitted 22 December, 2024;
originally announced December 2024.
-
Learning Complex Word Embeddings in Classical and Quantum Spaces
Authors:
Carys Harvey,
Stephen Clark,
Douglas Brown,
Konstantinos Meichanetzidis
Abstract:
We present a variety of methods for training complex-valued word embeddings, based on the classical Skip-gram model, with a straightforward adaptation simply replacing the real-valued vectors with arbitrary vectors of complex numbers. In a more "physically-inspired" approach, the vectors are produced by parameterised quantum circuits (PQCs), which are unitary transformations resulting in normalised vectors which have a probabilistic interpretation. We develop a complex-valued version of the highly optimised C code version of Skip-gram, which allows us to easily produce complex embeddings trained on a 3.8B-word corpus for a vocabulary size of over 400k, for which we are then able to train a separate PQC for each word. We evaluate the complex embeddings on a set of standard similarity and relatedness datasets, for some models obtaining results competitive with the classical baseline. We find that, while training the PQCs directly tends to harm performance, the quantum word embeddings from the two-stage process perform as well as the classical Skip-gram embeddings with comparable numbers of parameters. This enables a highly scalable route to learning embeddings in complex spaces which scales with the size of the vocabulary rather than the size of the training corpus. In summary, we demonstrate how to produce a large set of high-quality word embeddings for use in complex-valued and quantum-inspired NLP models, and for exploring potential advantage in quantum NLP models.
Submitted 18 December, 2024;
originally announced December 2024.