-
Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration
Authors:
Benjamin Li,
Shuyang Shi,
Lucia Romero,
Huao Li,
Yaqi Xie,
Woojun Kim,
Stefanos Nikolaidis,
Michael Lewis,
Katia Sycara,
Simon Stepputtis
Abstract:
In collaborative tasks, being able to adapt to your teammates is a necessary requirement for success. When teammates are heterogeneous, such as in human-agent teams, agents need to be able to observe, recognize, and adapt to their human partners in real time. This becomes particularly challenging in tasks with time pressure and complex strategic spaces where the dynamics can change rapidly. In this work, we introduce TALENTS, a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a range of partner strategies, enabling ad-hoc teamwork. Our approach utilizes a variational autoencoder to learn a latent strategy space from trajectory data. This latent space represents the underlying strategies that agents employ. Subsequently, the system identifies different types of strategy by clustering the data. Finally, a cooperator agent is trained to generate partners for each type of strategy, conditioned on these clusters. In order to adapt to previously unseen partners, we leverage a fixed-share regret minimization algorithm that infers and adjusts the estimated partner strategy dynamically. We assess our approach in a customized version of the Overcooked environment, posing a challenging cooperative cooking task that demands strong coordination across a wide range of possible strategies. Using an online user study, we show that our agent outperforms current baselines when working with unfamiliar human partners.
Submitted 7 July, 2025;
originally announced July 2025.
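The fixed-share adaptation step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is the textbook fixed-share update (an exponential-weights step followed by mixing with the uniform distribution), and the three strategy clusters, the 0/1 losses, and the parameters `eta` and `alpha` are all assumptions chosen for illustration.

```python
import math


def fixed_share_update(weights, losses, eta=1.0, alpha=0.05):
    """One fixed-share step over K candidate partner strategies.

    weights: current belief over strategies (sums to 1).
    losses:  per-strategy loss for the latest observation (hypothetical 0/1 here).
    eta:     learning rate of the exponential-weights step (assumed value).
    alpha:   share rate; keeps every strategy recoverable (assumed value).
    """
    k = len(weights)
    # Exponential-weights step: down-weight strategies that predicted poorly.
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    posterior = [s / total for s in scaled]
    # Share step: mix with uniform so no weight drops below alpha / k.
    return [(1.0 - alpha) * p + alpha / k for p in posterior]


def run_demo(steps=40, switch_at=20):
    """Hypothetical partner that follows strategy 0, then switches to strategy 2."""
    weights = [1.0 / 3.0] * 3
    history = []
    for t in range(steps):
        true_strategy = 0 if t < switch_at else 2
        losses = [0.0 if i == true_strategy else 1.0 for i in range(3)]
        weights = fixed_share_update(weights, losses)
        history.append(weights)
    return history


if __name__ == "__main__":
    history = run_demo()
    print("belief before switch:", [round(w, 3) for w in history[19]])
    print("belief after switch: ", [round(w, 3) for w in history[39]])
```

Because the share step keeps every weight at or above `alpha / k`, the belief can move to a new strategy within a few observations after the partner changes behavior, which is the property that makes this family of updates suitable for non-stationary partners.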
-
Low-mass vector-meson production at forward rapidity in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
D. Anderson,
V. Andrieux,
S. Antsupov,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
E. Bannikov,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont
, et al. (331 additional authors not shown)
Abstract:
The PHENIX experiment at the Relativistic Heavy Ion Collider has measured low-mass vector-meson ($ω+ρ$ and $φ$) production through the dimuon decay channel at forward rapidity $(1.2<|\mbox{y}|<2.2)$ in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. The low-mass vector-meson yield and nuclear-modification factor were measured as a function of the average number of participating nucleons, $\langle N_{\rm part}\rangle$, and the transverse momentum $p_T$. These results were compared with those obtained via the kaon decay channel in a similar $p_T$ range at midrapidity. The nuclear-modification factors in both rapidity regions are consistent within the uncertainties. A comparison of the $ω+ρ$ and $J/ψ$ mesons reveals that the light and heavy flavors are consistently suppressed across both $p_T$ and $\langle N_{\rm part}\rangle$. In contrast, the $φ$ meson displays a nuclear-modification factor consistent with unity, suggesting strangeness enhancement in the medium formed.
Submitted 6 July, 2025;
originally announced July 2025.
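For readers unfamiliar with the quantity, the nuclear-modification factor is conventionally the Au$+$Au yield divided by the $p$$+$$p$ yield scaled by the average number of binary nucleon-nucleon collisions. The form below is the standard textbook definition, not a formula quoted from the abstract; the symbol $\langle N_{\rm coll}\rangle$ is an assumption of this note and does not appear in the abstract.

$$R_{AA}(p_T) = \frac{d^2N_{AA}/dy\,dp_T}{\langle N_{\rm coll}\rangle \; d^2N_{pp}/dy\,dp_T}$$

Under this definition, a value consistent with unity means the Au$+$Au yield matches the scaled $p$$+$$p$ expectation, while values below unity indicate suppression.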
-
Dissipation Pathways in a Photosynthetic Complex
Authors:
Ignacio Gustin,
Chang Woo Kim,
Ignacio Franco
Abstract:
Determining how energy flows within and between molecules is crucial for understanding chemical reactions, material properties, and even vital processes such as photosynthesis. While the general principles of energy transfer are well established, elucidating the specific molecular pathways by which energy is funneled remains challenging as it requires tracking energy flow in complex molecular environments. Here, we demonstrate how photon excitation energy is partially dissipated in the light-harvesting Fenna-Matthews-Olson (FMO) complex, mediating the excitation energy transfer from light-harvesting chlorosomes to the photosynthetic reaction center in green sulfur bacteria. Specifically, we isolate the contribution of the protein and specific vibrational modes of the pigment molecules to the energy dynamics. For this, we introduce an efficient computational implementation of a recently proposed theory of dissipation pathways for open quantum systems. Using it and a state-of-the-art FMO model with highly structured and chromophore-specific spectral densities, we demonstrate that energy dissipation is dominated by low-frequency modes ($<$ 800 cm$^{-1}$) as their energy range is near-resonance with the energy gaps between electronic states of the pigments. We identify the most important modes for dissipation to be the in-plane breathing modes ($\sim$200 cm$^{-1}$) of the bacteriochlorophylls in the complex. Conversely, far-detuned intramolecular vibrations with higher frequencies ($>$ 800 cm$^{-1}$) play no role in dissipation. Interestingly, the FMO complex first needs to borrow energy from the environment to release excess photonic energy, making the energy dissipation dynamics non-monotonic. Beyond their fundamental value, these insights can guide the development of artificial light-harvesting devices and, more broadly, engineer environments for chemical and quantum control tasks.
Submitted 30 June, 2025;
originally announced June 2025.
-
DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues
Authors:
Kyochul Jang,
Donghyeon Lee,
Kyusik Kim,
Dongseok Heo,
Taewhoo Lee,
Woojeong Kim,
Bongwon Suh
Abstract:
Existing function-calling benchmarks focus on single-turn interactions. However, they overlook the complexity of real-world scenarios. To quantify how existing benchmarks address practical applications, we introduce DICE-SCORE, a metric that evaluates the dispersion of tool-related information such as function name and parameter values throughout the dialogue. Analyzing existing benchmarks through DICE-SCORE reveals notably low scores, highlighting the need for more realistic scenarios. To address this gap, we present DICE-BENCH, a framework that constructs practical function-calling datasets by synthesizing conversations through a tool graph that maintains dependencies across rounds and a multi-agent system with distinct personas to enhance dialogue naturalness. The final dataset comprises 1,607 high-DICE-SCORE instances. Our experiments on 19 LLMs with DICE-BENCH show that significant advances are still required before such models can be deployed effectively in real-world settings. Our code and data are all publicly available: https://snuhcc.github.io/DICE-Bench/.
Submitted 2 July, 2025; v1 submitted 28 June, 2025;
originally announced June 2025.
-
Emergence of Text Readability in Vision Language Models
Authors:
Jaeyoo Park,
Sanghyuk Chun,
Wonjae Kim,
Sangdoo Yun,
Bohyung Han
Abstract:
We investigate how the ability to recognize textual content within images emerges during the training of Vision-Language Models (VLMs). Our analysis reveals a critical phenomenon: the ability to read textual information in a given image (text readability) emerges abruptly after substantial training iterations, in contrast to semantic content understanding, which develops gradually from the early stages of training. This delayed emergence may reflect how contrastive learning tends to initially prioritize general semantic understanding, with text-specific symbolic processing developing later. Interestingly, the ability to match images with rendered text develops even more slowly, indicating a deeper need for semantic integration. These findings highlight the need for tailored training strategies to accelerate robust text comprehension in VLMs, laying the groundwork for future research on optimizing multimodal learning.
Submitted 24 June, 2025;
originally announced June 2025.
-
Approximating Language Model Training Data from Weights
Authors:
John X. Morris,
Junjie Oscar Yin,
Woojeong Kim,
Vitaly Shmatikov,
Alexander M. Rush
Abstract:
Modern language models often have open weights but closed training data. We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the highest-matching data from a large public text corpus and show its effectiveness at recovering useful data given only weights of the original and finetuned models. Even when none of the true training data is known, our method is able to locate a small subset of public Web documents that can be used to train a model to near the original model's performance, for models trained for both classification and supervised finetuning. On the AG News classification task, our method improves performance from 65% (using randomly selected data) to 80%, approaching the expert benchmark of 88%. When applied to a model trained with SFT on MSMARCO web documents, our method reduces perplexity from 3.3 to 2.3, compared to an expert LLAMA model's perplexity of 2.0.
Submitted 18 June, 2025;
originally announced June 2025.
-
Adaptive Data Augmentation for Thompson Sampling
Authors:
Wonyoung Kim
Abstract:
In linear contextual bandits, the objective is to select actions that maximize cumulative rewards, modeled as a linear function with unknown parameters. Although Thompson Sampling performs well empirically, it does not achieve optimal regret bounds. This paper proposes a nearly minimax optimal Thompson Sampling for linear contextual bandits by developing a novel estimator with the adaptive augmentation and coupling of the hypothetical samples that are designed for efficient parameter learning. The proposed estimator accurately predicts rewards for all arms without relying on assumptions for the context distribution. Empirical results show robust performance and significant improvement over existing methods.
Submitted 17 June, 2025;
originally announced June 2025.
-
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
Authors:
Xuanchi Ren,
Yifan Lu,
Tianshi Cao,
Ruiyuan Gao,
Shengyu Huang,
Amirmojtaba Sabour,
Tianchang Shen,
Tobias Pfaff,
Jay Zhangjie Wu,
Runjian Chen,
Seung Wook Kim,
Jun Gao,
Laura Leal-Taixe,
Mike Chen,
Sanja Fidler,
Huan Ling
Abstract:
Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in training and testing of an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that aims to generate challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain and capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with high-fidelity and challenging scenarios. Experimentally, we demonstrate that our generated data helps in mitigating long-tail distribution problems and enhances generalization in downstream tasks such as 3D lane detection, 3D object detection and driving policy learning. We open source our pipeline toolkit, dataset and model weights through NVIDIA's Cosmos platform.
Project page: https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams
Submitted 18 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Semi-gradient DICE for Offline Constrained Reinforcement Learning
Authors:
Woosung Kim,
JunHo Seo,
Jongmin Lee,
Byung-Jun Lee
Abstract:
Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on semi-gradient optimization, which solves a fundamentally different optimization problem and results in failures in cost estimation. Building on these insights, we propose a novel method to enable OPE and constrained RL through semi-gradient DICE. Our method ensures accurate cost estimation and achieves state-of-the-art performance on the offline constrained RL benchmark, DSRL.
Submitted 10 June, 2025;
originally announced June 2025.
-
SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
Authors:
Woohyeon Park,
Woojin Kim,
Jaeik Kim,
Jaeyoung Do
Abstract:
Despite significant advancements in Vision-Language Models (VLMs), the performance of existing VLMs remains hindered by object hallucination, a critical challenge to achieving accurate visual understanding. To address this issue, we propose SECOND: Selective and Contrastive Decoding, a novel approach that enables VLMs to effectively leverage multi-scale visual information in an object-centric manner, closely aligning with human visual perception. SECOND progressively selects and integrates multi-scale visual information, facilitating a more precise interpretation of images. By contrasting this visual information iteratively, SECOND significantly reduces perceptual hallucinations and outperforms existing methods on a wide range of benchmarks. Our theoretical analysis and experiments highlight the largely unexplored potential of multi-scale application in VLMs, showing that prioritizing and contrasting across scales outperforms existing methods.
Submitted 9 June, 2025;
originally announced June 2025.
-
FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning
Authors:
Woosung Kim,
Jinho Lee,
Jongmin Lee,
Byung-Jun Lee
Abstract:
Multi-objective reinforcement learning (MORL) aims to optimize policies in the presence of conflicting objectives, where linear scalarization is commonly used to reduce vector-valued returns into scalar signals. While effective for certain preferences, this approach cannot capture fairness-oriented goals such as Nash social welfare or max-min fairness, which require nonlinear and non-additive trade-offs. Although several online algorithms have been proposed for specific fairness objectives, a unified approach for optimizing nonlinear welfare criteria in the offline setting, where learning must proceed from a fixed dataset, remains unexplored. In this work, we present FairDICE, the first offline MORL framework that directly optimizes nonlinear welfare objectives. FairDICE leverages distribution correction estimation to jointly account for welfare maximization and distributional regularization, enabling stable and sample-efficient learning without requiring explicit preference weights or exhaustive weight search. Across multiple offline benchmarks, FairDICE demonstrates strong fairness-aware performance compared to existing baselines.
Submitted 9 June, 2025;
originally announced June 2025.
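The contrast drawn in this abstract between linear scalarization and fairness-oriented welfare criteria can be seen in a small worked example. The sketch below is not the FairDICE algorithm; it only evaluates three hypothetical two-objective return vectors (the names and numbers are assumptions made up for illustration) under linear scalarization, Nash social welfare (sum of log returns), and max-min fairness.

```python
import math


def linear_scalarization(returns, weights):
    """Weighted sum of per-objective returns."""
    return sum(w * r for w, r in zip(weights, returns))


def nash_social_welfare(returns):
    """Log of the product of returns; requires strictly positive returns."""
    return sum(math.log(r) for r in returns)


def max_min_welfare(returns):
    """Return of the worst-off objective."""
    return min(returns)


# Hypothetical candidate policies and their two-objective returns.
CANDIDATES = {
    "favor_obj_1": (10.0, 1.0),
    "favor_obj_2": (1.0, 10.0),
    "balanced": (5.0, 5.0),
}


def best(score):
    """Name of the candidate with the highest score."""
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name]))


def linear_winners(steps=101):
    """Every candidate that wins for some weight vector (w, 1 - w) on a grid."""
    winners = set()
    for i in range(steps):
        w = i / (steps - 1)
        winners.add(best(lambda r: linear_scalarization(r, (w, 1.0 - w))))
    return winners


if __name__ == "__main__":
    print("linear scalarization winners:", sorted(linear_winners()))
    print("Nash social welfare winner:  ", best(nash_social_welfare))
    print("max-min fairness winner:     ", best(max_min_welfare))
```

In this toy setup the balanced candidate lies below the line joining the two extreme candidates, so no choice of linear weights selects it, while both nonlinear criteria prefer it.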
-
Integrable deformations of cluster maps of type $D_{2N}$
Authors:
Wookyung Kim
Abstract:
In this paper, we extend one of the main results in \cite{hkm24}, of a deformed type $D_{4}$ map, to the general case of the type $D_{2N}$ for $N\geq3$. This can be achieved through a "local expansion" operation, introduced in the joint work \cite{grab} with Grabowski and Hone. This operation involves inserting a specific subquiver into the quiver arising from the Laurentification of the deformed type $D_{4}$ map. This insertion yields a new quiver, obtained through the Laurentification of the deformed type $D_{6}$ map, and thus enables systematic generalization to higher ranks $D_{2N}$. We further consider the degree growth of the deformed type $D_{2N}$ map via the tropical method and conjecture that for each $N$, the deformed map is an integrable map by applying an algebraic entropy test, a criterion for detecting integrability of the dynamical system.
Submitted 6 June, 2025;
originally announced June 2025.
-
Faint absorption of the ground state hyperfine-splitting transitions of hydroxyl at 18 cm in the Galactic Disk
Authors:
M. R. Rugel,
H. Beuther,
J. D. Soler,
P. Goldsmith,
L. Anderson,
A. Hafner,
J. R. Dawson,
Y. Wang,
S. Bihr,
H. Wiesemeyer,
R. Guesten,
M. -Y. Lee,
D. Riquelme,
A. M. Jacob,
W. -J. Kim,
M. Busch,
S. Khan,
A. Brunthaler
Abstract:
The interstellar hydride hydroxyl (OH) is a potential tracer of CO-dark molecular gas. We present new absorption line observations of OH at 18 cm wavelength towards four continuum sources. We compare these to the [CII] line at 1.9 THz obtained with SOFIA, observations of the neutral atomic hydrogen 21 cm line with the VLA, and CO lines obtained with APEX. We trace OH over a large range of molecular hydrogen column densities, and derive OH abundances with respect to molecular and total hydrogen column densities. Increased sensitivity and spectral resolution allowed us to detect weak and narrow features. We identify only one OH absorption component out of 23 without a CO counterpart, yet several with intermediate molecular gas fractions. A potential association of [CII] 158 $\mu$m emission with an OH absorption component is seen toward one sightline. Our results confirm that OH absorption traces molecular gas across diffuse and dense environments of the interstellar medium. At the sensitivity limits of the present observations, our detection of only one CO-dark molecular gas feature appears in agreement with previous studies. We conclude that if OH absorption is to be used as a CO-dark molecular gas tracer, deeper observations or stronger background targets are necessary to unveil its full potential, and yet it will never be an exclusive tracer of CO-dark molecular gas. For OH hyperfine-splitting transitions in the vicinity of photodissociation regions in W43-South, we detect a spectral and spatial offset between the peak of the inversion of the OH 1612 MHz line and the absorption of the OH 1720 MHz line on the one hand, and the absorption of the OH main lines on the other hand, which provides additional constraints on the interpretation of the OH 18 cm line signatures typical of HII regions.
Submitted 9 June, 2025; v1 submitted 6 June, 2025;
originally announced June 2025.
-
ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding
Authors:
Ankit Pal,
Jung-Oh Lee,
Xiaoman Zhang,
Malaikannan Sankarasubbu,
Seunghyeon Roh,
Won Jung Kim,
Meesun Lee,
Pranav Rajpurkar
Abstract:
We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising approximately 696,000 questions paired with 160,000 chest X-ray studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template-based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core radiological reasoning skills: presence assessment, location analysis, negation detection, differential diagnosis, and geometric reasoning. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4B-it, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B. The best-performing model (MedGemma) achieves 83.24% overall accuracy. To bridge the gap between AI performance and clinical expertise, we conducted a comprehensive human reader study involving 3 radiology residents on 200 randomly sampled cases. Our evaluation demonstrates that MedGemma achieved superior performance (83.84% accuracy) compared to human readers (best radiology resident: 77.27%), representing a significant milestone where AI performance exceeds expert human evaluation on chest X-ray interpretation. The reader study reveals distinct performance patterns between AI models and human experts, with strong inter-reader agreement among radiologists while showing more variable agreement patterns between human readers and AI models. ReXVQA establishes a new standard for evaluating generalist radiological AI systems, offering public leaderboards, fine-grained evaluation splits, structured explanations, and category-level breakdowns. This benchmark lays the foundation for next-generation AI systems capable of mimicking expert-level clinical reasoning beyond narrow pathology classification. Our dataset will be open-sourced at https://huggingface.co/datasets/rajpurkarlab/ReXVQA
Submitted 4 June, 2025;
originally announced June 2025.
-
Enhancing Safety of Foundation Models for Visual Navigation through Collision Avoidance via Repulsive Estimation
Authors:
Joonkyung Kim,
Joonyeol Sim,
Woojun Kim,
Katia Sycara,
Changjoo Nam
Abstract:
We propose CARE (Collision Avoidance via Repulsive Estimation), a plug-and-play module that enhances the safety of vision-based navigation without requiring additional range sensors or fine-tuning of pretrained models. While recent foundation models using only RGB inputs have shown strong performance, they often fail to generalize in out-of-distribution (OOD) environments with unseen objects or variations in camera parameters (e.g., field of view, pose, or focal length). Without fine-tuning, these models may generate unsafe trajectories that lead to collisions, requiring costly data collection and retraining. CARE addresses this limitation by seamlessly integrating with any RGB-based navigation system that outputs local trajectories, dynamically adjusting them using repulsive force vectors derived from monocular depth maps. We evaluate CARE by combining it with state-of-the-art vision-based navigation models across multiple robot platforms. CARE consistently reduces collision rates (up to 100%) without sacrificing goal-reaching performance and improves collision-free travel distance by up to 10.7x in exploration tasks.
Submitted 10 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
Removal of Lunar Dust Simulant from Cold Dielectric Surfaces with Electron Beam
Authors:
Hsin-yi Hao,
Wousik Kim,
David S. Shelton,
Benjamin Farr,
Xu Wang,
Inseob Hahn
Abstract:
It has been demonstrated that lunar dust simulant can be efficiently lofted and removed from various room-temperature surfaces in vacuum when exposed to a low-energy electron beam. This provides a potential solution to the well-known dust risks associated with future lunar exploration. Considering its application in extremely cold regions on the Moon, we experimentally demonstrated dust lofting from surfaces at temperatures as low as -123 °C using an electron beam. Compared to room-temperature applications, we found that the dust lofting from a glass surface slows down significantly at lower temperatures. Possible reasons are discussed. We also found that the dust lofting process can be accelerated when the electron beam energy is swept within an optimal range and rate.
Submitted 29 May, 2025;
originally announced May 2025.
-
From Chat Logs to Collective Insights: Aggregative Question Answering
Authors:
Wentao Zhang,
Woojeong Kim,
Yuntian Deng
Abstract:
Conversational agents powered by large language models (LLMs) are rapidly becoming integral to our daily interactions, generating unprecedented amounts of conversational data. Such datasets offer a powerful lens into societal interests, trending topics, and collective concerns. Yet, existing approaches typically treat these interactions as independent and miss critical insights that could emerge from aggregating and reasoning across large-scale conversation logs. In this paper, we introduce Aggregative Question Answering, a novel task requiring models to reason explicitly over thousands of user-chatbot interactions to answer aggregative queries, such as identifying emerging concerns among specific demographics. To enable research in this direction, we construct a benchmark, WildChat-AQA, comprising 6,027 aggregative questions derived from 182,330 real-world chatbot conversations. Experiments show that existing methods either struggle to reason effectively or incur prohibitive computational costs, underscoring the need for new approaches capable of extracting collective insights from large-scale conversational data.
Submitted 29 May, 2025;
originally announced May 2025.
-
Pose-free 3D Gaussian splatting via shape-ray estimation
Authors:
Youngju Na,
Taeyeon Kim,
Jumin Lee,
Kyu Beom Han,
Woo Jae Kim,
Sung-eui Yoon
Abstract:
While generalizable 3D Gaussian splatting enables efficient, high-quality rendering of unseen scenes, it heavily depends on precise camera poses for accurate geometry. In real-world scenarios, obtaining accurate poses is challenging, leading to noisy pose estimates and geometric misalignments. To address this, we introduce SHARE, a pose-free, feed-forward Gaussian splatting framework that overcomes these ambiguities by joint shape and camera rays estimation. Instead of relying on explicit 3D transformations, SHARE builds a pose-aware canonical volume representation that seamlessly integrates multi-view information, reducing misalignment caused by inaccurate pose estimates. Additionally, anchor-aligned Gaussian prediction enhances scene reconstruction by refining local geometry around coarse anchors, allowing for more precise Gaussian placement. Extensive experiments on diverse real-world datasets show that our method achieves robust performance in pose-free generalizable Gaussian splatting.
Submitted 28 May, 2025;
originally announced May 2025.
-
Dual-Polarization SHG Interferometry for Imaging Antiparallel Domains and Stacking Angles of 2D Heterocrystals
Authors:
Juseung Oh,
Wontaek Kim,
Gyouil Jeong,
Yeri Lee,
Jihun Kim,
Hyeongjoon Kim,
Hyeon Suk Shin,
Sunmin Ryu
Abstract:
Optical second-harmonic generation (SHG) enables orientational polarimetry for crystallographic analysis and domain imaging of various materials. However, conventional intensity polarimetry, which neglects phase information, fails to resolve antiparallel domains and to describe two-dimensional heterostructures, which represent a new class of van der Waals-bound composite crystals. In this work, we report dual-polarization spectral phase interferometry (DP-SPI) and establish a generalized SHG superposition model that incorporates the observables of DP-SPI. Antiparallel domains of monolayer transition metal dichalcogenides (TMDs) were successfully imaged with distinction, validating the interferometric polarimetry. From DP interferograms of TMD heterobilayers, the orientation of each layer could be determined, enabling layer-resolved probing. By employing the superposition model, we also demonstrate the photonic design and fabrication of ternary TMD heterostructures for circularly polarized SHG. These methods, providing comprehensive SHG measurements and theoretical description, can be extended to heterostructures consisting of more than two constituent layers and are not limited to TMDs or 2D materials.
Submitted 27 May, 2025;
originally announced May 2025.
-
Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images
Authors:
George R. Nahass,
Zhu Wang,
Homa Rashidisabet,
Won Hwa Kim,
Sasha Hubschman,
Jeffrey C. Peterson,
Ghasem Yazdanpanah,
Chad A. Purnell,
Pete Setabutr,
Ann Q. Tran,
Darvin Yi,
Sathya N. Ravi
Abstract:
Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on utilizing unlearning in clinical contexts where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first-order algorithms are used to unlearn. Our method introduces tunable loss design for controlling the forgetting-retention tradeoff and supports novel model composition strategies that merge the strengths of distinct unlearning runs. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work establishes machine unlearning as a modular, practical alternative to retraining for real-world model maintenance in clinical applications.
Submitted 27 May, 2025;
originally announced May 2025.
-
On properness of moduli stacks of $D^{\times}$-shtukas over ramified legs
Authors:
Yong-Gyu Choi,
Wansu Kim,
Junyeong Park
Abstract:
Given a maximal order $D$ of a central division algebra over a global function field $F$, we prove an explicit sufficient condition for moduli stacks of $D^\times$-shtukas to be proper over a finite field in terms of the local invariants of $D$ and bounds. Our proof is a refinement of E. Lau's result (Duke Math. J. 140 (2007)), which showed the properness of the leg morphism (or characteristic morphism) away from the ramification locus of $D$. We also establish non-emptiness of Newton and Kottwitz--Rapoport strata for moduli stacks of $B^\times$-shtukas, where $B$ is a maximal order of a central simple algebra over $F$.
Submitted 25 May, 2025;
originally announced May 2025.
-
Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
Authors:
Bryan Wong,
Jong Woo Kim,
Huazhu Fu,
Mun Yong Yi
Abstract:
Vision-language models (VLMs) have recently been integrated into multiple instance learning (MIL) frameworks to address the challenge of few-shot, weakly supervised classification of whole slide images (WSIs). A key trend involves leveraging multi-scale information to better represent hierarchical tissue structures. However, existing methods often face two key limitations: (1) insufficient modeling of interactions within the same modalities across scales (e.g., 5x and 20x) and (2) inadequate alignment between visual and textual modalities on the same scale. To address these gaps, we propose HiVE-MIL, a hierarchical vision-language framework that constructs a unified graph consisting of (1) parent-child links between coarse (5x) and fine (20x) visual/textual nodes to capture hierarchical relationships, and (2) heterogeneous intra-scale edges linking visual and textual nodes on the same scale. To further enhance semantic consistency, HiVE-MIL incorporates a two-stage, text-guided dynamic filtering mechanism that removes weakly correlated patch-text pairs, and introduces a hierarchical contrastive loss to align textual semantics across scales. Extensive experiments on TCGA breast, lung, and kidney cancer datasets demonstrate that HiVE-MIL consistently outperforms both traditional MIL and recent VLM-based MIL approaches, achieving gains of up to 4.1% in macro F1 under 16-shot settings. Our results demonstrate the value of jointly modeling hierarchical structure and multimodal alignment for efficient and scalable learning from limited pathology data. The code is available at https://github.com/bryanwong17/HiVE-MIL
Submitted 27 May, 2025; v1 submitted 23 May, 2025;
originally announced May 2025.
-
Rapid adiabatic couplers with arbitrary split ratios for broadband DWDM interleaver application
Authors:
Daehan Choi,
Woo-Joo Kim,
Young-Ik Sohn
Abstract:
We experimentally demonstrate compact and broadband rapid adiabatic couplers (RACs) with arbitrary power split ratios, achieved through the combination of translational offset and waveguide width control. Fabricated RACs of four different target split ratios show power splitting within $\pm$3% of the design target over a 160 nm wavelength range. Using these RACs, we implement an 8-channel dense wavelength division multiplexing (DWDM) interleaver exhibiting < -20 dB crosstalk for the center 8 channels with flat-top passbands. Over a broader wavelength range, the design maintains crosstalk below -10 dB across more than 40 channels with 100 GHz spacing, demonstrating the broadband capability and scalability of RAC-based photonic integrated circuits.
Submitted 22 May, 2025;
originally announced May 2025.
-
OGHReS: Star formation in the Outer Galaxy II ($\ell = 180^\circ$-$280^\circ$)
Authors:
J. S. Urquhart,
C. Koenig,
D. Colombo,
A. Karska,
A. Giannetti,
T. J. T. Moore,
A. Y. Yang,
F. Wyrowski,
Y. Sun,
Z. Jiang,
K. R. Neralwar,
D. Eden,
I. Grozdanova,
S. Neupane,
M. Figueira,
E. Dann,
V. S. Veena,
W. -J. Kim,
S. Leurini,
J. Brand,
M. -Y. Lee
Abstract:
The Outer Galaxy High-Resolution Survey (OGHReS) covers 100 square degrees ($180^\circ < \ell < 280^\circ$) in the (2--1) transitions of three CO-isotopologues. We use the spectra to refine the velocities and physical properties of 6706 Hi-GAL clumps located in the OGHReS region. In a previous paper, we analysed 3584 clumps between $\ell = 250^\circ$ and $280^\circ$. Here, we cover a further 3122 clumps ($180^\circ < \ell < 250^\circ$) and determine reliable velocities for \withVLSR\ of these, finding good agreement with the previously assigned velocities ($\sim$80 percent within 5 km s$^{-1}$). We update velocities for 288 clumps and provide new values for an additional 411. Combining these with the previous results, we have velocities and physical properties for 6193 clumps (92.3 percent). The \allnonDetections\ non-detections are low surface density clumps or likely contamination by evolved stars and galaxies. Key findings: i) improved correlation between clumps and spiral arm loci, and the discovery of clumps beyond the outer arm supports the existence of a new spiral structure; ii) decreasing trend in the $L/M$-ratio consistent with less high-mass star formation in the outer Galaxy; iii) increase in the star formation fraction (SFF) in the outer Galaxy, suggesting that more clumps are forming stars despite their lower mass; iv) discrepancies in velocity assignments across different surveys that could affect $\sim$10000 clumps, especially in the fourth quadrant.
Submitted 19 May, 2025;
originally announced May 2025.
-
Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Boosting Off-Road Segmentation via Photometric Distortion and Exponential Moving Average
Authors:
Wonjune Kim,
Lae-kyoung Lee,
Su-Yong An
Abstract:
We report on the application of a high-capacity semantic segmentation pipeline to the GOOSE 2D Semantic Segmentation Challenge for unstructured off-road environments. Using a FlashInternImage-B backbone together with a UPerNet decoder, we adapt established techniques, rather than designing new ones, to the distinctive conditions of off-road scenes. Our training recipe couples strong photometric distortion augmentation (to emulate the wide lighting variations of outdoor terrain) with an Exponential Moving Average (EMA) of weights for better generalization. Using only the GOOSE training dataset, we achieve 88.8% mIoU on the validation set.
Submitted 16 May, 2025;
originally announced May 2025.
-
GLOVA: Global and Local Variation-Aware Analog Circuit Design with Risk-Sensitive Reinforcement Learning
Authors:
Dongjun Kim,
Junwoo Park,
Chaehyeon Shin,
Jaeheon Jung,
Kyungho Shin,
Seungheon Baek,
Sanghyuk Heo,
Woongrae Kim,
Inchul Jeong,
Joohwan Cho,
Jongsun Park
Abstract:
Analog/mixed-signal circuit design encounters significant challenges due to performance degradation from process, voltage, and temperature (PVT) variations. To achieve commercial-grade reliability, iterative manual design revisions and extensive statistical simulations are required. While several studies have aimed to automate variation-aware analog design to reduce time-to-market, the substantial mismatches in real-world wafers have not been thoroughly addressed. In this paper, we present GLOVA, an analog circuit sizing framework that effectively manages the impact of diverse random mismatches to improve robustness against PVT variations. In the proposed approach, risk-sensitive reinforcement learning is leveraged to account for the reliability bound affected by PVT variations, and an ensemble-based critic is introduced to achieve sample-efficient learning. For design verification, we also propose a $μ$-$σ$ evaluation and simulation reordering method to reduce the simulation cost of identifying failed designs. GLOVA supports verification through industrial-level PVT variation evaluation methods, including corner simulation as well as global and local Monte Carlo (MC) simulations. Compared to previous state-of-the-art variation-aware analog sizing frameworks, GLOVA achieves up to 80.5$\times$ improvement in sample efficiency and 76.0$\times$ reduction in time.
Submitted 16 May, 2025;
originally announced May 2025.
-
SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
Authors:
Ji Woong Kim,
Juo-Tung Chen,
Pascal Hansen,
Lucy X. Shi,
Antony Goldenberg,
Samuel Schmidgall,
Paul Maria Scheikl,
Anton Deguet,
Brandon M. White,
De Ru Tsai,
Richard Cha,
Jeffrey Jopling,
Chelsea Finn,
Axel Krieger
Abstract:
Research on autonomous surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications demand dexterous manipulation over extended durations and robust generalization to the inherent variability of human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning strategies. To address this gap, we propose a hierarchical framework for performing dexterous, long-horizon surgical steps. Our approach utilizes a high-level policy for task planning and a low-level policy for generating low-level trajectories. The high-level planner plans in language space, generating task or corrective instructions to guide the robot through the long-horizon steps and correct for the low-level policy's errors. We validate our framework through ex vivo experiments on cholecystectomy, a commonly-practiced minimally invasive procedure, and conduct ablation studies to evaluate key components of the system. Our method achieves a 100% success rate across n=8 different ex vivo gallbladders, operating fully autonomously without human intervention. The hierarchical approach improves the policy's ability to recover from suboptimal states that are inevitable in the highly dynamic environment of realistic surgical applications. This work demonstrates step-level autonomy in a surgical procedure, marking a milestone toward clinical deployment of autonomous surgical systems.
Submitted 17 June, 2025; v1 submitted 15 May, 2025;
originally announced May 2025.
-
Real-Time Person Image Synthesis Using a Flow Matching Model
Authors:
Jiwoo Jeong,
Kirok Kim,
Wooju Kim,
Nam-Joon Kim
Abstract:
Pose-Guided Person Image Synthesis (PGPIS) generates realistic person images conditioned on a target pose and a source image. This task plays a key role in various real-world applications, such as sign language video generation, AR/VR, gaming, and live streaming. In these scenarios, real-time PGPIS is critical for providing immediate visual feedback and maintaining user immersion. However, achieving real-time performance remains a significant challenge due to the complexity of synthesizing high-fidelity images from diverse and dynamic human poses. Recent diffusion-based methods have shown impressive image quality in PGPIS, but their slow sampling speeds hinder deployment in time-sensitive applications. This latency is particularly problematic in tasks like generating sign language videos during live broadcasts, where rapid image updates are required. Therefore, developing a fast and reliable PGPIS model is a crucial step toward enabling real-time interactive systems. To address this challenge, we propose a generative model based on flow matching (FM). Our approach enables faster, more stable, and more efficient training and sampling. Furthermore, the proposed model supports conditional generation and can operate in latent space, making it especially suitable for real-time PGPIS applications where both speed and quality are critical. We evaluate our proposed method, Real-Time Person Image Synthesis Using a Flow Matching Model (RPFM), on the widely used DeepFashion dataset for PGPIS tasks. Our results show that RPFM achieves near-real-time sampling speeds while maintaining performance comparable to the state-of-the-art models. Our methodology trades off a slight, acceptable decrease in generated-image accuracy for over a twofold increase in generation speed, thereby ensuring real-time performance.
Submitted 6 May, 2025;
originally announced May 2025.
-
Competitive Adsorption in Polymer Nanocomposites: The Molecular Weight and End-Group Effect Revealed by SANS and MD Simulations
Authors:
Tae Yeon Kong,
WooJin Kim,
YongJoo Kim,
So Youn Kim
Abstract:
Understanding polymer adsorption at interfaces is essential for designing advanced polymer-based nanomaterials with tailored interfacial properties. Although adsorption significantly influences the macroscopic properties of polymer composites and thin films, a comprehensive understanding of molecular weight (MW)-dependent adsorption remains challenging and controversial, particularly in polydisperse polymer systems, due to the limitations of experimental approaches. We investigate competitive adsorption in bidisperse poly(ethylene glycol) (PEG) melts and find that shorter chains preferentially adsorb onto nanoparticle surfaces. Experiments and molecular dynamics simulations reveal that the high density of terminal hydroxyl groups in short PEG chains strengthens hydrogen bonding at the interface, driving enthalpy-driven adsorption despite identical polymer backbones. This leads to a densely packed interfacial layer that alters the conformation of longer chains. These findings highlight the critical role of end-group functionality in interfacial polymer behavior and provide new insights for tailoring nanocomposite properties.
Submitted 6 May, 2025;
originally announced May 2025.
-
Design, analysis, and experimental validation of a stepped plate parametric array loudspeaker
Authors:
Woongji Kim,
Beomseok Oh,
Chayeong Kim,
Wonkyu Moon
Abstract:
This study investigates the design and analysis of a stepped plate parametric array loudspeaker (SPPAL) as an alternative to conventional array-based parametric loudspeakers. The SPPAL utilizes a single Langevin-type ultrasonic transducer coupled with a flexural stepped plate to generate narrow-beam audible sound via nonlinear acoustic interaction. To evaluate and optimize the performance of the SPPAL, an integrated modeling framework is developed, consisting of an approximate analytical 3D model for transducer dynamics, an equivalence ratio formulation to relate stepped plate and rigid piston behavior, and a spherical wave expansion method for nonlinear sound field simulation. The dual-resonance behavior of the transducer is optimized through multi-objective analysis to enhance low-frequency audio performance. Experimental validation includes frequency response and modal analysis of the transducer, as well as sound field measurements. The analytical methods are further verified through comparison with experimental data. Furthermore, combination resonance--an unintended structural excitation resulting from intermodulation--is identified as an inherent phenomenon in SPPAL operation. The findings offer practical guidance for the development of efficient, compact, and manufacturable parametric array loudspeakers employing plate-based flexural vibration.
Submitted 29 April, 2025;
originally announced April 2025.
-
Measurement of single- and double-polarization observables in the photoproduction of $π^+π^-$ meson pairs off the proton using CLAS at Jefferson Laboratory
Authors:
P. Roy,
S. Cao,
V. Crede,
E. Klempt,
V. A. Nikonov,
A. V. Sarantsev,
V. D. Burkert,
V. Mokeev,
P. Achenbach,
J. S. Alvarado,
W. R. Armstrong,
H. Atac,
H. Avakian,
N. A. Baltzell,
L. Barion,
M. Bashkanov,
M. Battaglieri,
F. Benmokhtar,
A. Bianconi,
A. S. Biselli,
M. Bondi,
F. Bossu,
S. Boiarinov,
K. -T. Brinkmann,
W. J. Briscoe
, et al. (119 additional authors not shown)
Abstract:
The photoproduction of $π^+π^-$ meson pairs off the proton has been studied in the reaction $γp\to p\,π^+π^-$ using the CEBAF Large Acceptance Spectrometer (CLAS) and the frozen-spin target (FROST) in Hall B at the Thomas Jefferson National Accelerator Facility. For the first time, the beam and target asymmetries, $I^{s,c}$ and $P_{x,y}$, have been measured along with the beam-target double-polarization observables, $P^{s,c}_{x,y}$, using a transversely polarized target with center-of-mass energies ranging from 1.51 GeV up to 2.04 GeV. These data and additional $ππ$ photoproduction observables from CLAS and experiments elsewhere were included in a partial-wave analysis within the Bonn-Gatchina framework. Significant contributions from $s$-channel resonance production are observed in addition to $t$-channel exchange processes. The data indicate significant contributions from $N^\ast$ and $Δ^\ast$ resonances in the third and fourth resonance regions.
Submitted 29 April, 2025;
originally announced April 2025.
-
Dynamic Time-aware Continual User Representation Learning
Authors:
Seungyoon Choi,
Sein Kim,
Hongseok Kang,
Wonjoong Kim,
Chanyoung Park
Abstract:
Traditional user modeling (UM) approaches have primarily focused on designing models for a single specific task, but they face limitations in generalization and adaptability across various tasks. Recognizing these challenges, recent studies have shifted towards continual learning (CL)-based universal user representation learning aiming to develop a single model capable of handling multiple tasks. Despite advancements, existing methods are in fact evaluated under an unrealistic scenario that does not consider the passage of time as tasks progress, which overlooks newly emerged items that may change the item distribution of previous tasks. In this paper, we introduce a practical evaluation scenario on which CL-based universal user representation learning approaches should be evaluated, which takes into account the passage of time as tasks progress. Then, we propose a novel framework Dynamic Time-aware continual user representation learner, named DITTO, designed to alleviate catastrophic forgetting despite continuous shifts in item distribution, while also allowing the knowledge acquired from previous tasks to adapt to the current shifted item distribution. Through our extensive experiments, we demonstrate the superiority of DITTO over state-of-the-art methods under a practical evaluation scenario. Our source code is available at https://github.com/seungyoon-Choi/DITTO_official.
Submitted 23 April, 2025;
originally announced April 2025.
-
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Authors:
Inho Kim,
Youngkil Song,
Jicheol Park,
Won Hwa Kim,
Suha Kwak
Abstract:
Sound source localization (SSL) is the task of locating the source of sound within an image. Due to the lack of localization labels, the de facto standard in SSL has been to represent an image and audio as a single embedding vector each, and use them to learn SSL via contrastive learning. To this end, previous work samples one of the local image features as the image embedding and aggregates all local audio features to obtain the audio embedding, which is far from optimal due to the presence of noise and background irrelevant to the actual target in the input. We present a novel SSL method that addresses this chronic issue by joint slot attention on image and audio. To be specific, two slots competitively attend to image and audio features to decompose them into target and off-target representations, and only target representations of image and audio are used for contrastive learning. Also, we introduce cross-modal attention matching to further align local features of image and audio. Our method achieved the best results in almost all settings on three public benchmarks for SSL, and substantially outperformed all prior work in cross-modal retrieval.
Submitted 11 May, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Using Multiple Outcomes to Adjust Standard Errors for Spatial Correlation
Authors:
Stefano DellaVigna,
Guido Imbens,
Woojin Kim,
David M. Ritzwoller
Abstract:
Empirical research in economics often examines the behavior of agents located in a geographic space. In such cases, statistical inference is complicated by the interdependence of economic outcomes across locations. A common approach to account for this dependence is to cluster standard errors based on a predefined geographic partition. A second strategy is to model dependence in terms of the distance between units. Dependence, however, does not necessarily stop at borders and is typically not determined by distance alone. This paper introduces a method that leverages observations of multiple outcomes to adjust standard errors for cross-sectional dependence. Specifically, a researcher, while interested in a particular outcome variable, often observes dozens of other variables for the same units. We show that these outcomes can be used to estimate dependence under the assumption that the cross-sectional correlation structure is shared across outcomes. We develop a procedure, which we call Thresholding Multiple Outcomes (TMO), that uses this estimate to adjust standard errors in a given regression setting. We show that adjustments of this form can lead to sizable reductions in the bias of standard errors in calibrated U.S. county-level regressions. Re-analyzing nine recent papers, we find that the proposed correction can make a substantial difference in practice.
Submitted 17 April, 2025;
originally announced April 2025.
-
Current-driven dynamics of antiferromagnetic domain-wall skyrmions
Authors:
Wooyon Kim,
Jun Seok Seo,
Se Kwon Kim
Abstract:
Domain-wall skyrmions are magnetic solitons embedded in a domain wall that are topologically equivalent to skyrmions. Here, we theoretically study antiferromagnetic domain-wall skyrmions and their current-driven motion within the Landau-Lifshitz-Gilbert phenomenology, and verify our findings with micromagnetic simulations. While the skyrmion Hall effect is expected to be suppressed in the current-induced motion of antiferromagnetic domain-wall skyrmions, we observe a finite Hall angle, which originates from the anisotropic spin configuration of domain-wall skyrmions. The skyrmion Hall effect is, however, conditionally suppressed and the motion aligns with the current applied in certain directions, which can be interpreted as principal axes of a domain-wall skyrmion that is easily identified from the symmetry of the spin configuration. Our work on antiferromagnetic domain-wall skyrmions shows that the dynamics of spin textures endowed with multiple soliton characteristics can be unconventional, which is envisaged to enrich the field of topological solitons.
Submitted 17 April, 2025;
originally announced April 2025.
-
An accurate measurement of parametric array using a spurious sound filter topologically equivalent to a half-wavelength resonator
Authors:
Woongji Kim,
Beomseok Oh,
Junsuk Rho,
Wonkyu Moon
Abstract:
Parametric arrays (PA) offer exceptional directivity and compactness compared to conventional loudspeakers, facilitating various acoustic applications. However, accurate measurement of audio signals generated by PA remains challenging due to spurious ultrasonic sounds arising from microphone nonlinearities. Existing filtering methods, including Helmholtz resonators, phononic crystals, polymer films, and grazing incidence techniques, exhibit practical constraints such as size limitations, fabrication complexity, or insufficient attenuation. To address these issues, we propose and demonstrate a novel acoustic filter based on the design of a half-wavelength resonator. The developed filter exploits the nodal plane in acoustic pressure distribution, effectively minimizing microphone exposure to targeted ultrasonic frequencies. Fabrication via stereolithography (SLA) 3D printing ensures high dimensional accuracy, which is crucial for high-frequency acoustic filters. Finite element method (FEM) simulations guided filter optimization for suppression frequencies at 40 kHz and 60 kHz, achieving high transmission loss (TL) around 60 dB. Experimental validations confirm the filter's superior performance in significantly reducing spurious acoustic signals, as reflected in frequency response, beam pattern, and propagation curve measurements. The proposed filter ensures stable and precise acoustic characterization, independent of measurement distances and incidence angles. This new approach not only improves measurement accuracy but also enhances reliability and reproducibility in parametric array research and development.
Submitted 2 July, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Coherent EUV scatterometry of 2D periodic structure profiles with mathematically optimal experimental design
Authors:
Clay Klein,
Nicholas W. Jenkins,
Yunzhe Shao,
Yunhao Li,
Seungbeom Park,
Wookrae Kim,
Henry C. Kapteyn,
Margaret M. Murnane
Abstract:
Extreme ultraviolet (EUV) scatterometry is an increasingly important metrology that can measure critical parameters of periodic nanostructured materials in a fast, accurate, and repeatable manner and with high sensitivity to nanoscale structure and material composition. Because of this, EUV scatterometry could support manufacturing of semiconductor devices or polymer metamaterials, addressing the limitations of traditional imaging methods such as resolution and field of view, sample damage, throughput, or low sensitivity. Here we use EUV scatterometry to measure the profile of an industrially relevant 2D periodic interconnect structure, using $λ= 29$ nm light from a table-top high harmonic generation source. We show that EUV scatterometry is sensitive to out-of-plane features with single-nanometer sensitivity. Furthermore, we also apply a methodology based on the Fisher information matrix to optimize experimental design parameters, such as incidence angles and wavelength, to show how measurement sensitivity can be maximized. This methodology reveals the strong dependence of measurement sensitivity on both incidence angle and wavelength $-$ even in a simple two-parameter case. Through a simultaneous optimization of incidence angles and wavelength, we determine that the most sensitive measurement of the quantities of interest can be made at a wavelength of $\sim$14 nm. In the future, by reducing sample contamination due to sample preparation, deep sub-nanometer sensitivity to axial profiles and 2D structures will be possible. Our results are an important step in guiding EUV scatterometry towards increased accuracy and throughput with a priori computations and by leveraging new experimental capabilities.
Submitted 16 April, 2025;
originally announced April 2025.
-
X-ray scattering investigation of hydride surface segregation in epitaxial Nb films
Authors:
David A. Garcia-Wetten,
Philip J. Ryan,
Jong Woo Kim,
Dominic P Goronzy,
Roger J. Reinertsen,
Mark C. Hersam,
Michael J. Bedzyk
Abstract:
Hydride precipitation in niobium-based, superconducting circuits is a damaging side-effect of hydrofluoric acid treatments used to clean and thin the Nb surface oxides and Si oxides. The precipitate microstructure is difficult to probe because of the high hydrogen mobility in the niobium matrix. In particular, destructive techniques used to prepare samples for elemental depth profiling can change the hydride structure. Here, we use X-ray surface scattering to non-destructively probe the depth distribution of precipitates in hydrided, epitaxial, niobium thin films. We find that the niobium hydride is confined within the top ten nm of the surface.
Submitted 15 April, 2025;
originally announced April 2025.
-
Counterfactual Fairness Evaluation of Machine Learning Models on Educational Datasets
Authors:
Woojin Kim,
Hyeoncheol Kim
Abstract:
As machine learning models are increasingly used in educational settings, from detecting at-risk students to predicting student performance, algorithmic bias and its potential impacts on students raise critical concerns about algorithmic fairness. Although group fairness is widely explored in education, work on individual fairness in a causal context is understudied, especially on counterfactual fairness. This paper explores the notion of counterfactual fairness for educational data by conducting counterfactual fairness analysis of machine learning models on benchmark educational datasets. We demonstrate that counterfactual fairness provides meaningful insight into the causality of sensitive attributes and causal-based individual fairness in education.
Submitted 20 April, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
Insights into Nb2C and Nb2CO2 as high-performance anodes for sodium- and lithium-ion batteries: An ab initio investigation
Authors:
Nishat Sultana,
Abdullah A. Amin,
Eric J. Payton,
Woo Kyun Kim
Abstract:
In this study, we employ first-principles density functional theory (DFT) calculations to investigate the electrochemical properties of Nb2C and Nb2CO2 MXenes as potential anode materials for sodium-ion (SIBs) and lithium-ion batteries (LIBs). Our findings reveal that Li and Na intercalation primarily modifies the electronic properties of Nb2C without inducing significant structural distortions, as indicated by Raman intensity variations. Adsorption energy calculations show that the T4 and H3 sites are the most favorable for metal intercalation, with Nb2CO2 exhibiting stronger adsorption due to oxygen functionalization. We find that Nb2C offers lower diffusion barriers, especially for Na ions, making it a promising candidate for fast-charging SIBs. In contrast, Nb2CO2 enhances charge retention through stronger electrostatic interactions but introduces higher migration resistance. Electronic structure analysis confirms the metallic nature of both MXenes, ensuring efficient electron transport. Open-circuit voltage (OCV) calculations indicate that Nb2CO2 exhibits higher OCV values than Nb2C, highlighting the role of surface functionalization in tuning electrochemical performance. Our study suggests that, while Li-based systems achieve slightly higher theoretical capacities, Na-based systems exhibit comparable performance, reinforcing the viability of sodium-ion batteries as a cost-effective alternative. Overall, our results demonstrate that Nb2C is better suited for rapid ion transport, whereas Nb2CO2 offers enhanced charge retention. These insights provide a foundation for the optimization of MXene-based electrodes for next-generation high performance energy storage applications.
Submitted 11 April, 2025;
originally announced April 2025.
-
Measurement-induced phase transitions in quantum inference problems and quantum hidden Markov models
Authors:
Sun Woo P. Kim,
Curt von Keyserlingk,
Austen Lamacraft
Abstract:
Recently, there has been interest in coincident 'sharpening' and 'learnability' transitions in monitored quantum systems. In the latter, an outside observer's ability to infer properties of a quantum system from measurements undergoes a phase transition. Such transitions appear to be related to the decodability transition in quantum error correction, but the precise connection is not clear. Here, we study these problems under one framework we call the general quantum inference problem. In cases as above where the system has a Markov structure, we say that the inference is on a quantum hidden Markov model. We show a formal connection to classical hidden Markov models and that they coincide for certain setups. For example, we prove this for those involving Haar-random unitaries and measurements. We introduce the notion of Bayes non-optimality, where the parameters used for inference differ from the true ones. This allows us to expand the phase diagrams of the above models. At Bayes optimality, we obtain an explicit relation between 'sharpening' and 'learnability' order parameters, explicitly showing that the two transitions coincide. Next, we study concrete examples. We review quantum error correction on the toric and repetition code and their mapping to the 2D random-bond Ising model (RBIM) through our framework. We study the Haar-random U(1)-symmetric monitored quantum circuit and tree, mapping each to inference models that we call the planted SSEP and planted XOR, respectively, and expanding the phase diagram to Bayes non-optimality. For the circuit, we deduce the phase boundary numerically and analytically argue that it is of a single universality class. For the tree, we present an exact solution of the entire phase boundary, which displays re-entrance as does the 2D RBIM. We discuss these phase diagrams, with their interpretations for quantum inference problems and rigorous arguments on their shapes.
Submitted 11 April, 2025;
originally announced April 2025.
-
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
Authors:
Muhammad Shihab Rashid,
Christian Bock,
Yuan Zhuang,
Alexander Buchholz,
Tim Esler,
Simon Valentin,
Luca Franceschi,
Martin Wistuba,
Prabhu Teja Sivaprasad,
Woo Jung Kim,
Anoop Deoras,
Giovanni Zappella,
Laurent Callot
Abstract:
Coding agents powered by large language models have shown impressive capabilities in software engineering tasks, but evaluating their performance across diverse programming languages and real-world scenarios remains challenging. We introduce SWE-PolyBench, a new multi-language benchmark for repository-level, execution-based evaluation of coding agents. SWE-PolyBench contains 2110 instances from 21 repositories and includes tasks in Java (165), JavaScript (1017), TypeScript (729) and Python (199), covering bug fixes, feature additions, and code refactoring. We provide a task and repository-stratified subsample (SWE-PolyBench500) and release an evaluation harness allowing for fully automated evaluation. To enable a more comprehensive comparison of coding agents, this work also presents a novel set of metrics rooted in syntax tree analysis. We evaluate leading open source coding agents on SWE-PolyBench, revealing their strengths and limitations across languages, task types, and complexity classes. Our experiments show that current agents exhibit uneven performances across languages and struggle with complex problems while showing higher performance on simpler tasks. SWE-PolyBench aims to drive progress in developing more versatile and robust AI coding assistants for real-world software engineering. Our datasets and code are available at: https://github.com/amazon-science/SWE-PolyBench
Submitted 23 April, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
Multidimensional Measurements of Beam Single Spin Asymmetries in Semi-inclusive Deep-inelastic Charged Kaon Electroproduction off Protons in the Valence Region
Authors:
A. Kripko,
S. Diehl,
K. Joo,
P. Achenbach,
J. S. Alvarado,
M. Amaryan,
W. R. Armstrong,
H. Atac,
H. Avakian,
L. Baashen,
N. A. Baltzell,
L. Barion,
M. Bashkanov,
F. Benmokhtar,
A. Bianconi,
A. S. Biselli,
M. Bondi,
F. Bossù,
S. Boiarinov,
K. -T. Brinkmann,
W. J. Briscoe,
W. K. Brooks,
T. Cao,
R. Capobianco,
D. S. Carman
, et al. (114 additional authors not shown)
Abstract:
Measurements of beam single spin asymmetries in semi-inclusive deep inelastic electron scattering (SIDIS) with positively charged kaons off protons have been performed with 10.6 and 10.2 GeV incident electron beams using the CLAS12 spectrometer at Jefferson Lab. We report an analysis of the electroproduction of positively charged kaons over a large kinematic range of fractional energy, Bjorken $x$, transverse momentum, and photon virtualities $Q^2$ ranging from 1 GeV$^2$ up to 6 GeV$^2$. This is the first published multi-dimensionally binned CLAS12 measurement of a kaon SIDIS single spin asymmetry in the valence quark regime. The data provide constraints on the structure function ratio $F_{LU}^{\sinφ}/F_{UU}$, where $F_{LU}^{\sinφ}$ is a quantity with a leading twist of twist-3 that can reveal novel aspects of the quark-gluon correlations within the nucleon. The impact of the data on understanding the underlying reaction mechanisms and their kinematic variation is explored using theoretical models for the different contributing twist-3 parton distribution functions (PDFs) and fragmentation functions (FFs).
Submitted 15 April, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
Einstein ring of dust shells with quantum hair
Authors:
Sojeong Cheong,
Wontae Kim,
Mungon Nam
Abstract:
The information about the internal structure of a compact object is classically inaccessible to external observers. In this paper, we investigate how quantum corrections to gravitational fields can reveal the internal structure of compact objects composed of dust shells. Using an effective field theory approach to incorporate quantum corrections up to second order in curvature, we derive a quantum-corrected metric for $N$ uniformly spaced shells with equal surface mass density and then examine how these corrections manifest in the deflection angle for gravitational lensing. In particular, we mainly investigate quantum-corrected astrophysical observables such as the Einstein ring and image magnification. Compared to the classical scenario, the deflection angle and the corresponding Einstein angle differ by a term that depends explicitly on the number of dust shells, which play the role of quantum hair. Specifically, the quantum correction to them diminishes as $N$ increases, yet a finite deviation from the classical result remains even in the continuum limit $N\to\infty$. Consequently, our results show that the internal structures of compact objects with identical mass and radius can be distinguished by quantum hair through their lensing observables.
Submitted 11 April, 2025;
originally announced April 2025.
-
Compositional Flows for 3D Molecule and Synthesis Pathway Co-design
Authors:
Tony Shen,
Seonghwan Seo,
Ross Irwin,
Kieran Didi,
Simon Olsson,
Woo Youn Kim,
Martin Ester
Abstract:
Many generative applications, such as synthesis-based 3D molecular design, involve constructing compositional objects with continuous features. Here, we introduce Compositional Generative Flows (CGFlow), a novel framework that extends flow matching to generate objects in compositional steps while modeling continuous states. Our key insight is that modeling compositional state transitions can be formulated as a straightforward extension of the flow matching interpolation process. We further build upon the theoretical foundations of generative flow networks (GFlowNets), enabling reward-guided sampling of compositional structures. We apply CGFlow to synthesizable drug design by jointly designing the molecule's synthetic pathway with its 3D binding pose. Our approach achieves state-of-the-art binding affinity on all 15 targets from the LIT-PCBA benchmark, and a 5.8$\times$ improvement in sampling efficiency compared to a 2D synthesis-based baseline. To the best of our knowledge, our method is also the first to achieve state-of-the-art performance in both Vina Dock (-9.38) and AiZynth success rate (62.2\%) on the CrossDocked benchmark.
Submitted 10 April, 2025;
originally announced April 2025.
-
Azimuthal anisotropy of direct photons in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
S. Antsupov,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
E. Bannikov,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov
, et al. (301 additional authors not shown)
Abstract:
The PHENIX experiment at the Relativistic Heavy Ion Collider measured the second Fourier component $v_2$ of the direct-photon azimuthal anisotropy at midrapidity in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. The results are presented in 10\% wide bins of collision centrality and cover the transverse-momentum range of $1<p_T<20$ GeV/$c$, and are in quantitative agreement with findings published earlier, but provide better granularity and higher $p_T$ reach. Above a $p_T$ of 8--10 GeV/$c$, where hard scattering dominates the direct-photon production, $v_2$ is consistent with zero. Below that in each centrality bin $v_2$ as a function of $p_T$ is comparable to the $π^0$ anisotropy albeit with a tendency of being somewhat smaller. The results are compared to recent theory calculations that include, in addition to thermal radiation from the quark-gluon plasma and hadron gas, sources of photons from pre-equilibrium, strong magnetic fields, or radiative hadronization. While the newer theoretical calculations describe the data better than previous models, none of them alone can fully explain the results, particularly in the region of $p_T=4$--8 GeV/$c$.
Submitted 3 April, 2025;
originally announced April 2025.
-
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
Authors:
Yoon Gyo Jung,
Jaewoo Park,
Jaeho Yoon,
Kuan-Chuan Peng,
Wonchul Kim,
Andrew Beng Jin Teoh,
Octavia Camps
Abstract:
We aim to solve unsupervised anomaly detection in a practical, challenging environment where the normal dataset is contaminated with defective regions and its product class distribution is tailed but unknown. We observe that existing models suffer from a tail-versus-noise trade-off: if a model is robust against pixel noise, then its performance deteriorates on tail class samples, and vice versa. To mitigate the issue, we handle the tail class and noise samples independently. To this end, we propose TailSampler, a novel class size predictor that estimates the class cardinality of samples based on a symmetric assumption on the class-wise distribution of embedding similarities. TailSampler can be utilized to sample the tail class samples exclusively, allowing them to be handled separately. Based on these facets, we build a memory-based anomaly detection model, TailedCore, whose memory both captures tail class information well and is noise-robust. We extensively validate the effectiveness of TailedCore on the unsupervised long-tail noisy anomaly detection setting, and show that TailedCore outperforms the state-of-the-art in most settings.
Submitted 3 April, 2025;
originally announced April 2025.
-
Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
Authors:
Dohyun Kim,
Sehwan Park,
Geonhee Han,
Seung Wook Kim,
Paul Hongsuck Seo
Abstract:
Diffusion models generate high-quality images through progressive denoising but are computationally intensive due to large model sizes and repeated sampling. Knowledge distillation, which transfers knowledge from a complex teacher to a simpler student model, has been widely studied in recognition tasks, particularly for transferring concepts unseen during student training. However, its application to diffusion models remains underexplored, especially in enabling student models to generate concepts not covered by the training images. In this work, we propose Random Conditioning, a novel approach that pairs noised images with randomly selected text conditions to enable efficient, image-free knowledge distillation. By leveraging this technique, we show that the student can generate concepts unseen in the training images. When applied to conditional diffusion model distillation, our method allows the student to explore the condition space without generating condition-specific images, resulting in notable improvements in both generation quality and efficiency. This promotes resource-efficient deployment of generative diffusion models, broadening their accessibility for both research and real-world applications. Code, models, and datasets are available at https://dohyun-as.github.io/Random-Conditioning .
Submitted 2 April, 2025;
originally announced April 2025.
-
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
Authors:
Hyunwoo Park,
Gun Ryu,
Wonjun Kim
Abstract:
Recently, 3D Gaussian splatting (3DGS) has gained considerable attention in the field of novel view synthesis due to its fast performance and excellent image quality. However, 3DGS in sparse-view settings (e.g., three-view inputs) often faces the problem of overfitting to training views, which significantly degrades the visual quality of novel view images. Many existing approaches have tackled this issue by using strong priors, such as 2D generative contextual information and external depth signals. In contrast, this paper introduces a prior-free method, called DropGaussian, with simple changes to 3D Gaussian splatting. Specifically, we randomly remove Gaussians during the training process in a manner similar to dropout, which allows non-excluded Gaussians to have larger gradients while improving their visibility. This makes the remaining Gaussians contribute more to the optimization process for rendering with sparse input views. Such a simple operation effectively alleviates the overfitting problem and enhances the quality of novel view synthesis. By simply applying DropGaussian to the original 3DGS framework, we can achieve competitive performance with existing prior-based 3DGS methods in sparse-view settings of benchmark datasets without any additional complexity. The code and model are publicly available at: https://github.com/DCVL-3D/DropGaussian release.
Submitted 1 April, 2025;
originally announced April 2025.
-
An In-Situ Spatial-Temporal Sequence Detector for Neuromorphic Vision Sensor Empowered by High Density Vertical NAND Storage
Authors:
Zijian Zhao,
Varun Darshana Parekh,
Po-Kai Hsu,
Yixin Qin,
Yiming Song,
A N M Nafiul Islam,
Ningyuan Cao,
Siddharth Joshi,
Thomas Kämpfe,
Moonyoung Jung,
Kwangyou Seo,
Kwangsoo Kim,
Wanki Kim,
Daewon Ha,
Sourav Dutta,
Abhronil Sengupta,
Xiao Gong,
Shimeng Yu,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Neuromorphic vision sensors require efficient real-time pattern recognition, yet conventional architectures struggle with energy and latency constraints. Here, we present a novel in-situ spatiotemporal sequence detector that leverages vertical NAND storage to achieve massively parallel pattern detection. By encoding each cell with two single-transistor-based multi-level cell (MLC) memory elements, such as ferroelectric field-effect transistors (FeFETs), and mapping a pixel's temporal sequence onto consecutive word lines (WLs), we enable direct temporal pattern detection within NAND strings. Each NAND string serves as a dedicated reference for a single pixel, while different blocks store patterns for distinct pixels, allowing large-scale spatial-temporal pattern recognition via simple direct bit-line (BL) sensing, a well-established operation in vertical NAND storage. We experimentally validate our approach at both the cell and array levels, demonstrating that the vertical NAND-based detector achieves more than six orders of magnitude improvement in energy efficiency and more than three orders of magnitude reduction in latency compared to conventional CPU-based methods. These findings establish vertical NAND storage as a scalable and energy-efficient solution for next-generation neuromorphic vision processing.
Submitted 30 March, 2025;
originally announced March 2025.