-
Cross-lingual Collapse: How Language-Centric Foundation Models Shape Reasoning in Large Language Models
Authors:
Cheonbok Park,
Jeonghoon Kim,
Joosung Lee,
Sanghwan Bae,
Jaegul Choo,
Kang Min Yoo
Abstract:
We identify Cross-lingual Collapse, a systematic drift in which the chain-of-thought (CoT) of a multilingual language model reverts to its dominant pre-training language even when the prompt is expressed in a different language. Recent large language models (LLMs) with reinforcement learning with verifiable reward (RLVR) have achieved strong logical reasoning performance by exposing their intermediate reasoning traces, giving rise to large reasoning models (LRMs). However, the mechanism behind multilingual reasoning in LRMs is not yet fully explored. To investigate the issue, we fine-tune multilingual LRMs with Group-Relative Policy Optimization (GRPO) on translated versions of the GSM8K and SimpleRL-Zoo datasets in three different languages: Chinese, Korean, and Ukrainian. During training, we monitor both task accuracy and language consistency of the reasoning chains. Our experiments reveal three key findings: (i) GRPO rapidly amplifies pre-training language imbalances, leading to the erosion of low-resource languages within just a few hundred updates; (ii) a language consistency reward mitigates this drift but does so at the expense of a drop of roughly 5-10 pp in accuracy; and (iii) the resulting language collapse is severely damaging and largely irreversible, as subsequent fine-tuning struggles to steer the model back toward its original target-language reasoning capabilities. Together, these findings point to a remarkable conclusion: not all languages are trained equally for reasoning. Furthermore, our paper sheds light on the roles of reward shaping, data difficulty, and pre-training priors in eliciting multilingual reasoning.
Submitted 9 June, 2025; v1 submitted 6 June, 2025;
originally announced June 2025.
-
Better Late than Never: the Complexity of Arrangements of Polyhedra
Authors:
Boris Aronov,
Sang Won Bae,
Sergio Cabello,
Otfried Cheong,
David Eppstein,
Christian Knauer,
Raimund Seidel
Abstract:
Let $\mathcal{A}$ be the subdivision of $\mathbb{R}^d$ induced by $m$ convex polyhedra having $n$ facets in total. We prove that $\mathcal{A}$ has combinatorial complexity $O(m^{\lceil d/2 \rceil} n^{\lfloor d/2 \rfloor})$ and that this bound is tight. The bound is mentioned several times in the literature, but no proof for arbitrary dimension has been published before.
Submitted 4 June, 2025;
originally announced June 2025.
-
Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels
Authors:
Aravind R. Krishnan,
Thomas Z. Li,
Lucas W. Remedios,
Michael E. Kim,
Chenyu Gao,
Gaurav Rudravaram,
Elyssa M. McMaster,
Adam M. Saunders,
Shunxing Bao,
Kaiwen Xu,
Lianrui Zuo,
Kim L. Sandler,
Fabien Maldonado,
Yuankai Huo,
Bennett A. Landman
Abstract:
Reconstruction kernels in computed tomography (CT) affect spatial resolution and noise characteristics, introducing systematic variability in quantitative imaging measurements such as emphysema quantification. Choosing an appropriate kernel is therefore essential for consistent quantitative analysis. We propose a multipath cycleGAN model for CT kernel harmonization, trained on a mixture of paired and unpaired data from a low-dose lung cancer screening cohort. The model features domain-specific encoders and decoders with a shared latent space and uses discriminators tailored for each domain. We train the model on 42 kernel combinations using 100 scans each from seven representative kernels in the National Lung Screening Trial (NLST) dataset. To evaluate performance, 240 scans from each kernel are harmonized to a reference soft kernel, and emphysema is quantified before and after harmonization. A general linear model assesses the impact of age, sex, smoking status, and kernel on emphysema. We also evaluate harmonization from soft kernels to a reference hard kernel. To assess anatomical consistency, we compare segmentations of lung vessels, muscle, and subcutaneous adipose tissue generated by TotalSegmentator between harmonized and original images. Our model is benchmarked against traditional and switchable cycleGANs. For paired kernels, our approach reduces bias in emphysema scores, as seen in Bland-Altman plots (p<0.05). For unpaired kernels, harmonization eliminates confounding differences in emphysema (p>0.05). High Dice scores confirm preservation of muscle and fat anatomy, while lung vessel overlap remains reasonable. Overall, our shared latent space multipath cycleGAN enables robust harmonization across paired and unpaired CT kernels, improving emphysema quantification and preserving anatomical fidelity.
Submitted 28 May, 2025;
originally announced May 2025.
-
Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging
Authors:
Ho Hin Lee,
Quan Liu,
Shunxing Bao,
Yuankai Huo,
Bennett A. Landman
Abstract:
In contrast to vision transformers, which model long-range dependencies through global self-attention, large kernel convolutions provide a more efficient and scalable alternative, particularly in high-resolution 3D volumetric settings. However, naively increasing kernel size often leads to optimization instability and degradation in performance. Motivated by the spatial bias observed in effective receptive fields (ERFs), we hypothesize that different kernel elements converge at variable rates during training. To support this, we derive a theoretical connection between element-wise gradients and first-order optimization, showing that structurally re-parameterized convolution blocks inherently induce spatially varying learning rates. Building on this insight, we introduce Rep3D, a 3D convolutional framework that incorporates a learnable spatial prior into large kernel training. A lightweight two-stage modulation network generates a receptive-biased scaling mask, adaptively re-weighting kernel updates and enabling local-to-global convergence behavior. Rep3D adopts a plain encoder design with large depthwise convolutions, avoiding the architectural complexity of multi-branch compositions. We evaluate Rep3D on five challenging 3D segmentation benchmarks and demonstrate consistent improvements over state-of-the-art baselines, including transformer-based and fixed-prior re-parameterization methods. By unifying spatial inductive bias with optimization-aware learning, Rep3D offers an interpretable and scalable solution for 3D medical image analysis. The source code is publicly available at https://github.com/leeh43/Rep3D.
Submitted 26 May, 2025;
originally announced May 2025.
-
Algebraic Zhou valuations
Authors:
Shijie Bao,
Qi'an Guan,
Lin Zhou
Abstract:
In this paper, we generalize Zhou valuations, originally defined on complex domains, to the framework of general schemes. We demonstrate that an algebraic version of the Jonsson--Mustaţă conjecture is equivalent to the statement that every Zhou valuation is quasi-monomial. By introducing a mixed version of jumping numbers and Tian functions associated with valuations, we obtain characterizations of a valuation being a Zhou valuation or computing some jumping number using the Tian functions. Furthermore, we establish the correspondence between Zhou valuations in algebraic settings and their counterparts in analytic settings.
Submitted 5 June, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
-
PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions
Authors:
Daeun Kyung,
Hyunseung Chung,
Seongsu Bae,
Jiho Kim,
Jae Ho Sohn,
Taerim Kim,
Soo Kyung Kim,
Edward Choi
Abstract:
Doctor-patient consultations require multi-turn, context-aware communication tailored to diverse patient personas. Training or evaluating doctor LLMs in such settings requires realistic patient interaction systems. However, existing simulators often fail to reflect the full range of personas seen in clinical practice. To address this, we introduce PatientSim, a patient simulator that generates realistic and diverse patient personas for clinical scenarios, grounded in medical expertise. PatientSim operates using: 1) clinical profiles, including symptoms and medical history, derived from real-world data in the MIMIC-ED and MIMIC-IV datasets, and 2) personas defined by four axes: personality, language proficiency, medical history recall level, and cognitive confusion level, resulting in 37 unique combinations. We evaluated eight LLMs for factual accuracy and persona consistency. The top-performing open-source model, Llama 3.3, was validated by four clinicians to confirm the robustness of our framework. As an open-source, customizable platform, PatientSim provides a reproducible and scalable solution that can be customized for specific training needs. Offering a privacy-compliant environment, it serves as a robust testbed for evaluating medical dialogue systems across diverse patient presentations and shows promise as an educational tool for healthcare.
Submitted 23 May, 2025;
originally announced May 2025.
-
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
Authors:
Feiran Li,
Qianqian Xu,
Shilong Bao,
Zhiyong Yang,
Xiaochun Cao,
Qingming Huang
Abstract:
Concept erasing has recently emerged as an effective paradigm to prevent text-to-image diffusion models from generating visually undesirable or even harmful content. However, current removal methods heavily rely on manually crafted text prompts, making it challenging to achieve a high erasure (efficacy) while minimizing the impact on other benign concepts (usability). In this paper, we attribute the limitations to the inherent gap between the text and image modalities, which makes it hard to transfer the intricately entangled concept knowledge from text prompts to the image generation process. To address this, we propose a novel solution by directly integrating visual supervision into the erasure process, introducing the first text-image Collaborative Concept Erasing (Co-Erasing) framework. Specifically, Co-Erasing describes the concept jointly by text prompts and the corresponding undesirable images induced by the prompts, and then reduces the generating probability of the target concept through negative guidance. This approach effectively bypasses the knowledge gap between text and image, significantly enhancing erasure efficacy. Additionally, we design a text-guided image concept refinement strategy that directs the model to focus on visual features most relevant to the specified text concept, minimizing disruption to other benign concepts. Finally, comprehensive experiments suggest that Co-Erasing outperforms state-of-the-art erasure approaches significantly with a better trade-off between efficacy and usability. Code is available at https://github.com/Ferry-Li/Co-Erasing.
Submitted 26 May, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
Light Axion-Like Particles at Future Lepton Colliders
Authors:
Shou-shan Bao,
Yang Ma,
Yongcheng Wu,
Keping Xie,
Hong Zhang
Abstract:
Axion-like particles (ALPs) are well-motivated extensions of the Standard Model (SM) that appear in many new physics scenarios, with masses spanning a broad range. In this work, we systematically study the production and detection prospects of light ALPs at future lepton colliders, including electron-positron and multi-TeV muon colliders. At lepton colliders, light ALPs can be produced in association with a photon or a $Z$ boson. For very light ALPs ($m_a < 1$ MeV), the ALPs are typically long-lived and escape detection, leading to a mono-$V$ ($V = \gamma, Z$) signature. In the long-lived limit, we find that the mono-photon channel at the Tera-$Z$ stage of future electron-positron colliders provides the strongest constraints on ALP couplings to SM gauge bosons, $g_{aVV}$, thanks to the high luminosity, low background, and resonant enhancement from on-shell $Z$ bosons. At higher energies, the mono-photon cross section becomes nearly energy-independent, and the sensitivity is governed by luminosity and background. At multi-TeV muon colliders, the mono-$Z$ channel can yield complementary constraints. For heavier ALPs ($m_a > 100$ MeV) that decay promptly, mono-$V$ signatures are no longer valid. In this case, ALPs can be probed via non-resonant vector boson scattering (VBS) processes, where the ALP is exchanged off-shell, leading to kinematic deviations from SM expectations. We analyze constraints from both light-by-light scattering and electroweak VBS, the latter only accessible at TeV-scale colliders. While generally weaker, these constraints are robust and model-independent. Our combined analysis shows that mono-$V$ and non-resonant VBS channels provide powerful and complementary probes of ALP-gauge boson interactions.
Submitted 15 May, 2025;
originally announced May 2025.
-
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Authors:
Zeeshan Ahmad,
Shudi Bao,
Meng Chen
Abstract:
In recent years, generative adversarial networks (GANs) have made significant progress in generating audio sequences. However, these models typically rely on bandwidth-limited mel-spectrograms, which constrain the resolution of generated audio sequences, and lead to mode collapse during conditional generation. To address this issue, we propose Deformable Periodic Network based GAN (DPN-GAN), a novel GAN architecture that incorporates a kernel-based periodic ReLU activation function to induce periodic bias in audio generation. This innovative approach enhances the model's ability to capture and reproduce intricate audio patterns. In particular, our proposed model features a DPN module for multi-resolution generation utilizing deformable convolution operations, allowing for adaptive receptive fields that improve the quality and fidelity of the synthetic audio. Additionally, we enhance the discriminator network using deformable convolution to better distinguish between real and generated samples, further refining the audio quality. We trained two versions of the model: DPN-GAN small (38.67M parameters) and DPN-GAN large (124M parameters). For evaluation, we use five different datasets, covering both speech synthesis and music generation tasks, to demonstrate the efficiency of the DPN-GAN. The experimental results demonstrate that DPN-GAN delivers superior performance on both out-of-distribution and noisy data, showcasing its robustness and adaptability. Trained across various datasets, DPN-GAN outperforms state-of-the-art GAN architectures on standard evaluation metrics, and exhibits increased robustness in synthesized audio.
Submitted 13 May, 2025;
originally announced May 2025.
-
MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges
Authors:
Shixi Qin,
Zhiyong Yang,
Shilong Bao,
Shi Wang,
Qianqian Xu,
Qingming Huang
Abstract:
This paper focuses on implanting multiple heterogeneous backdoor triggers in bridge-based diffusion models designed for complex and arbitrary input distributions. Existing backdoor formulations mainly address single-attack scenarios and are limited to Gaussian noise input models. To fill this gap, we propose MixBridge, a novel diffusion Schrödinger bridge (DSB) framework to cater to arbitrary input distributions (taking I2I tasks as special cases). Beyond this trait, we demonstrate that backdoor triggers can be injected into MixBridge by directly training with poisoned image pairs. This eliminates the need for the cumbersome modifications to stochastic differential equations required in previous studies, providing a flexible tool to study backdoor behavior for bridge models. However, a key question arises: can a single DSB model train multiple backdoor triggers? Unfortunately, our theory shows that when attempting this, the model ends up following the geometric mean of benign and backdoored distributions, leading to performance conflict across backdoor tasks. To overcome this, we propose a Divide-and-Merge strategy to mix different bridges, where models are independently pre-trained for each specific objective (Divide) and then integrated into a unified model (Merge). In addition, a Weight Reallocation Scheme (WRS) is also designed to enhance the stealthiness of MixBridge. Empirical studies across diverse generation tasks speak to the efficacy of MixBridge.
Submitted 26 May, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
-
From Rankings to Insights: Evaluation Should Shift Focus from Leaderboard to Feedback
Authors:
Zongqi Wang,
Tianle Gu,
Chen Gong,
Xin Tian,
Siqi Bao,
Yujiu Yang
Abstract:
Automatic evaluation benchmarks such as MT-Bench, Arena-Hard, and Auto-Arena are seeing growing adoption for the evaluation of Large Language Models (LLMs). Existing research has primarily focused on approximating human-based model rankings using limited data and LLM-as-a-Judge. However, the fundamental premise of these studies, which attempts to replicate human rankings, is flawed. Specifically, these benchmarks typically offer only overall scores, limiting their utility to leaderboard rankings, rather than providing feedback that can guide model optimization and support model profiling. Therefore, we advocate for an evaluation paradigm shift from approximating human-based model rankings to providing feedback with analytical value. To this end, we introduce Feedbacker, an evaluation framework that provides comprehensive and fine-grained results, thereby enabling thorough identification of a model's specific strengths and weaknesses. Such feedback not only supports the targeted optimization of the model but also enhances the understanding of its behavior. Feedbacker comprises three key components: an extensible tree-based query taxonomy builder, an automated query synthesis scheme, and a suite of visualization and analysis tools. Furthermore, we propose a novel LLM-as-a-Judge method: PC$^{2}$ (Pre-Comparison-derived Criteria) pointwise evaluation. This method derives evaluation criteria by pre-comparing the differences between several auxiliary responses, achieving the accuracy of pairwise evaluation while maintaining the time complexity of pointwise evaluation. Finally, leveraging the evaluation results of 17 mainstream LLMs, we demonstrate the usage of Feedbacker and highlight its effectiveness and potential. Our project homepage and dataset are available at https://liudan193.github.io/Feedbacker.
Submitted 16 May, 2025; v1 submitted 10 May, 2025;
originally announced May 2025.
-
OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
Authors:
Cong Hua,
Qianqian Xu,
Zhiyong Yang,
Zitai Wang,
Shilong Bao,
Qingming Huang
Abstract:
Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). Moreover, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose OpenworldAUC, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize OpenworldAUC effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on OpenworldAUC and other metrics. We release the code at https://github.com/huacong/OpenworldAUC.
Submitted 8 May, 2025;
originally announced May 2025.
-
Axion Dark Matter Search with Near-KSVZ Sensitivity Using the TM$_{020}$ Mode
Authors:
Sungjae Bae,
Junu Jeong,
Younggeun Kim,
SungWoo Youn,
Jinsu Kim,
Arjan F. van Loo,
Yasunobu Nakamura,
Seonjeong Oh,
Taehyeon Seong,
Sergey Uchaikin,
Jihn E. Kim,
Yannis K. Semertzidis
Abstract:
Dark matter remains one of the most profound mysteries in modern physics, with axions, hypothetical particles proposed to resolve the strong CP problem, standing as a compelling candidate. Among various experimental strategies, cavity haloscopes currently offer the most sensitive method to detect axions, though their searches have largely been confined to axion masses below 10 $\mu$eV. However, recent theoretical developments suggest that the axion mass lies beyond this range. Higher-order cavity modes have been explored as a methodological approach to expand the search range, albeit with limited success in achieving both high sensitivity and broad tunability. In this work, we present a sensitive search for axions with masses around 21 $\mu$eV, utilizing the TM$_{020}$ mode of a cylindrical cavity, which incorporated an innovative tuning mechanism. Our results reached 1.7 times the KSVZ sensitivity over 100 MHz, representing a significant improvement in this mass range and contributing to the experimental search for axion dark matter at higher masses.
Submitted 8 May, 2025;
originally announced May 2025.
-
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Authors:
Manveer Singh Tamber,
Forrest Sheng Bao,
Chenyu Xu,
Ge Luo,
Suleman Kazi,
Minseok Bae,
Miaoran Li,
Ofer Mendelevitch,
Renyi Qu,
Jimmy Lin
Abstract:
Hallucinations remain a persistent challenge for LLMs. RAG aims to reduce hallucinations by grounding responses in contexts. However, even when provided context, LLMs still frequently introduce unsupported information or contradictions. This paper presents our efforts to measure LLM hallucinations with a focus on summarization tasks, assessing how often various LLMs introduce hallucinations when summarizing documents. We discuss Vectara's existing LLM hallucination leaderboard, based on the Hughes Hallucination Evaluation Model (HHEM). While HHEM and Vectara's Hallucination Leaderboard have garnered great research interest, we examine challenges faced by HHEM and current hallucination detection methods by analyzing the effectiveness of these methods on existing hallucination datasets. To address these limitations, we propose FaithJudge, an LLM-as-a-judge approach guided by few-shot human hallucination annotations, which substantially improves automated LLM hallucination evaluation over current methods. We introduce an enhanced hallucination leaderboard centered on FaithJudge, alongside our current hallucination leaderboard, enabling more reliable benchmarking of LLMs for hallucinations in RAG.
Submitted 7 May, 2025;
originally announced May 2025.
-
Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving
Authors:
Faizan M. Tariq,
Zheng-Hang Yeh,
Avinash Singh,
David Isele,
Sangjae Bae
Abstract:
Motivated by the requirements for effectiveness and efficiency, path-speed decomposition-based trajectory planning methods have been widely adopted for autonomous driving applications. While a global route can be pre-computed offline, real-time generation of adaptive local paths remains crucial. Therefore, we present the Frenet Corridor Planner (FCP), an optimization-based local path planning strategy for autonomous driving that ensures smooth and safe navigation around obstacles. Modeling the vehicles as safety-augmented bounding boxes and pedestrians as convex hulls in the Frenet space, our approach defines a drivable corridor by determining the appropriate deviation side for static obstacles. Thereafter, a modified space-domain bicycle kinematics model enables path optimization for smoothness, boundary clearance, and dynamic obstacle risk minimization. The optimized path is then passed to a speed planner to generate the final trajectory. We validate FCP through extensive simulations and real-world hardware experiments, demonstrating its efficiency and effectiveness.
Submitted 6 May, 2025;
originally announced May 2025.
-
Atom-by-atom Imaging of Moiré Phasons using Electron Ptychography
Authors:
Yichao Zhang,
Ballal Ahammed,
Sang Hyun Bae,
Chia-Hao Lee,
Jeffrey Huang,
Mohammad Abir Hossain,
Tawfiqur Rakib,
Arend van der Zande,
Elif Ertekin,
Pinshane Y. Huang
Abstract:
Twisted 2D materials exhibit unique vibrational modes called moiré phonons, which arise from the moiré superlattice. Here, we demonstrate atom-by-atom imaging of phasons, an ultrasoft class of moiré phonons in twisted bilayer WSe2. Using ultrahigh-resolution (<15 pm) electron ptychography, we image the size and shape of each atom to extract time-averaged vibrational amplitudes as a function of twist angle and position. We observe several signature properties of moiré phasons, such as increased vibrational amplitudes at solitons and AA-stacked regions. By correlating experiments with molecular dynamics simulations and lattice dynamics calculations, we show phasons dominate the thermal vibrations in low-angle twisted bilayers. These results represent a powerful route to image thermal vibrations at atomic resolution, unlocking experimental studies of a thus-far hidden branch of moiré phonon physics.
Submitted 5 May, 2025;
originally announced May 2025.
-
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
Authors:
Qingqiu Li,
Zihang Cui,
Seongsu Bae,
Jilan Xu,
Runtian Yuan,
Yuejie Zhang,
Rui Feng,
Quanli Shen,
Xiaobo Zhang,
Junjun He,
Shujun Wang
Abstract:
Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings. Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency. However, despite their strong visual understanding, current Medical LMMs (MLMMs) still face two major challenges: (1) Insufficient region-level understanding and interaction, and (2) Limited accuracy and interpretability due to single-step reasoning. In this paper, we empower MLMMs with anatomy-centric reasoning capabilities to enhance their interactivity and explainability. Specifically, we first propose an Anatomical Ontology-Guided Reasoning (AOR) framework, which centers on cross-modal region-level information to facilitate multi-step reasoning. Next, under the guidance of expert physicians, we develop AOR-Instruction, a large instruction dataset for MLMMs training. Our experiments demonstrate AOR's superior performance in both VQA and report generation tasks.
Submitted 5 May, 2025;
originally announced May 2025.
-
Neural Logistic Bandits
Authors:
Seoungbin Bae,
Dabeen Lee
Abstract:
We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $κ$, where $1/κ$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $κ$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{κT})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/κ})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.
Submitted 4 May, 2025;
originally announced May 2025.
-
Superradiant dark matter production from primordial black holes: Impact of multiple modes and gravitational wave emission
Authors:
Nayun Jia,
Shou-Shan Bao,
Chen Zhang,
Hong Zhang,
Xin Zhang
Abstract:
Rotating primordial black holes (PBHs) in the early universe can emit particles through superradiance, a process particularly efficient when the particle's Compton wavelength is comparable to the PBH's gravitational radius. Superradiance leads to an exponential growth of particle occupation numbers in gravitationally bound states. We present an analysis of heavy bosonic dark matter (DM) production through three gravitational mechanisms: Hawking radiation, superradiant instabilities, and ultraviolet (UV) freeze-in. We consider PBHs that evaporate before Big Bang Nucleosynthesis (BBN). For both scalar and vector DM, our analysis incorporates the evolution of a second superradiant mode. We demonstrate that the growth of a second superradiant mode causes the decay of the first mode, and thus the second mode cannot further enhance the DM abundance beyond that already achieved by the first mode. Our study also reveals that while superradiance generally enhances DM production, gravitational wave (GW) emission from the superradiant cloud may significantly modify this picture. For scalar DM, GW emission reduces the parameter space where superradiance effectively augments relic abundance. For vector DM, rapid GW emission from the superradiant cloud may yield relic abundances below those achieved through Hawking radiation alone. These findings demonstrate that multiple-mode effect and GW emission play critical roles in modeling DM production from PBHs in the early universe.
Submitted 26 April, 2025;
originally announced April 2025.
-
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Authors:
Sungnyun Kim,
Sungwoo Cho,
Sangmin Bae,
Kangwook Jang,
Se-Young Yun
Abstract:
Audio-visual speech recognition (AVSR) incorporates auditory and visual modalities to improve recognition accuracy, particularly in noisy environments where audio-only speech systems are insufficient. While previous research has largely addressed audio disruptions, few studies have dealt with visual corruptions, e.g., lip occlusions or blurred videos, which are also detrimental. To address this real-world challenge, we propose CAV2vec, a novel self-supervised speech representation learning framework particularly designed to handle audio-visual joint corruption. CAV2vec employs a self-distillation approach with a corrupted prediction task, where the student model learns to predict clean targets, generated by the teacher model, with corrupted input frames. Specifically, we suggest a unimodal multi-task learning, which distills cross-modal knowledge and aligns the corrupted modalities, by predicting clean audio targets with corrupted videos, and clean video targets with corrupted audios. This strategy mitigates the dispersion in the representation space caused by corrupted modalities, leading to more reliable and robust audio-visual fusion. Our experiments on robust AVSR benchmarks demonstrate that the corrupted representation learning method significantly enhances recognition accuracy across generalized environments involving various types of corruption. Our code is available at https://github.com/sungnyun/cav2vec.
Submitted 30 April, 2025; v1 submitted 23 January, 2025;
originally announced April 2025.
-
I-INR: Iterative Implicit Neural Representations
Authors:
Ali Haider,
Muhammad Salman Ali,
Maryam Qamar,
Tahir Khalil,
Soo Ye Kim,
Jihyong Oh,
Enzo Tartaglione,
Sung-Ho Bae
Abstract:
Implicit Neural Representations (INRs) have revolutionized signal processing and computer vision by modeling signals as continuous, differentiable functions parameterized by neural networks. However, their inherent formulation as a regression problem makes them prone to regression to the mean, limiting their ability to capture fine details, retain high-frequency information, and handle noise effectively. To address these challenges, we propose Iterative Implicit Neural Representations (I-INRs), a novel plug-and-play framework that enhances signal reconstruction through an iterative refinement process. I-INRs effectively recover high-frequency details, improve robustness to noise, and achieve superior reconstruction quality. Our framework seamlessly integrates with existing INR architectures, delivering substantial performance gains across various tasks. Extensive experiments show that I-INRs outperform baseline methods, including WIRE, SIREN, and Gauss, in diverse computer vision applications such as image restoration, image denoising, and object occupancy prediction.
Submitted 9 June, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
Observation of Double Hysteresis in CoFe$_2$O$_4$/MnFe$_2$O$_4$ Core/Shell Nanoparticles and Its Contribution to AC Heat Induction
Authors:
Jie Wang,
Hyungsub Kim,
Ji-wook Kim,
HyeongJoo Seo,
Satoshi Ota,
Chun-Yeol You,
Yasushi Takemura,
Seongtae Bae
Abstract:
Magnetic core/shell nanoparticles are promising candidates for magnetic hyperthermia due to their high AC magnetic heat induction (specific loss power (SLP)). It is widely accepted that magnetic exchange coupling between core and shell plays a crucial role in enhancing the SLP of magnetic core/shell nanoparticles. However, the physical contribution of exchange coupling to SLP has not been systematically investigated, and the underlying mechanism remains unclear. In this study, magnetic hard/soft CoFe$_2$O$_4$/MnFe$_2$O$_4$ and inverted soft/hard MnFe$_2$O$_4$/CoFe$_2$O$_4$ core/shell nanoparticles were synthesized, systematically varying the number of shell layers, to investigate the physical contribution of internal bias coupling at the core/shell interface to AC heat induction (SLP). Our results show that a unique magnetic property, a double-hysteresis loop, was clearly observed, which has never been reported in previous core/shell research literature. According to the experimentally and theoretically analyzed results, the double-hysteresis behavior in core/shell nanoparticles was caused by the difference in magnetic anisotropy between core and shell materials, separated by a non-magnetic interface. The enhanced SLP and maximum temperature rise ($T_{\mathrm{AC,max}}$) of core/shell nanoparticles are attributed to the optimized magnetic anisotropy, AC magnetic softness, and double-hysteresis behavior due to the internal bias coupling. These results demonstrate that separately controlling the magnetic anisotropy and the AC/DC magnetic properties, by varying the volume ratio between core and shell and by switching hard and soft phase materials between core and shell, is an effective way to enhance the AC heat induction of core/shell nanoparticles for magnetic nanoparticle hyperthermia.
Submitted 23 April, 2025;
originally announced April 2025.
-
Atomic-scale imaging and charge state manipulation of NV centers by scanning tunneling microscopy
Authors:
Arjun Raghavan,
Seokjin Bae,
Nazar Delegan,
F. Joseph Heremans,
Vidya Madhavan
Abstract:
Nitrogen-vacancy (NV) centers in diamond are among the most promising solid-state qubit candidates, owing to their exceptionally long spin coherence times, efficient spin-photon coupling, room-temperature operation, and steadily advancing fabrication and integration techniques. Despite significant progress in the field, atomic-scale characterization and control of individual NV centers have remained elusive. In this work, we present a novel approach utilizing a conductive graphene capping layer to enable direct imaging and manipulation of $NV^{-}$ defects via scanning tunneling microscopy (STM). By investigating over 40 individual $NV^{-}$ centers, we identify their spectroscopic signatures and spatial configurations. Our dI/dV conductance spectra reveal the ground state approximately 300 meV below the Fermi level. Additionally, density-of-states mapping uncovers a two-lobed wavefunction aligned along the [111] crystallographic direction. Remarkably, we demonstrate the ability to manipulate the charge state of the NV centers from $NV^{-}$ to $NV^{0}$ through STM tip-induced gating. This work represents a significant advancement in the atomic-scale understanding and engineering of NV centers, paving the way for future quantum device development.
Submitted 17 April, 2025;
originally announced April 2025.
-
Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
Authors:
Farhad Nawaz,
Minjun Sung,
Darshan Gadginmath,
Jovin D'sa,
Sangjae Bae,
David Isele,
Nadia Figueroa,
Nikolai Matni,
Faizan M. Tariq
Abstract:
Safe and efficient path planning in parking scenarios presents a significant challenge due to the presence of cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A star algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A star algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we showcase our approach's potential in greatly improving the efficiency and safety when compared to the state of the art spline-based planning method for parking situations.
Submitted 7 May, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data
Authors:
Suyoung Bae,
Hyojun Kim,
YunSeok Choi,
Jee-Hyong Lee
Abstract:
In various natural language processing (NLP) tasks, fine-tuning Pre-trained Language Models (PLMs) often leads to the issue of spurious correlations, which negatively impacts performance, particularly when dealing with out-of-distribution data. To address this problem, we propose SALAD (Structure Aware and LLM-driven Augmented Data), a novel approach designed to enhance model robustness and generalization by generating structure-aware and counterfactually augmented data for contrastive learning. Our method leverages a tagging-based approach to generate structure-aware positive samples and utilizes large language models (LLMs) to generate counterfactual negative samples with diverse sentence patterns. By applying contrastive learning, SALAD enables the model to focus on learning the structural relationships between key sentence components while minimizing reliance on spurious correlations. We validate our approach through experiments on three tasks: Sentiment Classification, Sexism Detection, and Natural Language Inference. The results demonstrate that SALAD not only improves model robustness and performance across different environments but also enhances generalization to out-of-distribution datasets and cross-domain scenarios.
Submitted 16 April, 2025;
originally announced April 2025.
-
Higher-Order Color Voronoi Diagrams and the Colorful Clarkson-Shor Framework
Authors:
Sang Won Bae,
Nicolau Oliver,
Evanthia Papadopoulou
Abstract:
Given a set $S$ of $n$ colored sites, each $s\in S$ associated with a distance-to-site function $δ_s \colon \mathbb{R}^2 \to \mathbb{R}$, we consider two distance-to-color functions for each color: one takes the minimum of $δ_s$ for sites $s\in S$ in that color and the other takes the maximum. These two sets of distance functions induce two families of higher-order Voronoi diagrams for colors in the plane, namely, the minimal and maximal order-$k$ color Voronoi diagrams, which include various well-studied Voronoi diagrams as special cases. In this paper, we derive an exact upper bound $4k(n-k)-2n$ on the total number of vertices in both the minimal and maximal order-$k$ color diagrams for a wide class of distance functions $δ_s$ that satisfy certain conditions, including the case of point sites $S$ under convex distance functions and the $L_p$ metric for any $1\leq p \leq\infty$. For the $L_1$ (or, $L_\infty$) metric, and other convex polygonal metrics, we show that the order-$k$ minimal diagram of point sites has $O(\min\{k(n-k), (n-k)^2\})$ complexity, while its maximal counterpart has $O(\min\{k(n-k), k^2\})$ complexity. To obtain these combinatorial results, we extend the Clarkson--Shor framework to colored objects, and demonstrate its application to several fundamental geometric structures, including higher-order color Voronoi diagrams, colored $j$-facets, and levels in the arrangements of piecewise linear/algebraic curves/surfaces. We also present an iterative approach to compute higher-order color Voronoi diagrams.
Submitted 9 April, 2025;
originally announced April 2025.
-
Impact of newly measured $β$-delayed neutron emitters around $^{78}$Ni on light element nucleosynthesis in the neutrino-wind following a neutron star merger
Authors:
A. Tolosa-Delgado,
J. L. Tain,
M. Reichert,
A. Arcones,
M. Eichler,
B. C. Rasco,
N. T. Brewer,
K. P. Rykaczewski,
R. Yokoyama,
R. Grzywacz,
I. Dillmann,
J. Agramunt,
D. S. Ahn,
A. Algora,
H. Baba,
S. Bae,
C. G. Bruno,
R. Caballero Folch,
F. Calvino,
P. J. Coleman-Smith,
G. Cortes,
T. Davinson,
C. Domingo-Pardo,
A. Estrade,
N. Fukuda
, et al. (49 additional authors not shown)
Abstract:
Neutron emission probabilities and half-lives of 37 beta-delayed neutron emitters from 75Ni to 92Br were measured at the RIKEN Nishina Center in Japan, including 11 one-neutron and 13 two-neutron emission probabilities and 6 half-lives measured for the first time, which supersede theoretical estimates. These nuclei lie in the path of the weak r-process occurring in neutrino-driven winds from the accretion disk formed after the merger of two neutron stars, synthesizing elements in the A~80 abundance peak. The presence of such elements dominates the accompanying kilonova emission over the first few days and has been identified in the AT2017gfo event, associated with the gravitational wave detection GW170817.
Abundance calculations based on over 17000 simulated trajectories describing the evolution of matter properties in the merger outflows show that the new data lead to an increase of 50-70 percent in the abundance of Y, Zr, Nb, and Mo. This enhancement is large compared to the scatter of relative abundances observed in old very metal-poor stars and is therefore significant in the comparison with other possible astrophysical processes contributing to light-element production.
These results underline the importance of including experimental decay data for very neutron-rich beta-delayed neutron emitters into r-process models.
Submitted 8 April, 2025;
originally announced April 2025.
-
Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning
Authors:
Sanghwan Bae,
Jiwoo Hong,
Min Young Lee,
Hanbyul Kim,
JeongYeon Nam,
Donghyun Kwak
Abstract:
Reasoning-Oriented Reinforcement Learning (RORL) enhances the reasoning ability of Large Language Models (LLMs). However, due to the sparsity of rewards in RORL, effective training is highly dependent on the selection of problems of appropriate difficulty. Although curriculum learning attempts to address this by adjusting difficulty, it often relies on static schedules, and even recent online filtering methods lack theoretical grounding and a systematic understanding of their effectiveness. In this work, we show theoretically and empirically that curating each batch, on the fly, with problems on which the training model achieves intermediate accuracy can maximize the effectiveness of RORL training; we call this balanced online difficulty filtering. We first derive that the lower bound of the KL divergence between the initial and the optimal policy can be expressed in terms of the variance of the sampled accuracy. Building on this insight, we show that balanced filtering can maximize the lower bound, leading to better performance. Experimental results across five challenging math reasoning benchmarks show that balanced online filtering yields an additional 10% improvement on AIME and 4% on average over plain GRPO. Moreover, further analysis shows gains in sample efficiency and training time efficiency, exceeding the maximum reward of plain GRPO within 60% of the training time and of the training set volume.
△ Less
Submitted 4 April, 2025;
originally announced April 2025.
-
Command A: An Enterprise-Ready Large Language Model
Authors:
Team Cohere,
Aakanksha,
Arash Ahmadian,
Marwan Ahmed,
Jay Alammar,
Milad Alizadeh,
Yazeed Alnumay,
Sophia Althammer,
Arkady Arkhangorodsky,
Viraat Aryabumi,
Dennis Aumiller,
Raphaël Avalos,
Zahara Aviv,
Sammie Bae,
Saurabh Baji,
Alexandre Barbet,
Max Bartolo,
Björn Bebensee,
Neeral Beladia,
Walter Beller-Morales,
Alexandre Bérard,
Andrew Berneshawi,
Anna Bialas,
Phil Blunsom
, et al. (205 additional authors not shown)
Abstract:
In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Generation (RAG) capabilities with grounding and tool use to automate sophisticated business processes. These abilities are achieved through a decentralised training approach, including self-refinement algorithms and model merging techniques. We also include results for Command R7B which shares capability and architectural similarities to Command A. Weights for both models have been released for research purposes. This technical report details our original training pipeline and presents an extensive evaluation of our models across a suite of enterprise-relevant tasks and public benchmarks, demonstrating excellent performance and efficiency.
Submitted 14 April, 2025; v1 submitted 1 April, 2025;
originally announced April 2025.
-
Demailly's approximation of general weights
Authors:
Shijie Bao,
Qi'an Guan
Abstract:
In this note, we demonstrate the convergence of the Demailly approximation of a general (weakly) upper semi-continuous weight.
Submitted 2 April, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Spatiotemporal Learning of Brain Dynamics from fMRI Using Frequency-Specific Multi-Band Attention for Cognitive and Psychiatric Applications
Authors:
Sangyoon Bae,
Junbeom Kwon,
Shinjae Yoo,
Jiook Cha
Abstract:
Understanding how the brain's complex nonlinear dynamics give rise to adaptive cognition and behavior is a central challenge in neuroscience. These dynamics exhibit scale-free and multifractal properties, influencing the reconfiguration of neural networks. However, conventional neuroimaging models are constrained by linear and stationary assumptions, limiting their ability to capture these processes. Transformer-based architectures, known for capturing long-range dependencies, align well with the brain's hierarchical and temporal organization. We introduce Multi-Band Brain Net (MBBN), a transformer-based framework that models frequency-specific spatiotemporal brain dynamics from fMRI by integrating scale-free network principles with frequency-resolved multi-band self-attention. Trained on three large-scale neuroimaging cohorts (UK Biobank, ABCD, ABIDE) totaling 45,951 individuals, MBBN reveals previously undetectable frequency-dependent network interactions, shedding light on connectivity disruptions in psychiatric conditions (ADHD, ASD, depression). This validation shows robust generalizability and highlights core neural principles conserved across populations. MBBN achieves up to 30.59% higher predictive accuracy than state-of-the-art methods, demonstrating the advantage of frequency-informed spatiotemporal modeling in capturing latent neural computations. MBBN's interpretability uncovers novel frequency-specific biomarkers for neurodevelopmental disorders, providing insights into the hierarchical organization of brain function. By offering an interpretable framework for spatiotemporal learning, MBBN provides insights into how neural computations underpin cognitive function and psychiatric vulnerability, with implications for brain decoding, cognitive neuroscience, and precision psychiatry.
Submitted 30 March, 2025;
originally announced March 2025.
-
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Authors:
Joonhyun Jeong,
Seyun Bae,
Yeonsung Jung,
Jaeryong Hwang,
Eunho Yang
Abstract:
Despite the remarkable versatility of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) to generalize across both language and vision tasks, LLMs and MLLMs have shown vulnerability to jailbreaking, generating textual outputs that undermine safety, ethical, and bias standards when exposed to harmful or sensitive inputs. With the recent advancement of safety alignment via preference-tuning from human feedback, LLMs and MLLMs have been equipped with safety guardrails to yield safe, ethical, and fair responses with regard to harmful inputs. However, despite the significance of safety alignment, research on the vulnerabilities remains largely underexplored. In this paper, we investigate the unexplored vulnerability of the safety alignment, examining its ability to consistently provide safety guarantees for out-of-distribution(OOD)-ifying harmful inputs that may fall outside the aligned data distribution. Our key observation is that OOD-ifying the vanilla harmful inputs highly increases the uncertainty of the model to discern the malicious intent within the input, leading to a higher chance of being jailbroken. Exploiting this vulnerability, we propose JOOD, a new Jailbreak framework via OOD-ifying inputs beyond the safety alignment. We explore various off-the-shelf visual and textual transformation techniques for OOD-ifying the harmful inputs. Notably, we observe that even simple mixing-based techniques such as image mixup prove highly effective in increasing the uncertainty of the model, thereby facilitating the bypass of the safety alignment. Experiments across diverse jailbreak scenarios demonstrate that JOOD effectively jailbreaks recent proprietary LLMs and MLLMs such as GPT-4 and o1 with high attack success rate, which previous attack approaches have consistently struggled to jailbreak. Code is available at https://github.com/naver-ai/JOOD.
Submitted 25 March, 2025;
originally announced March 2025.
-
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
Authors:
Suyoung Bae,
YunSeok Choi,
Jee-Hyong Lee
Abstract:
While Large Language Models (LLMs) excel in zero-shot Question Answering (QA), they tend to expose biases in their internal knowledge when faced with socially sensitive questions, leading to a degradation in performance. Existing zero-shot methods are efficient but fail to consider context or prevent bias propagation in the answers. To address this, we propose DeCAP, a method for debiasing LLMs using Context-Adaptive Prompt Generation. DeCAP leverages Question Ambiguity Detection to take appropriate debiasing actions based on the context and Neutral Answer Guidance Generation to help the LLMs make objective judgments about the context, minimizing the propagation of bias from their internal knowledge. Our experiments across eight LLMs show that DeCAP achieves state-of-the-art zero-shot debiased QA performance. This demonstrates DeCAP's efficacy in enhancing the fairness and accuracy of LLMs in diverse QA settings.
Submitted 25 March, 2025;
originally announced March 2025.
-
Rapid Vapor-Assisted Solution Process of Metal-Organic Chalcogenides for High-Performance Light-Emitting Diodes
Authors:
Sang-Hyun Chin,
Daseul Lee,
Donggyu Lee,
Kwanghyun Chung,
Eunjong Yoo,
Tong-Il Kim,
Su Hwan Lee,
Sang Woo Bae,
Young-Hoon Kim,
Yeonjin Yi
Abstract:
Metal-organic chalcogenides (MOCs), robust crystalline assemblies composed of coinage metals, chalcogens, and organic ligands, are typically synthesized via prolonged, high-temperature tarnishing of vacuum-deposited metal films with organochalcogen precursors. The prolonged exposure to high temperatures and the necessity for direct vacuum deposition of silver can damage the underlying films, posing significant challenges to the fabrication of optoelectronic devices, despite the cost-effectiveness and chemical robustness of MOCs. This study introduces vapor-assisted solution processing, a novel chemical vapor deposition method, enabling remarkably rapid fabrication of luminescent MOC films. Furthermore, the first MOC-based light-emitting diodes (MOCLEDs) are realized, achieving an external quantum efficiency (EQE) approaching 0.1% and electroluminescence peaking at 633 nm. These results highlight the potential of MOCs as next-generation emitters for displays and solid-state lighting. This work offers a promising fabrication strategy and insights for advancing MOCLEDs and expanding their optoelectronic potential.
Submitted 25 March, 2025;
originally announced March 2025.
-
Fourier-Based 3D Multistage Transformer for Aberration Correction in Multicellular Specimens
Authors:
Thayer Alshaabi,
Daniel E. Milkie,
Gaoxiang Liu,
Cyna Shirazinejad,
Jason L. Hong,
Kemal Achour,
Frederik Görlitz,
Ana Milunovic-Jevtic,
Cat Simmons,
Ibrahim S. Abuzahriyeh,
Erin Hong,
Samara Erin Williams,
Nathanael Harrison,
Evan Huang,
Eun Seok Bae,
Alison N. Killilea,
David G. Drubin,
Ian A. Swinburne,
Srigokul Upadhyayula,
Eric Betzig
Abstract:
High-resolution tissue imaging is often compromised by sample-induced optical aberrations that degrade resolution and contrast. While wavefront sensor-based adaptive optics (AO) can measure these aberrations, such hardware solutions are typically complex, expensive to implement, and slow when serially mapping spatially varying aberrations across large fields of view. Here, we introduce AOViFT (Adaptive Optical Vision Fourier Transformer) -- a machine learning-based aberration sensing framework built around a 3D multistage Vision Transformer that operates on Fourier domain embeddings. AOViFT infers aberrations and restores diffraction-limited performance in puncta-labeled specimens with substantially reduced computational cost, training time, and memory footprint compared to conventional architectures or real-space networks. We validated AOViFT on live gene-edited zebrafish embryos, demonstrating its ability to correct spatially varying aberrations using either a deformable mirror or post-acquisition deconvolution. By eliminating the need for the guide star and wavefront sensing hardware and simplifying the experimental workflow, AOViFT lowers technical barriers for high-resolution volumetric microscopy across diverse biological samples.
Submitted 23 May, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations
Authors:
Piyush Gupta,
Sangjae Bae,
David Isele
Abstract:
The adoption of Large Language Models (LLMs) is rapidly expanding across various tasks that involve inherent graphical structures. Graphs are integral to a wide range of applications, including motion planning for autonomous vehicles, social networks, scene understanding, and knowledge graphs. Many problems, even those not initially perceived as graph-based, can be effectively addressed through graph theory. However, when applied to these tasks, LLMs often encounter challenges, such as hallucinations and mathematical inaccuracies. To overcome these limitations, we propose Graph-Grounded LLMs, a system that improves LLM performance on graph-related tasks by integrating a graph library through function calls. By grounding LLMs in this manner, we demonstrate significant reductions in hallucinations and improved mathematical accuracy in solving graph-based problems, as evidenced by the performance on the NLGraph benchmark. Finally, we showcase a disaster rescue application where the Graph-Grounded LLM acts as a decision-support system.
Submitted 13 March, 2025;
originally announced March 2025.
-
"Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding
Authors:
Hyunbin Jin,
Je Won Yeom,
Seunghyun Bae,
Taesup Kim
Abstract:
Large language models (LLMs) exhibit strong reasoning abilities, often attributed to few-shot or zero-shot chain-of-thought (CoT) prompting. While effective, these methods require labor-intensive prompt engineering, raising the question of whether reasoning can be induced without reliance on explicit prompts. In this work, we unlock the reasoning capabilities of LLMs without explicit prompting. Inspired by zero-shot CoT and CoT-decoding, we propose a novel decoding strategy that systematically nudges LLMs to continue reasoning, thereby preventing immature reasoning processes. Specifically, we monitor the model's generation and inject a designated phrase whenever it is likely to conclude its response prematurely, before completing the reasoning process. Our experimental evaluations on diverse reasoning benchmarks demonstrate that our proposed strategy substantially improves LLM reasoning capabilities, highlighting the potential of decoding-based interventions as an alternative to traditional prompting techniques.
Submitted 17 March, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Predicting Volleyball Season Performance Using Pre-Season Wearable Data and Machine Learning
Authors:
Melik Ozolcer,
Tongze Zhang,
Sang Won Bae
Abstract:
Predicting performance outcomes has the potential to transform training approaches, inform coaching strategies, and deepen our understanding of the factors that contribute to athletic success. Traditional non-automated data analysis in sports is often difficult to scale. To address this gap, this study analyzes factors influencing athletic performance by leveraging passively collected sensor data from smartwatches and ecological momentary assessments (EMA). The study aims to differentiate between 14 collegiate volleyball players who go on to perform well or poorly, using data collected prior to the beginning of the season. This is achieved through an integrated feature set creation approach. The model, validated using leave-one-subject-out cross-validation, achieved promising predictive performance (F1 score = 0.75). Importantly, by utilizing data collected before the season starts, our approach offers an opportunity for players predicted to perform poorly to improve their projected outcomes through targeted interventions by virtue of daily model predictions. The findings from this study not only demonstrate the potential of machine learning in sports performance prediction but also shed light on key features along with subjective psycho-physiological states that are predictive of, or associated with, athletic success.
Submitted 11 March, 2025;
originally announced March 2025.
-
Formulas for Mutually Orthogonal Quantum States in Two-Qubit Systems: Orthogonal Schmidt Decompositions
Authors:
Yonghae Lee,
Youngho Min,
Sunghyun Bae,
Youngrong Lim
Abstract:
We present Schmidt decomposition formulas for mutually orthogonal two-qubit pure states and classify orthonormal sets based on their entanglement structure. First, we derive explicit Schmidt decomposition formulas for any pure state and extend them to two orthogonal pure states. For three mutually orthogonal states, we provide formulas for specific cases and discuss the challenges of obtaining analytic expressions for the rest. Additionally, we derive explicit formulas for certain orthonormal bases and analyze those containing one or two maximally entangled states. Finally, we prove that no orthonormal basis can consist of three product states and one entangled state.
Submitted 10 March, 2025;
originally announced March 2025.
-
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Authors:
Haoqiang Kang,
Enna Sachdeva,
Piyush Gupta,
Sangjae Bae,
Kwonjoon Lee
Abstract:
Vision-Language Models (VLMs) have recently shown promising advancements in sequential decision-making tasks through task-specific fine-tuning. However, common fine-tuning methods, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) techniques like Proximal Policy Optimization (PPO), present notable limitations: SFT assumes Independent and Identically Distributed (IID) data, while PPO focuses on maximizing cumulative rewards. These limitations often restrict solution diversity and hinder generalization in multi-step reasoning tasks. To address these challenges, we introduce GFlowVLM, a novel framework that fine-tunes VLMs using Generative Flow Networks (GFlowNets) to promote generation of diverse solutions for complex reasoning tasks. GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture long-term dependencies essential for real-world applications. It takes observations and task descriptions as inputs to prompt chain-of-thought (CoT) reasoning, which subsequently guides action selection. We use task-based rewards to fine-tune the VLM with GFlowNets. This approach enables VLMs to outperform prior fine-tuning methods, including SFT and RL. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld), showing enhanced training efficiency, solution diversity, and stronger generalization capabilities across both in-distribution and out-of-distribution scenarios.
Submitted 25 March, 2025; v1 submitted 9 March, 2025;
originally announced March 2025.
-
AXAI-CDSS: An Affective Explainable AI-Driven Clinical Decision Support System for Cannabis Use
Authors:
Tongze Zhang,
Tammy Chung,
Anind Dey,
Sang Won Bae
Abstract:
As cannabis use has increased in recent years, researchers have come to rely on sophisticated machine learning models to predict cannabis use behavior and its impact on health. However, many artificial intelligence (AI) models lack transparency and interpretability due to their opaque nature, limiting their trust and adoption in real-world medical applications, such as clinical decision support systems (CDSS). To address this issue, this paper enhances the explainability of the algorithms underlying CDSS by integrating multiple Explainable Artificial Intelligence (XAI) methods and applying causal inference techniques to clarify the model's predictive decisions under various scenarios. By providing deeper interpretability of the XAI outputs using Large Language Models (LLMs), we provide users with more personalized and accessible insights to overcome the challenges posed by AI's "black box" nature. Our system dynamically adjusts feedback based on user queries and emotional states, combining text-based sentiment analysis with real-time facial emotion recognition to ensure responses are empathetic, context-adaptive, and user-centered. This approach bridges the gap between the learning demands of interpretability and the need for intuitive understanding, enabling non-technical users such as clinicians and clinical researchers to interact effectively with AI models. Ultimately, this approach improves usability, enhances perceived trustworthiness, and increases the impact of CDSS in healthcare applications.
Submitted 9 March, 2025;
originally announced March 2025.
-
Unsupervised Waste Classification By Dual-Encoder Contrastive Learning and Multi-Clustering Voting (DECMCV)
Authors:
Kui Huang,
Mengke Song,
Shuo Ba,
Ling An,
Huajie Liang,
Huanxi Deng,
Yang Liu,
Zhenyu Zhang,
Chichun Zhou
Abstract:
Waste classification is crucial for improving processing efficiency and reducing environmental pollution. Supervised deep learning methods are commonly used for automated waste classification, but they rely heavily on large labeled datasets, which are costly and inefficient to obtain. Real-world waste data often exhibit category and style biases, such as variations in camera angles, lighting conditions, and types of waste, which can impact the model's performance and generalization ability. Therefore, constructing a bias-free dataset is essential. Manual labeling is not only costly but also inefficient. While self-supervised learning helps address data scarcity, it still depends on some labeled data and generally results in lower accuracy compared to supervised methods. Unsupervised methods show potential in certain cases but typically do not perform as well as supervised models, highlighting the need for an efficient and cost-effective unsupervised approach. This study presents a novel unsupervised method, Dual-Encoder Contrastive Learning with Multi-Clustering Voting (DECMCV). The approach involves using a pre-trained ConvNeXt model for image encoding, leveraging VisionTransformer to generate positive samples, and applying a multi-clustering voting mechanism to address data labeling and domain shift issues. Experimental results demonstrate that DECMCV achieves classification accuracies of 93.78% and 98.29% on the TrashNet and Huawei Cloud datasets, respectively, outperforming or matching supervised models. On a real-world dataset of 4,169 waste images, only 50 labeled samples were needed to accurately label thousands, improving classification accuracy by 29.85% compared to supervised models. This method effectively addresses style differences, enhances model generalization, and contributes to the advancement of automated waste classification.
Submitted 3 March, 2025;
originally announced March 2025.
-
Delayed-Decision Motion Planning in the Presence of Multiple Predictions
Authors:
David Isele,
Alexandre Miranda Anon,
Faizan M. Tariq,
Goro Yeh,
Avinash Singh,
Sangjae Bae
Abstract:
Reliable automated driving technology is challenged by various sources of uncertainties, in particular, behavioral uncertainties of traffic agents. It is common for traffic agents to have intentions that are unknown to others, leaving an automated driving car to reason over multiple possible behaviors. This paper formalizes a behavior planning scheme in the presence of multiple possible futures with corresponding probabilities. We present a maximum entropy formulation and show how, under certain assumptions, this allows delayed decision-making to improve safety. The general formulation is then turned into a model predictive control formulation, which is solved as a quadratic program or a set of quadratic programs. We discuss implementation details for improving computation and verify operation in simulation and on a mobile robot.
Submitted 6 June, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
Compression in 3D Gaussian Splatting: A Survey of Methods, Trends, and Future Directions
Authors:
Muhammad Salman Ali,
Chaoning Zhang,
Marco Cagnazzo,
Giuseppe Valenzise,
Enzo Tartaglione,
Sung-Ho Bae
Abstract:
3D Gaussian Splatting (3DGS) has recently emerged as a pioneering approach in explicit scene rendering and computer graphics. Unlike traditional neural radiance field (NeRF) methods, which typically rely on implicit, coordinate-based models to map spatial coordinates to pixel values, 3DGS utilizes millions of learnable 3D Gaussians. Its differentiable rendering technique and inherent capability for explicit scene representation and manipulation position 3DGS as a potential game-changer for the next generation of 3D reconstruction and representation technologies. This enables 3DGS to deliver real-time rendering speeds while offering unparalleled editability levels. However, despite its advantages, 3DGS suffers from substantial memory and storage requirements, posing challenges for deployment on resource-constrained devices. In this survey, we provide a comprehensive overview focusing on the scalability and compression of 3DGS. We begin with a detailed background overview of 3DGS, followed by a structured taxonomy of existing compression methods. Additionally, we analyze and compare current methods from the topological perspective, evaluating their strengths and limitations in terms of fidelity, compression ratios, and computational efficiency. Furthermore, we explore how advancements in efficient NeRF representations can inspire future developments in 3DGS optimization. Finally, we conclude with current research challenges and highlight key directions for future exploration.
Submitted 26 February, 2025;
originally announced February 2025.
-
First measurement of 87Rb(α, xn) cross sections at weak r-process energies in supernova ν-driven ejecta to investigate elemental abundances in low-metallicity stars
Authors:
C. Fougères,
M. L. Avila,
A. Psaltis,
M. Anastasiou,
S. Bae,
L. Balliet,
K. Bhatt,
L. Dienis,
H. Jayatissa,
V. Karayonchev,
P. Mohr,
F. Montes,
D. Neto,
F. de Oliveira Santos,
W. -J. Ong,
K. E. Rehm,
W. Reviol,
D. Santiago-Gonzalez,
N. Sensharma,
R. S. Sidhu,
I. A. Tolstukhin
Abstract:
Observed abundances of Z ~ 40 elements in metal-poor stars vary from star to star, indicating that the rapid and slow neutron capture processes may not contribute alone to the synthesis of elements beyond iron. The weak r-process was proposed to produce Z ~ 40 elements in a subset of old stars. Thought to occur in the ν-driven ejecta of a core-collapse supernova, (α, xn) reactions would drive the nuclear flow toward heavier masses at T = 2-5 GK. However, current comparisons between modelled and observed yields do not bring satisfactory insights into the stellar environment, mainly due to the uncertainties of the nuclear physics inputs where the dispersion in a given reaction rate often exceeds one order of magnitude. Involved rates are calculated with the statistical model where the choice of an α-optical-model potential (αOMP) leads to such a poor precision. The first experiment on 87Rb(α, xn) reactions at weak r-process energies is reported here. Total inclusive cross sections were assessed at Ec.m. = 8.1 - 13 MeV (3.7 - 7.6 GK) with the active target MUlti-Sampling Ionization Chamber (MUSIC). With an N = 50 seed nucleus, the measured values agree with statistical model estimates using the αOMP Atomki-V2. A re-evaluated reaction rate was incorporated into new nucleosynthesis calculations, focusing on ν-driven ejecta conditions known to be sensitive to this specific rate. These conditions were found to fail to reproduce the lighter-heavy element abundances in metal-poor stars.
Submitted 15 February, 2025;
originally announced February 2025.
-
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition
Authors:
Sungnyun Kim,
Kangwook Jang,
Sangmin Bae,
Sungwoo Cho,
Se-Young Yun
Abstract:
Audio-visual speech recognition (AVSR) has become critical for enhancing speech recognition in noisy environments by integrating both auditory and visual modalities. However, existing AVSR systems struggle to scale up without compromising computational efficiency. In this study, we introduce MoHAVE (Mixture of Hierarchical Audio-Visual Experts), a novel robust AVSR framework designed to address these scalability constraints. By leveraging a Mixture-of-Experts (MoE) architecture, MoHAVE activates modality-specific expert groups, ensuring dynamic adaptation to various audio-visual inputs with minimal computational overhead. Key contributions of MoHAVE include: (1) a sparse MoE framework that efficiently scales AVSR model capacity, (2) a hierarchical gating mechanism that dynamically utilizes the expert groups based on input context, enhancing adaptability and robustness, and (3) remarkable performance across robust AVSR benchmarks, including LRS3 and MuAViC transcription and translation tasks, setting a new standard for scalable speech recognition systems.
Submitted 21 May, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Predictive Planner for Autonomous Driving with Consistency Models
Authors:
Anjian Li,
Sangjae Bae,
David Isele,
Ryne Beeson,
Faizan M. Tariq
Abstract:
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments. Traditional approaches often treat them separately, limiting the ability for interactive planning. While recent diffusion-based generative models have shown promise in multi-agent trajectory generation, their slow sampling is less suitable for high-frequency planning tasks. In this paper, we leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle's navigational goal. Trained on real-world human driving datasets, our consistency model generates higher-quality trajectories with fewer sampling steps than standard diffusion models, making it more suitable for real-time deployment. To enforce multiple planning constraints simultaneously on the ego trajectory, a novel online guided sampling approach inspired by the Alternating Direction Method of Multipliers (ADMM) is introduced. Evaluated on the Waymo Open Motion Dataset (WOMD), our method enables proactive behavior such as nudging and yielding, and also demonstrates smoother, safer, and more efficient trajectories and satisfaction of multiple constraints under a limited computational budget.
Submitted 2 May, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Contextual Scenario Generation for Two-Stage Stochastic Programming
Authors:
David Islip,
Roy H. Kwon,
Sanghyeon Bae,
Woo Chang Kim
Abstract:
Two-stage stochastic programs (2SPs) are important tools for making decisions under uncertainty. Decision-makers use contextual information to generate a set of scenarios to represent the true conditional distribution. However, the number of scenarios required is a barrier to implementing 2SPs, motivating the problem of generating a small set of surrogate scenarios that yield high-quality decisions when they represent uncertainty. Current scenario generation approaches do not leverage contextual information or do not address computational concerns. In response, we propose contextual scenario generation (CSG) to learn a mapping between the context and a set of surrogate scenarios of user-specified size. First, we propose a distributional approach that learns the mapping by minimizing a distributional distance between the predicted surrogate scenarios and the true contextual distribution. Second, we propose a task-based approach that aims to produce surrogate scenarios that yield high-quality decisions. The task-based approach uses neural architectures to approximate the downstream objective and leverages the approximation to search for the mapping. The proposed approaches apply to various problem structures and loosely only require efficient solving of the associated subproblems and 2SPs defined on the reduced scenario sets. Numerical experiments demonstrating the effectiveness of the proposed methods are presented.
Submitted 7 February, 2025;
originally announced February 2025.
-
Enhanced Feature-based Image Stitching for Endoscopic Videos in Pediatric Eosinophilic Esophagitis
Authors:
Juming Xiong,
Muyang Li,
Ruining Deng,
Tianyuan Yao,
Shunxing Bao,
Regina N Tyree,
Girish Hiremath,
Yuankai Huo
Abstract:
Video endoscopy represents a major advance in the investigation of gastrointestinal diseases. Reviewing endoscopy videos often involves frequent adjustments and reorientations to piece together a complete view, which can be both time-consuming and prone to errors. Image stitching techniques address this issue by providing a continuous and complete visualization of the examined area. However, endoscopic images, particularly those of the esophagus, present unique challenges. The smooth surface, lack of distinct feature points, and non-horizontal orientation complicate the stitching process, rendering traditional feature-based methods often ineffective for these types of images. In this paper, we propose a novel preprocessing pipeline designed to enhance endoscopic image stitching through advanced computational techniques. Our approach converts endoscopic video data into continuous 2D images by following four key steps: (1) keyframe selection, (2) image rotation adjustment to correct distortions, (3) surface unwrapping using polar coordinate transformation to generate a flat image, and (4) feature point matching enhanced by Adaptive Histogram Equalization for improved feature detection. We evaluate stitching quality through the assessment of valid feature point match pairs. Experiments conducted on 20 pediatric endoscopy videos demonstrate that our method significantly improves image alignment and stitching quality compared to traditional techniques, laying a robust foundation for more effective panoramic image creation.
Submitted 13 February, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
Triple-Q state in magnetic breathing kagome lattice
Authors:
Hangyu Zhou,
Manuel dos Santos Dias,
Shijian Bao,
Hanchen Lu,
Youguang Zhang,
Weisheng Zhao,
Samir Lounis
Abstract:
Magnetic frustration in two-dimensional spin lattices with triangular motifs underpins a series of exotic states, ranging from multi-Q configurations to disordered spin-glasses. The antiferromagnetic kagome lattice, characterized by its network of corner-sharing triangles, represents a paradigmatic frustrated system exhibiting macroscopic degeneracy. Expanding upon the kagomerization mechanism, we focus on the magnetic breathing kagome lattice formed by a Mn monolayer deposited on a heavy metal substrate and capped with h-BN. The Mn kagome arrangement induces pronounced magnetic frustration, as evidenced by the nearly flat bands derived from spin spiral energy calculations. Including further-neighbor interactions reveals a spin spiral energy minimum along the $\Gamma$-K line and an intriguing triple-Q state with nonzero topological charge, potentially leading to highly nonlinear Hall effects. Furthermore, the flat band properties can give rise to an even more complex spin configuration, marked by several Q-pockets in the spin structure factor. These results present a fertile ground for advancing the study of multi-Q states and exploring emergent topological phenomena.
Submitted 6 February, 2025;
originally announced February 2025.