-
Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation
Authors:
Jinxing Zhou,
Zhihui Li,
Yongqiang Yu,
Yanghao Zhou,
Ruohao Guo,
Guangyao Li,
Yuxin Mao,
Mingfei Han,
Xiaojun Chang,
Meng Wang
Abstract:
We present Meta-Token Learning (Mettle), a simple and memory-efficient method for adapting large-scale pretrained transformer models to downstream audio-visual tasks. Instead of sequentially modifying the output feature distribution of the transformer backbone, Mettle utilizes a lightweight Layer-Centric Distillation (LCD) module to distill in parallel the intact audio or visual features embedded by each transformer layer into compact meta-tokens. This distillation process considers both pretrained knowledge preservation and task-specific adaptation. The obtained meta-tokens can be directly applied to classification tasks, such as audio-visual event localization and audio-visual video parsing. To further support fine-grained segmentation tasks, such as audio-visual segmentation, we introduce a Meta-Token Injection (MTI) module, which utilizes the audio and visual meta-tokens distilled from the top transformer layer to guide feature adaptation in earlier layers. Extensive experiments on multiple audio-visual benchmarks demonstrate that our method significantly reduces memory usage and training time while maintaining parameter efficiency and competitive accuracy.
Submitted 29 June, 2025;
originally announced June 2025.
-
DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover
Authors:
Youzhuo Wang,
Jiayi Ye,
Chuyang Xiao,
Yiming Zhong,
Heng Tao,
Hang Yu,
Yumeng Liu,
Jingyi Yu,
Yuexin Ma
Abstract:
Handover between a human and a dexterous robotic hand is a fundamental yet challenging task in human-robot collaboration. It requires handling dynamic environments and a wide variety of objects and demands robust and adaptive grasping strategies. However, progress in developing effective dynamic dexterous grasping methods is limited by the absence of high-quality, real-world human-to-robot handover datasets. Existing datasets primarily focus on grasping static objects or rely on synthesized handover motions, which differ significantly from real-world robot motion patterns, creating a substantial gap in applicability. In this paper, we introduce DexH2R, a comprehensive real-world dataset for human-to-robot handovers, built on a dexterous robotic hand. Our dataset captures a diverse range of interactive objects, dynamic motion patterns, rich visual sensor data, and detailed annotations. Additionally, to ensure natural and human-like dexterous motions, we utilize teleoperation for data collection, enabling the robot's movements to align with human behaviors and habits, which is a crucial characteristic for intelligent humanoid robots. Furthermore, we propose an effective solution, DynamicGrasp, for human-to-robot handover and evaluate various state-of-the-art approaches, including auto-regressive models and diffusion policy methods, providing a thorough comparison and analysis. We believe our benchmark will drive advancements in human-to-robot handover research by offering a high-quality dataset, effective solutions, and comprehensive evaluation metrics.
Submitted 2 July, 2025; v1 submitted 29 June, 2025;
originally announced June 2025.
-
Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format
Authors:
Dingzirui Wang,
Xuanliang Zhang,
Rongyu Cao,
Longxu Dou,
Xianzhen Luo,
Yingwei Ma,
Qingfu Zhu,
Wanxiang Che,
Binhua Li,
Fei Huang,
Yongbin Li
Abstract:
Generating multiple answers and voting among them is an effective method to mitigate reasoning inconsistencies of large language models (LLMs). Prior work has shown that multiple reasoning formats outperform a single format when generating multiple answers. However, previous works using multiple formats rely on formats labeled by humans, which may not suit every task and incur high labeling costs. To address this issue, we adapt suitable formats to the given tasks by generating and selecting formats. We first propose how to measure the reasoning error when generating multiple answers. Then, we introduce Format-Adapter, which utilizes LLMs to generate and select suitable reasoning formats by minimizing the error measurement we present. We conduct experiments on math and commonsense reasoning tasks, where Format-Adapter achieves a 4.3% performance improvement on average over previous works, demonstrating its effectiveness.
Submitted 29 June, 2025;
originally announced June 2025.
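The answer-voting step that the Format-Adapter abstract builds on can be sketched as plain majority voting. This is only an illustration of that baseline idea, not the paper's format generation, format selection, or error measure; the function name `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    # Counter.most_common keeps first-encountered order among equal counts.
    return counts.most_common(1)[0][0]


# Hypothetical answers, e.g. one per reasoning format or sample.
sampled = ["42", "41", "42", "42", "40"]
print(majority_vote(sampled))
```

In this sketch each answer counts equally; a weighted vote would be a different design and is not implied by the abstract.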
-
GamerAstra: Enhancing Video Game Accessibility for Blind and Low-Vision Players through a Multi-Agent AI Framework
Authors:
Tianrun Qiu,
Changxin Chen,
Sizhe Cheng,
Yiming Yang,
Yixiao Guo,
Zhicong Lu,
Yuxin Ma
Abstract:
Blind and low-vision (BLV) players encounter critical challenges in engaging with video games due to the inaccessibility of visual elements, difficulties in navigating interfaces, and limitations in sending interaction input. Moreover, the development of specialized accessibility features typically requires substantial programming effort and is often implemented on a game-by-game basis. To address these challenges, we introduce GamerAstra, a generalized accessibility framework that leverages a multi-agent design to facilitate access to video games for BLV players. It integrates multi-modal techniques including large language models and vision-language models, enabling interaction with games lacking native accessibility support. The framework further incorporates customizable assistance granularities to support varying degrees of visual impairment and enhances interface navigation through multiple input modalities. The evaluation through technical assessments and user studies indicates that GamerAstra effectively enhances playability and delivers a more immersive gaming experience for BLV players. These findings also underscore potential avenues for advancing intelligent accessibility frameworks in the gaming domain.
Submitted 28 June, 2025;
originally announced June 2025.
-
Learning Truthful Mechanisms without Discretization
Authors:
Yunxuan Ma,
Siqiang Wang,
Zhijian Duan,
Yukun Cheng,
Xiaotie Deng
Abstract:
This paper introduces TEDI (Truthful, Expressive, and Dimension-Insensitive approach), a discretization-free algorithm to learn truthful and utility-maximizing mechanisms. Existing learning-based approaches often rely on discretization of outcome spaces to ensure truthfulness, which leads to inefficiency with increasing problem size. To address this limitation, we formalize the concept of pricing rules, defined as functions that map outcomes to prices. Based on this concept, we propose a novel menu mechanism, which can be equivalent to a truthful direct mechanism under specific conditions. The core idea of TEDI lies in its parameterization of pricing rules using Partial GroupMax Network, a new network architecture designed to universally approximate partial convex functions. To learn optimal pricing rules, we develop novel training techniques, including covariance trick and continuous sampling, to derive unbiased gradient estimators compatible with first-order optimization. Theoretical analysis establishes that TEDI guarantees truthfulness, full expressiveness, and dimension-insensitivity. Experimental evaluation in the studied auction setting demonstrates that TEDI achieves strong performance, competitive with or exceeding state-of-the-art methods.
This work presents the first approaches to learn truthful mechanisms without outcome discretization, thereby enhancing algorithmic efficiency. The proposed concepts, network architecture, and learning techniques might offer potential value and provide new insights for automated mechanism design and differentiable economics.
Submitted 28 June, 2025;
originally announced June 2025.
-
The mechanics of disclination emergence in 3D active nematics
Authors:
Yingyou Ma,
Christopher Amey,
Aparna Baskaran,
Michael F. Hagan
Abstract:
The spontaneous creation of disclinations is a defining characteristic of active nematics, which is rarely observed in equilibrium systems or other active matter systems. Thus, understanding the mechanics of disclinations is crucial for developing reliable continuum theories and practical applications. In this work, we explore this intrinsic mechanics by performing large-scale 3D simulations of a particle-based model of active semiflexible filaments. We investigate the effects of filament stiffness and activity on the collective behavior of active nematics. Analysis of the steady state and the topological properties of initial disclination loops reveals that the system is governed by a single parameter, an activity-dependent effective stiffness. Then, we develop a method to visualize director field orientations in a physically transparent manner during the formation of disclination loops. Based on this, we establish a unified theory for the mechanics of disclination emergence, across the range of bend and twist. This disclination analysis framework can also be applied to diverse other 3D liquid crystal systems.
Submitted 28 June, 2025;
originally announced June 2025.
-
Deep Hedging to Manage Tail Risk
Authors:
Yuming Ma
Abstract:
Extending Buehler et al.'s 2019 Deep Hedging paradigm, we innovatively employ deep neural networks to parameterize convex-risk minimization (CVaR/ES) for the portfolio tail-risk hedging problem. Through comprehensive numerical experiments on crisis-era bootstrap market simulators -- customizable with transaction costs, risk budgets, liquidity constraints, and market impact -- our end-to-end framework not only achieves significant one-day 99% CVaR reduction but also yields practical insights into friction-aware strategy adaptation, demonstrating robustness and operational viability in realistic markets.
Submitted 27 June, 2025;
originally announced June 2025.
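The tail-risk measure named in the Deep Hedging abstract (CVaR, also called expected shortfall) can be illustrated with a simple empirical estimator. This is a minimal sketch that assumes losses are given as positive numbers where larger is worse; it averages the worst (1 - alpha) fraction of a sample and is not the paper's neural hedging framework. The function name `empirical_cvar` and the loss values are hypothetical.

```python
import math


def empirical_cvar(losses, alpha=0.99):
    """Average of the worst ceil((1 - alpha) * n) losses (larger = worse)."""
    if not losses:
        raise ValueError("need at least one loss")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    n = len(losses)
    # Round before ceil so float noise such as 1.0000000000000009 does not
    # push the tail size up by one.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Hypothetical one-day portfolio losses.
losses = [float(v) for v in range(1, 11)]
print(empirical_cvar(losses, alpha=0.8))
```

Other conventions exist (for example, interpolating at the quantile); the tail-average form above is just one common choice.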
-
Unsupervised Learning-Based Joint Resource Allocation and Beamforming Design for RIS-Assisted MISO-OFDMA Systems
Authors:
Yu Ma,
Xingyu Zhou,
Xiao Li,
Le Liang,
Shi Jin
Abstract:
Reconfigurable intelligent surfaces (RIS) are key enablers for 6G wireless systems. This paper studies downlink transmission in an RIS-assisted MISO-OFDMA system, addressing resource allocation challenges. A two-stage unsupervised learning-based framework is proposed to jointly design RIS phase shifts, BS beamforming, and resource block (RB) allocation. The framework includes BeamNet, which predicts RIS phase shifts from CSI, and AllocationNet, which allocates RBs using equivalent CSI derived from BeamNet outputs. Active beamforming is implemented via maximum ratio transmission and water-filling. To handle discrete constraints while ensuring differentiability, quantization and the Gumbel-softmax trick are adopted. A customized loss and phased training enhance performance under QoS constraints. Simulations show the method achieves 99.93% of the sum rate of the SCA baseline with only 0.036% of its runtime, and it remains robust across varying channel and user conditions.
Submitted 12 June, 2025;
originally announced June 2025.
-
Spin polarization from nucleon-nucleon scatterings in intermediate-energy heavy-ion collisions
Authors:
Rong-Jun Liu,
Jun Xu,
Yu-Gang Ma
Abstract:
We propose a new mechanism for generating spin polarization in heavy-ion collisions dominated by nucleon degrees of freedom. By incorporating the spin change in nucleon-nucleon scatterings based on the phase shift data together with the constraint of rigorous angular momentum conservation and Pauli blocking, we illustrate through a Boltzmann-Uehling-Uhlenbeck transport model that appreciable spin polarization (about $1 \sim 2\%$) can be generated in intermediate-energy heavy-ion collisions. This mechanism, together with the nuclear spin-orbit potential, may help to understand the spin polarization in few-GeV heavy-ion collisions dominated by nucleon degrees of freedom.
Submitted 27 June, 2025;
originally announced June 2025.
-
Hybrid Constellation Modulation for Symbol-Level Precoding in RIS-Enhanced MU-MISO Systems
Authors:
Yupeng Zheng,
Yi Ma,
Rahim Tafazolli
Abstract:
The application of symbol-level precoding (SLP) in reconfigurable intelligent surfaces (RIS) enhanced multi-user multiple-input single-output (MU-MISO) systems faces two main challenges. First, the state-of-the-art joint reflecting and SLP optimization approach requires exhaustive enumeration of all possible transmit symbol combinations, resulting in scalability issues as the modulation order and number of users increase. Second, conventional quadrature amplitude modulation (QAM) exhibits strict constructive interference (CI) regions, limiting its effectiveness for CI exploitation in SLP. To address these challenges, this paper proposes a novel modulation scheme, termed hybrid-constellation modulation (HCM), which has a structure of superposed QAM and ASK sub-constellations (SCs). HCM extends the CI regions compared to QAM. Additionally, a two-stage reflecting and SLP optimization method is developed to support HCM. The proposed methods are designed for practical RIS with discrete phase shifts and have good scalability. Simulation results show that HCM achieves up to 1.5 dB and 1 dB SER gains over QAM with modulation orders 16 and 64, respectively.
Submitted 27 June, 2025;
originally announced June 2025.
-
AnyAni: An Interactive System with Generative AI for Animation Effect Creation and Code Understanding in Web Development
Authors:
Tianrun Qiu,
Yuxin Ma
Abstract:
Generative AI assistants have been widely used in front-end programming. However, besides code writing, developers often encounter the need to generate animation effects. As novices in creative design without the assistance of professional designers, developers typically face difficulties in describing, designing, and implementing desired animations. To address this issue, we conducted a formative study (N=6) to identify the challenges that code developers face when dealing with animation design issues. Then, we introduce AnyAni, a human-AI collaborative system that supports front-end developers in the ideation, manipulation, and implementation of animation effects. The system combines the assistance of generative AI in creative design by adopting a nonlinear workflow for iterative animation development. In addition, developers can understand and learn the code generated for implementing animations through various interactive methods. A user study (N=9) demonstrated the usability of AnyAni in animation effect creation support for developers.
Submitted 27 June, 2025;
originally announced June 2025.
-
LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation
Authors:
Yingzhi He,
Xiaohao Liu,
An Zhang,
Yunshan Ma,
Tat-Seng Chua
Abstract:
Sequential recommendation aims to predict users' future interactions by modeling collaborative filtering (CF) signals from historical behaviors of similar users or items. Traditional sequential recommenders predominantly rely on ID-based embeddings, which capture CF signals through high-order co-occurrence patterns. However, these embeddings depend solely on past interactions, lacking transferable knowledge to generalize to unseen domains. Recent advances in large language models (LLMs) have motivated text-based recommendation approaches that derive item representations from textual descriptions. While these methods enhance generalization, they fail to encode CF signals, i.e., the latent item correlations and preference patterns that are crucial for effective recommendation. We argue that an ideal embedding model should seamlessly integrate CF signals with rich semantic representations to improve both in-domain and out-of-domain recommendation performance.
To this end, we propose LLM2Rec, a novel embedding model tailored for sequential recommendation, integrating the rich semantic understanding of LLMs with CF awareness. Our approach follows a two-stage training framework: (1) Collaborative Supervised Fine-tuning, which adapts LLMs to infer item relationships based on historical interactions, and (2) Item-level Embedding Modeling, which refines these specialized LLMs into structured item embedding models that encode both semantic and collaborative information. Extensive experiments on real-world datasets demonstrate that LLM2Rec effectively improves recommendation quality across both in-domain and out-of-domain settings. Our findings highlight the potential of leveraging LLMs to build more robust, generalizable embedding models for sequential recommendation. Our code is available at https://github.com/HappyPointer/LLM2Rec.
Submitted 16 June, 2025;
originally announced June 2025.
-
Digital Gatekeepers: Exploring Large Language Model's Role in Immigration Decisions
Authors:
Yicheng Mao,
Yang Zhao
Abstract:
With globalization and increasing immigrant populations, immigration departments face significant workloads and the challenge of ensuring fairness in decision-making processes. Integrating artificial intelligence offers a promising solution to these challenges. This study investigates the potential of large language models (LLMs), such as GPT-3.5 and GPT-4, in supporting immigration decision-making. Utilizing a mixed-methods approach, this paper conducted discrete choice experiments and in-depth interviews to study LLM decision-making strategies and whether they are fair. Our findings demonstrate that LLMs can align their decision-making with human strategies, emphasizing utility maximization and procedural fairness. Meanwhile, this paper also reveals that while ChatGPT has safeguards to prevent unintentional discrimination, it still exhibits stereotypes and biases concerning nationality and shows preferences toward privileged groups. This dual analysis highlights both the potential and limitations of LLMs in automating and enhancing immigration decisions.
Submitted 15 June, 2025;
originally announced June 2025.
-
Cluster-Aware Two-Stage Method for Fast Iterative MIMO Detection in LEO Satellite Communications
Authors:
Jiuyu Liu,
Yi Ma,
Qihao Peng,
Rahim Tafazolli
Abstract:
In this paper, a cluster-aware two-stage multiple-input multiple-output (MIMO) detection method is proposed for direct-to-cell satellite communications. The method achieves computational efficiency by exploiting a distinctive property of satellite MIMO channels: users within the same geographical cluster exhibit highly correlated channel characteristics due to their physical proximity, which typically impedes convergence in conventional iterative MIMO detectors. The proposed method implements a two-stage strategy that first eliminates intra-cluster interference using computationally efficient small matrix inversions, then utilizes these pre-computed matrices to accelerate standard iterative MIMO detectors such as Gauss-Seidel (GS) and symmetric successive over-relaxation (SSOR) for effective inter-cluster interference cancellation. Computer simulations demonstrate that the proposed method achieves more than 12 times faster convergence under perfect channel state information. Even when accounting for channel estimation errors, the method maintains 9 times faster convergence, demonstrating its robustness and effectiveness for next-generation satellite MIMO communications.
Submitted 26 June, 2025;
originally announced June 2025.
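The Gauss-Seidel (GS) iteration mentioned in the cluster-aware MIMO detection abstract can be sketched for a small linear system A x = b. This is a minimal sketch of textbook GS on a real-valued, diagonally dominant matrix; the paper's cluster-based first stage and its acceleration of GS and SSOR are not shown, and the function name `gauss_seidel` and the example matrix are hypothetical.

```python
def gauss_seidel(A, b, iterations=50):
    """Solve A x = b by Gauss-Seidel sweeps, starting from x = 0.

    Convergence is guaranteed for strictly diagonally dominant A;
    no convergence check is performed in this sketch.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest available values of the other unknowns.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


# Hypothetical 2x2 system: 4x + y = 1, 2x + 3y = 2.
A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
print(gauss_seidel(A, b))
```

Strong off-diagonal coupling, such as the highly correlated channels within a cluster that the abstract describes, slows this kind of iteration, which is the motivation the abstract gives for removing intra-cluster interference first.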
-
Survival analysis under label shift
Authors:
Yuxiang Zong,
Yanyuan Ma,
Ingrid Van Keilegom
Abstract:
Let P represent the source population with complete data, containing covariate $\mathbf{Z}$ and response $T$, and Q the target population, where only the covariate $\mathbf{Z}$ is available. We consider a setting with both label shift and label censoring. Label shift assumes that the marginal distribution of $T$ differs between $P$ and $Q$, while the conditional distribution of $\mathbf{Z}$ given $T$ remains the same. Label censoring refers to the case where the response $T$ in $P$ is subject to random censoring. Our goal is to leverage information from the label-shifted and label-censored source population $P$ to conduct statistical inference in the target population $Q$. We propose a parametric model for $T$ given $\mathbf{Z}$ in $Q$ and estimate the model parameters by maximizing an approximate likelihood. This allows for statistical inference in $Q$ and accommodates a range of classical survival models. Under the label shift assumption, the likelihood depends not only on the unknown parameters but also on the unknown distribution of $T$ in $P$ and $\mathbf{Z}$ in $Q$, which we estimate nonparametrically. The asymptotic properties of the estimator are rigorously established and the effectiveness of the method is demonstrated through simulations and a real data application. This work is the first to combine survival analysis with label shift, offering a new research direction in this emerging topic.
Submitted 26 June, 2025;
originally announced June 2025.
-
Ferroelectricity in 6 Angstrom-Thick Two-dimensional Ga$_2$O$_3$
Authors:
Tong Jiang,
Han Chen,
Yubo Yuan,
Xiang Xu,
Junwei Cao,
Hao Wang,
Xuechun Sun,
Junshuai Li,
Yaqing Ma,
Huaze Zhu,
Wenbin Li,
Wei Kong
Abstract:
Atomic-scale ferroelectric thin films hold great promise for high-density, low-power applications but face stability and voltage scaling challenges at extreme thinness. Here, we demonstrate ferroelectricity in single-crystalline two-dimensional (2D) Ga$_2$O$_3$, an ultra-wide-bandgap semiconductor, at just 6 angstrom thickness, exhibiting exceptional retention and thermal stability. We show that epitaxial beta-Ga$_2$O$_3$ can be exfoliated down to a half-unit cell thickness via a self-limiting mechanism, enabling a biaxial strain-induced phase transition into a novel ferroelectric layered structure. Strain modulation enables the reduction of polarization switching voltage to 0.8 V, meeting CMOS voltage scaling requirements. Theoretical calculations reveal that switching is driven by covalent bond reconstruction, effectively countering depolarization and enhancing stability. Additionally, we integrate ferroelectric 2D Ga$_2$O$_3$ onto silicon using a low-temperature, back-end-of-line-compatible process. This work advances the exploration of sub-nanometer ferroelectrics, paving the way for high-density, low-power, non-volatile applications seamlessly integrated with advanced silicon technology.
Submitted 26 June, 2025;
originally announced June 2025.
-
RecCoT: Enhancing Recommendation via Chain-of-Thought
Authors:
Shuo Yang,
Jiangxia Cao,
Haipeng Li,
Yuqi Mao,
Shuchao Pang
Abstract:
In real-world applications, users interact with items in multiple ways, such as through implicit binary feedback (e.g., clicks, dislikes, long views) and explicit feedback (e.g., comments, reviews). Modern recommendation systems (RecSys) learn user-item collaborative signals from the implicit feedback as a large-scale binary data stream, and subsequently recommend other highly similar items based on users' personalized historical interactions. However, from this collaborative-connection perspective, the RecSys does not focus on the actual content of the items themselves but instead prioritizes higher-probability signals of behavioral co-occurrence among items. Consequently, under this binary learning paradigm, the RecSys struggles to understand why a user likes or dislikes certain items. To alleviate this, some works attempt to utilize content-based reviews to capture semantic knowledge and enhance recommender models. However, most of these methods focus on predicting the ratings of reviews and do not provide a human-understandable explanation.
Submitted 26 June, 2025;
originally announced June 2025.
-
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
Authors:
Xiao Zhang,
Yongqiang Ma,
Haodong Jing,
Nanning Zheng
Abstract:
Compositional Zero-Shot Learning (CZSL) investigates compositional generalization capacity to recognize unknown state-object pairs based on learned primitive concepts. Existing CZSL methods typically derive primitive features through a simple composition-prototype mapping, which is suboptimal for a set of individuals that can be divided into distinct semantic subsets. Moreover, the all-to-one cross-modal primitive matching neglects compositional divergence within identical states or objects, limiting fine-grained image-composition alignment. In this study, we propose EVA, a Mixture-of-Experts Semantic Variant Alignment framework for CZSL. Specifically, we introduce domain-expert adaption, leveraging multiple experts to achieve token-aware learning and model high-quality primitive representations. To enable accurate compositional generalization, we further present semantic variant alignment to select semantically relevant representations for image-primitive matching. Our method significantly outperforms other state-of-the-art CZSL methods on three popular benchmarks in both closed- and open-world settings, demonstrating the efficacy of the proposed insight.
Submitted 26 June, 2025;
originally announced June 2025.
-
SPA: Towards More Stealth and Persistent Backdoor Attacks in Federated Learning
Authors:
Chengcheng Zhu,
Ye Li,
Bosen Rao,
Jiale Zhang,
Yunlong Mao,
Sheng Zhong
Abstract:
Federated Learning (FL) has emerged as a leading paradigm for privacy-preserving distributed machine learning, yet the distributed nature of FL introduces unique security challenges, notably the threat of backdoor attacks. Existing backdoor strategies predominantly rely on end-to-end label supervision, which, despite their efficacy, often results in detectable feature disentanglement and limited persistence. In this work, we propose a novel and stealthy backdoor attack framework, named SPA, which fundamentally departs from traditional approaches by leveraging feature-space alignment rather than direct trigger-label association. Specifically, SPA reduces representational distances between backdoor trigger features and target class features, enabling the global model to misclassify trigger-embedded inputs with high stealth and persistence. We further introduce an adaptive, adversarial trigger optimization mechanism, utilizing boundary-search in the feature space to enhance attack longevity and effectiveness, even against defensive FL scenarios and non-IID data distributions. Extensive experiments on various FL benchmarks demonstrate that SPA consistently achieves high attack success rates with minimal impact on model utility, maintains robustness under challenging participation and data heterogeneity conditions, and exhibits persistent backdoor effects far exceeding those of conventional techniques. Our results call urgent attention to the evolving sophistication of backdoor threats in FL and emphasize the pressing need for advanced, feature-level defense techniques.
Submitted 25 June, 2025;
originally announced June 2025.
-
MHD simulation of tilt instability during the dynamic FRC magnetic compression process
Authors:
Yiming Ma,
Ping Zhu,
Bo Rao,
Haolong Li
Abstract:
The nonlinear evolution of the tilt instability in a field reversed configuration (FRC) during the dynamic magnetic compression process has been investigated using magnetohydrodynamic (MHD) simulations with the NIMROD code [C. R. Sovinec \textit{et al.}, J. Comput. Phys. \textbf{195}, 355 (2004)]. The tilt mode induces significant deformations in the linear growth phase and results in complete confinement loss of the FRC in the nonlinear phase, with no evidence of dynamic nonlinear stabilization. The growth rate of the tilt mode increases with the compression field ramping rate and approaches an asymptotic value. Toroidal flow can reduce both the growth rate and the nonlinear saturation amplitude of the tilt mode. The stabilizing effect of the toroidal rotation is enhanced with higher compression field ramping rates due to the spontaneous toroidal field generation and increased flow shear during compression. Although the tilt mode remains unstable with a toroidal rotation Mach number close to 0.5, the onset of tilt distortion can be delayed, allowing a magnetic compression ratio up to 5.3 before the compressional heating terminates.
Submitted 25 June, 2025;
originally announced June 2025.
-
EBC-ZIP: Improving Blockwise Crowd Counting with Zero-Inflated Poisson Regression
Authors:
Yiming Ma,
Victor Sanchez,
Tanaya Guha
Abstract:
Density map estimation has become the mainstream paradigm in crowd counting. However, most existing methods overlook the extreme sparsity of ground-truth density maps. In real-world crowd scenes, the vast majority of spatial regions (often over 95%) contain no people, leading to heavily imbalanced count distributions. Ignoring this imbalance can bias models toward overestimating dense regions and underperforming in sparse areas. Furthermore, most loss functions used in density estimation are based on MSE and implicitly assume Gaussian distributions, which are ill-suited for modeling discrete, non-negative count data. In this paper, we propose EBC-ZIP, a crowd counting framework that models the spatial distribution of counts using a Zero-Inflated Poisson (ZIP) regression formulation. Our approach replaces the traditional regression loss with the negative log-likelihood of the ZIP distribution, enabling better handling of zero-heavy distributions while preserving count accuracy. Built upon the recently proposed Enhanced Block Classification (EBC) framework, EBC-ZIP inherits EBC's advantages in preserving the discreteness of targets and ensuring training stability, while further improving performance through a more principled probabilistic loss. We also evaluate EBC-ZIP with backbones of varying computational complexity to assess its scalability. Extensive experiments on four crowd counting benchmarks demonstrate that EBC-ZIP consistently outperforms EBC and achieves state-of-the-art results.
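The loss named in the abstract above, the negative log-likelihood of a Zero-Inflated Poisson, can be illustrated with a minimal sketch. This is the textbook ZIP formulation, not the EBC-ZIP implementation: the function names `zip_nll` and `mean_zip_nll`, the per-block parameters `lam` (Poisson rate) and `pi` (structural-zero probability), and the way they would be predicted by a network are all assumptions made for illustration.

```python
import math


def zip_nll(count, lam, pi):
    """Negative log-likelihood of one count under a Zero-Inflated Poisson.

    count: observed non-negative integer count (hypothetical per-block target)
    lam:   Poisson rate, must be > 0
    pi:    probability of a structural zero, in [0, 1)
    """
    if count < 0 or lam <= 0 or not 0 <= pi < 1:
        raise ValueError("invalid count, lam, or pi")
    if count == 0:
        # A zero can come from the structural-zero branch or from the Poisson branch.
        return -math.log(pi + (1.0 - pi) * math.exp(-lam))
    # A positive count can only come from the Poisson branch.
    log_poisson = count * math.log(lam) - lam - math.lgamma(count + 1)
    return -(math.log(1.0 - pi) + log_poisson)


def mean_zip_nll(counts, lams, pis):
    """Average ZIP negative log-likelihood over a collection of blocks."""
    terms = [zip_nll(c, l, p) for c, l, p in zip(counts, lams, pis)]
    return sum(terms) / len(terms)
```

With `pi = 0` the expression reduces to the ordinary Poisson negative log-likelihood, which is a quick sanity check; a larger `pi` makes observed zeros cheaper, which is the property that suits zero-heavy targets.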
Submitted 24 June, 2025;
originally announced June 2025.
-
Precise Measurement of the $Λ$ Electric Dipole Moment through the Entangled Strange Baryon-Antibaryon System
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (696 additional authors not shown)
Abstract:
The dominance of matter over antimatter in the universe has consistently driven the pursuit of new physics beyond the Standard Model that violates charge-parity symmetry. Unlike the well-constrained electrons and neutrons, strange baryons (hyperons) remain a largely unexplored territory, in which interactions between hyperons and particles from new physics could induce a non-trivial electric dipole moment (EDM). However, direct measurements of hyperon EDMs through spin precession are highly challenging due to their short lifetimes. In this paper, we present a novel method to extract the EDM of the lightest hyperon, $Λ$, using the entangled $Λ$$\overlineΛ$ system. Our result is consistent with zero, achieving a three-order-of-magnitude improvement over the previous upper limit established in the 1980s with comparable statistics, providing stringent constraints on potential new physics.
Submitted 28 June, 2025; v1 submitted 23 June, 2025;
originally announced June 2025.
-
A new window into the sub-parsec scale magnetic field in the Milky Way? Unveiling small-scale magneto-ionic structures with Faraday complexity
Authors:
Yik Ki Ma,
Amit Seta,
N. M. McClure-Griffiths,
C. L. Van Eck,
S. A. Mao,
A. Ordog,
J. C. Brown,
T. O. Kovacs,
Takuya Akahori,
K. Kurahara,
L. Oberhelman,
C. S. Anderson
Abstract:
Radio broadband spectro-polarimetric observations are sensitive to the spatial fluctuations of the Faraday depth (FD) within the telescope beam. Such FD fluctuations are referred to as "Faraday complexity", and can unveil small-scale magneto-ionic structures in both the synchrotron-emitting and the foreground volumes. We explore the astrophysical origin of the Faraday complexity exhibited by 191 polarised extragalactic radio sources (EGSs) within 5 deg from the Galactic plane in the longitude range of 20-52 deg, using broadband data from the Karl G. Jansky Very Large Array presented by a previous work. A new parameter called the FD spread is devised to quantify the spatial FD fluctuations. We find that the FD spread of the EGSs (i) demonstrates an enhancement near the Galactic mid-plane, most notable within Galactic latitude of +-3 deg, (ii) exhibits hints of modulations across Galactic longitude, (iii) does not vary with the source size across the entire range of 2.5"-300", and (iv) has an amplitude higher than expected from magneto-ionic structures of extragalactic origin. All these suggest that the primary cause of the Faraday complexity exhibited by our target EGSs is <2.5"-scale magneto-ionic structures in the Milky Way. We argue that the anisotropic turbulent magnetic field generated by galactic-scale shocks and shears, or the stellar feedback-driven isotropic turbulent magnetic field, are the most likely candidates. Our work highlights the use of broadband radio polarimetric observations of EGSs as a powerful probe of multi-scale magnetic structures in the Milky Way.
Submitted 23 June, 2025;
originally announced June 2025.
-
Quantum-Classical Hybrid Quantized Neural Network
Authors:
Wenxin Li,
Chuan Wang,
Hongdong Zhu,
Qi Gao,
Yin Ma,
Hai Wei,
Kai Wen
Abstract:
In this work, we present a novel Quadratic Binary Optimization (QBO) model for quantized neural network training, enabling the use of arbitrary activation and loss functions through spline interpolation. We introduce Forward Interval Propagation (FIP), a method designed to tackle the challenges of non-linearity and the multi-layer composite structure in neural networks by discretizing activation functions into linear subintervals. This approach preserves the universal approximation properties of neural networks while allowing complex nonlinear functions to be optimized using quantum computers, thus broadening their applicability in artificial intelligence. We provide theoretical upper bounds on the approximation error and the number of Ising spins required, by deriving the sample complexity of the empirical risk minimization problem, from an optimization perspective. A significant challenge in solving the associated Quadratic Constrained Binary Optimization (QCBO) model on a large scale is the presence of numerous constraints. When employing the penalty method to handle these constraints, tuning a large number of penalty coefficients becomes a critical hyperparameter optimization problem, increasing computational complexity and potentially affecting solution quality. To address this, we employ the Quantum Conditional Gradient Descent (QCGD) algorithm, which leverages quantum computing to directly solve the QCBO problem. We prove the convergence of QCGD under a quantum oracle with randomness and bounded variance in objective value, as well as under limited precision constraints in the coefficient matrix. Additionally, we provide an upper bound on the Time-To-Solution for the QCBO solving process. Experimental results using a coherent Ising machine (CIM) demonstrate a 94.95% accuracy on the Fashion MNIST classification task, with only 1.1-bit precision.
Submitted 24 June, 2025; v1 submitted 22 June, 2025;
originally announced June 2025.
-
Large Language Model Unlearning for Source Code
Authors:
Xue Jiang,
Yihong Dong,
Zheng Fang,
Yingwei Ma,
Tangxinyu Wang,
Rongyu Cao,
Binhua Li,
Zhi Jin,
Wenpin Jiao,
Yongbin Li,
Ge Li
Abstract:
LLM4SE has demonstrated significant success, but LLMs' potential memorization of sensitive or outdated training data introduces critical risks to legal compliance, software security, and code quality. LLM unlearning techniques, which can eliminate the influence of undesired data from LLMs in a post-training way, present a promising solution to address these concerns. While recent efforts in LLM unlearning show effectiveness in natural language, their applicability to source code remains underexplored. Our empirical study reveals that existing LLM unlearning approaches, when applied to source code, cause severe model utility degradation, rendering models practically unusable for code generation. In this paper, we propose PROD, a novel unlearning approach that enables LLMs to forget undesired code content while effectively preserving their code generation capabilities. PROD suppresses the probability of forget data in LLMs' output distribution while promoting candidate distributional components, enabling the model to jointly learn to forget specific content and retain its general capabilities. To facilitate this study, we establish a benchmark for code unlearning evaluation, which includes three critical downstream tasks: copyrighted code unlearning, insecure code unlearning, and deprecated API unlearning. Our evaluation demonstrates that PROD achieves superior balance between forget quality and model utility compared to existing unlearning approaches across three downstream tasks, while consistently exhibiting improvements when applied to LLMs of varying series. PROD also exhibits superior robustness against adversarial attacks without generating or exposing the data to be forgotten. The results underscore that our approach not only extends the application boundary of unlearning techniques to source code, but also holds significant implications for advancing reliable code generation.
Submitted 20 June, 2025;
originally announced June 2025.
-
Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration
Authors:
Yuntao Ma,
Yang Liu,
Kaixian Qu,
Marco Hutter
Abstract:
Throwing is a fundamental skill that enables robots to manipulate objects in ways that extend beyond the reach of their arms. We present a control framework that combines learning and model-based control for prehensile whole-body throwing with legged mobile manipulators. Our framework consists of three components: a nominal tracking policy for the end-effector, a high-frequency residual policy to enhance tracking accuracy, and an optimization-based module to improve end-effector acceleration control. The proposed controller achieved an average landing error of 0.28 m when throwing at targets located 6 m away. Furthermore, in a comparative study with university students, the system achieved a velocity tracking error of 0.398 m/s and a success rate of 56.8%, hitting small targets randomly placed at distances of 3-5 m while throwing at a specified speed of 6 m/s. In contrast, the human participants had a success rate of only 15.2%. This work provides an early demonstration of prehensile throwing with quantified accuracy on hardware, contributing to progress in dynamic whole-body manipulation.
Submitted 23 June, 2025; v1 submitted 20 June, 2025;
originally announced June 2025.
-
MM-AttacKG: A Multimodal Approach to Attack Graph Construction with Large Language Models
Authors:
Yongheng Zhang,
Xinyun Zhao,
Yunshan Ma,
Haokai Ma,
Yingxiao Guan,
Guozheng Yang,
Yuliang Lu,
Xiang Wang
Abstract:
Cyber Threat Intelligence (CTI) parsing aims to extract key threat information from massive data, transform it into actionable intelligence, and enhance threat detection and defense efficiency; it covers attack graph construction, intelligence fusion, and indicator extraction. Among these research topics, Attack Graph Construction (AGC) is essential for visualizing and understanding the potential attack paths of threat events from CTI reports. Existing approaches primarily construct attack graphs purely from textual data to reveal the logical threat relationships between entities within the attack behavioral sequence. However, they typically overlook the specific threat information inherent in visual modalities, which preserve key threat details in inherently multimodal CTI reports. Therefore, we enhance the effectiveness of attack graph construction by analyzing visual information through Multimodal Large Language Models (MLLMs). Specifically, we propose a novel framework, MM-AttacKG, which can effectively extract key information from threat images and integrate it into attack graph construction, thereby enhancing the comprehensiveness and accuracy of attack graphs. It first employs a threat image parsing module to extract critical threat information from images and generate descriptions using MLLMs. Subsequently, it builds an iterative question-answering pipeline tailored for image parsing to refine the understanding of threat images. Finally, it achieves content-level integration between attack graphs and image-based answers through MLLMs, completing threat information enhancement. The experimental results demonstrate that MM-AttacKG can accurately identify key information in threat images and significantly improve the quality of multimodal attack graph construction, effectively addressing the shortcomings of existing methods in utilizing image-based threat information.
Submitted 20 June, 2025;
originally announced June 2025.
-
How fast does spectral radius of truncated circular unitary ensemble converge?
Authors:
Yutao Ma,
Xujia Meng
Abstract:
Let $z_1, \cdots, z_p$ be the eigenvalues of $A,$ which is the left-top $p\times p$ submatrix of an $n\times n$ Haar-invariant unitary matrix. Suppose there exist two constants $0<h_1<h_2<1$ such that $h_1<\frac pn<h_2.$ Then,
$$\sup_{x\in \mathbb{R}}|\mathbb{P}(X_n\le x)-e^{-e^{-x}}|=\frac{(\log \log n)^{2}}{2e\log n}(1+o(1))$$ and further
$$ W_{1}\left(\mathcal{L}(X_n),Λ\right)=\frac{(\log\log n)^2}{2\log n}(1+o(1))$$
for $n$ large enough. Here, $Λ$ is the Gumbel distribution and $\mathcal{L}(X_n)$ is the distribution of $X_n$ with $X_n$ being some rescaled version of $\max_{1\le i\le p}|z_i|,$ the spectral radius of $A.$
Submitted 20 June, 2025;
originally announced June 2025.
-
Preferred Synthesis of Armchair SnS2 Nanotubes
Authors:
Abid,
Luneng Zhao,
Ju Huang,
Yongjia Zheng,
Yuta Sato,
Qingyun Lin,
Zhen Han,
Chunxia Yang,
Tianyu Wang,
Bill Herve Nduwarugira,
Yicheng Ma,
Lingfeng Wang,
Yige Zheng,
Hang Wang,
Salman Ullah,
Afzal Khan,
Qi Zhang,
Wenbin Li,
Junfeng Gao,
Bingfeng Ju,
Feng Ding,
Yan Li,
Kazu Suenaga,
Shigeo Maruyama,
Huayong Yang
, et al. (1 additional authors not shown)
Abstract:
In this work, we present the synthesis of tin disulfide (SnS2) nanotubes (NTs) with a preferred chiral angle. A sacrificial template is used to create channels of boron nitride nanotubes (BNNTs) with an optimized diameter of 4-5 nm, inside of which SnS2 NTs are formed with high yield and structural purity. Atomic resolution imaging and nano-area electron diffraction reveal that these synthesized SnS2 NTs prefer to have an armchair configuration with a probability of approximately 85%. Calculations using density functional theory (DFT) reveal a negligible difference in the formation energy between armchair and zigzag NTs, suggesting that structural stability does not play a key role in this chirality-selective growth. However, a detailed TEM investigation revealed that some SnS2 nanoribbons are found connected to the ends of SnS2 NTs, and that these nanoribbons primarily have a zigzag configuration. Subsequent DFT and machine learning potential molecular dynamics simulations verify that nanoribbons with zigzag configurations are more stable than armchair ones, and indeed zigzag nanoribbons aligned along the BNNT axis tend to roll up to form armchair SnS2 NTs. Finally, this "zigzag nanoribbon to armchair nanotube" transition hypothesis is verified by in-situ high-resolution transmission electron microscopy, in which the transformation of SnS2 nanoribbons into a nanotube is reproduced in real time. This work is the first demonstration of preferred-chirality growth of transition metal dichalcogenide nanotubes.
Submitted 19 June, 2025;
originally announced June 2025.
-
Wavelet-based Global Orientation and Surface Reconstruction for Point Clouds
Authors:
Yueji Ma,
Yanzun Meng,
Dong Xiao,
Zuoqiang Shi,
Bin Wang
Abstract:
Unoriented surface reconstruction is an important task in computer graphics and has extensive applications. Based on the compact support and orthogonality properties of wavelets, classic wavelet surface reconstruction achieves good and fast reconstruction. However, this method can only handle oriented points. Despite some attempts to extend it to unoriented points, such as iWSR, these methods perform poorly on sparse point clouds. To address these shortcomings, we propose a wavelet-based method to represent the mollified indicator function and complete both the orientation and surface reconstruction tasks. We use the modifying kernel function to smooth out discontinuities on the surface, aligning with the continuity of the wavelet basis function. When computing the coefficients, we fully utilize the properties of the convolutional kernel function to shift the modifying computation onto the wavelet basis, which accelerates the calculation. In addition, we propose a novel method for constructing the divergence-free function field and using it to construct additional homogeneous constraints that improve effectiveness and stability. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both orientation and reconstruction for sparse models. We align the matrix construction with the compact support property of wavelet basis functions to further accelerate our method, resulting in efficient performance on CPU. Our source codes will be released on GitHub.
Submitted 19 June, 2025;
originally announced June 2025.
-
Shift current in 2D Janus Transition-Metal Dichalcogenides: the role of excitons
Authors:
Yuncheng Mao,
Ju Zhou,
Myrta Grüning,
Claudio Attaccalite
Abstract:
We study the shift current in two two-dimensional (2D) Janus transition metal dichalcogenides: molybdenum diselenide (MoSSe) and tungsten diselenide (WSSe). The shift current is evaluated using a real-time approach, in which the coupling with an external field is described in terms of a dynamical Berry phase. This approach incorporates electron-hole interactions and quasiparticle band structure renormalization through an effective Hamiltonian derived from many-body perturbation theory. We find that the shift current is strongly enhanced at energies corresponding to the C excitons. An analysis in terms of the electron-hole pairs reveals that the electron and hole are localized on different atoms, so that, following an optical excitation, the center of the electron charge is shifted, giving rise to a significant photocurrent. These results highlight the role played by excitons in the shift-current response of Janus TMDs and demonstrate that these materials are promising building blocks for future photovoltaic devices.
Submitted 19 June, 2025;
originally announced June 2025.
-
SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
Authors:
Yao Zhang,
Chenyang Lin,
Shijie Tang,
Haokun Chen,
Shijie Zhou,
Yunpu Ma,
Volker Tresp
Abstract:
The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent system generation. Our code is publicly released at https://yaoz720.github.io/SwarmAgentic/.
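For readers unfamiliar with the Particle Swarm Optimization that the abstract above cites as inspiration, the following is a minimal sketch of classic numeric PSO. It is not SwarmAgentic's language-driven procedure, which operates on candidate agentic systems rather than real-valued vectors; the function names, the hyperparameter values, and the `sphere` test objective are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Classic PSO: returns (best position, best value, initial best value)."""
    rng = random.Random(seed)
    # Random initial positions in a hypothetical search box, zero initial velocities.
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                # per-particle best positions
    best_val = [objective(p) for p in pos]        # per-particle best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]    # swarm-wide best so far
    initial_best = g_val
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own best,
                # and pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val, initial_best


def sphere(x):
    """Simple convex test objective with its minimum at the origin."""
    return sum(v * v for v in x)
```

Calling `pso_minimize(sphere, dim=2)` returns the best point found; because the swarm-wide best is replaced only on improvement, the returned value can never be worse than the best initial sample.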
Submitted 18 June, 2025;
originally announced June 2025.
-
Measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $D^+\to K^+η^{\prime}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (697 additional authors not shown)
Abstract:
Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of 3.773\,GeV with the BESIII detector, we present improved measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $ D^+ \to K^+ η^{\prime}$ with the double-tag method. The statistical significance of each signal decay exceeds $10σ$. The branching fractions are determined to be ${\mathcal B}(D^+\to K^+ π^0) = (1.45 \pm 0.06 \pm 0.06)\times 10^{-4}$, ${\mathcal B}(D^+\to K^+ η) = (1.17 \pm 0.10 \pm 0.03)\times 10^{-4}$ and ${\mathcal B}(D^+\to K^+ η^{\prime}) = (1.88 \pm 0.15 \pm 0.06)\times 10^{-4}$, where the first uncertainties are statistical and the second systematic. These results are consistent with the world average values but with significantly improved precision.
Submitted 18 June, 2025;
originally announced June 2025.
-
ECFA Higgs, electroweak, and top Factory Study
Authors:
H. Abidi,
J. A. Aguilar-Saavedra,
S. Airen,
S. Ajmal,
M. Al-Thakeel,
G. L. Alberghi,
J. Alcaraz Maestre,
J. Alimena,
S. Alshamaily,
J. Altmann,
W. Altmannshofer,
Y. Amhis,
A. Amiri,
A. Andreazza,
S. Antusch,
O. Arnaez,
K. A. Assamagan,
S. Aumiller,
K. Azizi,
P. Azzi,
P. Azzurri,
E. Bagnaschi,
Z. Baharyioon,
H. Bahl,
V. Balagura
, et al. (346 additional authors not shown)
Abstract:
The ECFA Higgs, electroweak, and top Factory Study ran between 2021 and 2025 as a broad effort across the experimental and theoretical particle physics communities, bringing together participants from many different proposed future collider projects. Activities across three main working groups advanced the joint development of tools and analysis techniques, fostered new considerations of detector design and optimisation, and led to a new set of studies resulting in improved projected sensitivities across a wide physics programme. This report demonstrates the significant expansion in the state-of-the-art understanding of the physics potential of future e+e- Higgs, electroweak, and top factories, and has been submitted as input to the 2025 European Strategy for Particle Physics Update.
Submitted 18 June, 2025;
originally announced June 2025.
-
Determination of $|V_{cb}|$ using $B\to D\ellν_\ell$ Decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
K. Adamczyk,
L. Aggarwal,
H. Ahmed,
Y. Ahn,
H. Aihara,
N. Akopov,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
K. Amos,
M. Angelsmark,
N. Anh Ky,
C. Antonioli,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
N. K. Baghel,
S. Bahinipati
, et al. (385 additional authors not shown)
Abstract:
We present a determination of the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cb}|$ from the decay $B\to D\ellν_\ell$ using a $365~\mathrm{fb}^{-1}$ $e^+e^-\toΥ(4S)\to B\bar B$ data sample recorded by the Belle II experiment at the SuperKEKB collider. The semileptonic decay of one $B$ meson is reconstructed in the modes $B^0\to D^-(\to K^+π^-π^-)\ell^+ν_\ell$ and $B^+\to \bar D^0(\to K^+π^-)\ell^+ν_\ell$, where $\ell$ denotes either an electron or a muon. Charge conjugation is implied. The second $B$ meson in the $Υ(4S)$ event is not reconstructed explicitly. Using an inclusive reconstruction of the unobserved neutrino momentum, we determine the recoil variable $w=v_B\cdot v_D$, where $v_B$ and $v_D$ are the 4-velocities of the $B$ and $D$ mesons. We measure the total decay branching fractions to be $\mathcal{B}(B^0\to D^-\ell^+ν_\ell)=(2.06 \pm 0.05\,(\mathrm{stat.}) \pm 0.10\,(\mathrm{sys.}))\%$ and $\mathcal{B}(B^+\to\bar D^0\ell^+ν_\ell)=(2.31 \pm 0.04\,(\mathrm{stat.}) \pm 0.09\,(\mathrm{sys.}))\%$. We probe lepton flavor universality by measuring $\mathcal{B}(B\to Deν_e)/\mathcal{B}(B\to Dμν_μ)=1.020 \pm 0.020\,(\mathrm{stat.})\pm 0.022\,(\mathrm{sys.})$. Fitting the partial decay branching fraction as a function of $w$ and using the average of lattice QCD calculations of the $B\to D$ form factor, we obtain $|V_{cb}|=(39.2\pm 0.4\,(\mathrm{stat.}) \pm 0.6\,(\mathrm{sys.}) \pm 0.5\,(\mathrm{th.}))\times 10^{-3}$.
Submitted 18 June, 2025;
originally announced June 2025.
-
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Authors:
Yanxu Mao,
Tiehan Cui,
Peipei Liu,
Datao You,
Hongsong Zhu
Abstract:
Large language models (LLMs) are rapidly evolving from single-modal systems to multimodal LLMs and intelligent agents, significantly expanding their capabilities while introducing increasingly severe security risks. This paper presents a systematic survey of the growing complexity of jailbreak attacks and corresponding defense mechanisms within the expanding LLM ecosystem. We first trace the developmental trajectory from LLMs to MLLMs and Agents, highlighting the core security challenges emerging at each stage. Next, we categorize mainstream jailbreak techniques from both the attack impact and visibility perspectives, and provide a comprehensive analysis of representative attack methods, related datasets, and evaluation metrics. On the defense side, we organize existing strategies based on response timing and technical approach, offering a structured understanding of their applicability and implementation. Furthermore, we identify key limitations in existing surveys, such as insufficient attention to agent-specific security issues, the absence of a clear taxonomy for hybrid jailbreak methods, a lack of detailed analysis of experimental setups, and outdated coverage of recent advancements. To address these limitations, we provide an updated synthesis of recent work and outline future research directions in areas such as dataset construction, evaluation framework optimization, and strategy generalization. Our study seeks to enhance the understanding of jailbreak mechanisms and facilitate the advancement of more resilient and adaptive defense strategies in the context of ever more capable LLMs.
Submitted 18 June, 2025;
originally announced June 2025.
-
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
Authors:
Yao Zhang,
Hewei Gao,
Haokun Chen,
Weiguo Li,
Yunpu Ma,
Volker Tresp
Abstract:
Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy LLM on clients, reducing client-side storage by 95%, and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.
Submitted 12 June, 2025;
originally announced June 2025.
-
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
Authors:
Yujun Wang,
Jinhe Bi,
Yunpu Ma,
Soeren Pirk
Abstract:
Multimodal Large Language Models (MLLMs) often suffer from hallucinations: they over-rely on partial cues and generate incorrect responses. Recently, methods like Visual Contrastive Decoding (VCD) and Instruction Contrastive Decoding (ICD) have been proposed to mitigate hallucinations by contrasting predictions from perturbed or negatively prefixed inputs against original outputs. In this work, we uncover that methods like VCD and ICD fundamentally influence the internal attention dynamics of the model. This observation suggests that their effectiveness may not stem merely from surface-level modifications to logits but from deeper shifts in attention distribution. Inspired by this insight, we propose an attention-steerable contrastive decoding framework that directly intervenes in the attention mechanisms of the model to offer a more principled approach to mitigating hallucinations. Our experiments across multiple MLLM architectures and diverse decoding methods demonstrate that our approach significantly reduces hallucinations and improves performance on benchmarks such as POPE, CHAIR, and MMHal-Bench, while simultaneously enhancing performance on standard VQA benchmarks.
Submitted 17 June, 2025;
originally announced June 2025.
-
AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions
Authors:
Aishan Liu,
Zonghao Ying,
Le Wang,
Junjie Mu,
Jinyang Guo,
Jiakai Wang,
Yuqing Ma,
Siyuan Liang,
Mingchuan Zhang,
Xianglong Liu,
Dacheng Tao
Abstract:
The rapid advancement of vision-language models (VLMs) and their integration into embodied agents have unlocked powerful capabilities for decision-making. However, as these systems are increasingly deployed in real-world environments, they face mounting safety concerns, particularly when responding to hazardous instructions. In this work, we propose AGENTSAFE, the first comprehensive benchmark for evaluating the safety of embodied VLM agents under hazardous instructions. AGENTSAFE simulates realistic agent-environment interactions within a simulation sandbox and incorporates a novel adapter module that bridges the gap between high-level VLM outputs and low-level embodied controls. Specifically, it maps recognized visual entities to manipulable objects and translates abstract planning into executable atomic actions in the environment. Building on this, we construct a risk-aware instruction dataset inspired by Asimov's Three Laws of Robotics, including base risky instructions and mutated jailbroken instructions. The benchmark includes 45 adversarial scenarios, 1,350 hazardous tasks, and 8,100 hazardous instructions, enabling systematic testing under adversarial conditions spanning the perception, planning, and action execution stages.
Submitted 17 June, 2025;
originally announced June 2025.
-
A statistical framework for dynamic cognitive diagnosis in digital learning environments
Authors:
Yawen Ma,
Anastasia Ushakova,
Kate Cain,
Gabriel Wallin
Abstract:
Reading is foundational for educational, employment, and economic outcomes, but a persistent proportion of students globally struggle to develop adequate reading skills. Some countries promote digital tools to support reading development, alongside regular classroom instruction. Such tools generate rich log data capturing students' behaviour and performance. This study proposes a dynamic cognitive diagnostic modeling (CDM) framework based on restricted latent class models to trace students' time-varying skills mastery using log files from digital tools. Unlike traditional CDMs that require expert-defined skill-item mappings (Q-matrix), our approach jointly estimates the Q-matrix and latent skill profiles, integrates log-derived covariates (e.g., reattempts, response times, counts of mastered items) and individual characteristics, and models transitions in mastery using a Bayesian estimation approach. Applied to real-world data, the model demonstrates practical value in educational settings by effectively uncovering individual skill profiles and the skill-item mappings. Simulation studies confirm robust recovery of Q-matrix structures and latent profiles with high accuracy under varied sample sizes, item counts and different sparsity of Q-matrices. The framework offers a data-driven, time-dependent restricted latent class modeling approach to understanding early reading development.
Submitted 17 June, 2025;
originally announced June 2025.
-
Compositional Attribute Imbalance in Vision Datasets
Authors:
Jiayi Chen,
Yanbiao Ma,
Andi Zhang,
Weidong Tang,
Wei Dai,
Bowei Liu
Abstract:
Visual attribute imbalance is a common yet underexplored issue in image classification, significantly impacting model performance and generalization. In this work, we first define the first-level and second-level attributes of images and then introduce a CLIP-based framework to construct a visual attribute dictionary, enabling automatic evaluation of image attributes. By systematically analyzing both single-attribute imbalance and compositional attribute imbalance, we reveal how the rarity of attributes affects model performance. To tackle these challenges, we propose adjusting the sampling probability of samples based on the rarity of their compositional attributes. This strategy is further integrated with various data augmentation techniques (such as CutMix, Fmix, and SaliencyMix) to enhance the model's ability to represent rare attributes. Extensive experiments on benchmark datasets demonstrate that our method effectively mitigates attribute imbalance, thereby improving the robustness and fairness of deep neural networks. Our research highlights the importance of modeling visual attribute distributions and provides a scalable solution for long-tail image classification tasks.
Submitted 17 June, 2025;
originally announced June 2025.
-
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge
Authors:
Zeinab Sadat Taghavi,
Ali Modarressi,
Yunpu Ma,
Hinrich Schütze
Abstract:
Retrieval systems are central to many NLP pipelines, but often rely on surface-level cues such as keyword overlap and lexical semantic similarity. To evaluate retrieval beyond these shallow signals, recent benchmarks introduce reasoning-heavy queries; however, they primarily shift the burden to query-side processing techniques -- like prompting or multi-hop retrieval -- that can help resolve complexity. In contrast, we present ImpliRet, a benchmark that shifts the reasoning challenge to document-side processing: The queries are simple, but relevance depends on facts stated implicitly in documents through temporal (e.g., resolving "two days ago"), arithmetic, and world knowledge relationships. We evaluate a range of sparse and dense retrievers, all of which struggle in this setting: the best nDCG@10 is only 15.07%. We also test whether long-context models can overcome this limitation. But even with a short context of only ten documents, including the positive document, GPT-4.1 scores only 35.06%, showing that document-side reasoning remains a challenge. Our codes are available at github.com/ZeinabTaghavi/IMPLIRET.Contribution.
Submitted 17 June, 2025;
originally announced June 2025.
-
Temperature dependent single- and double-quantum relaxation of negatively charged boron vacancies in hexagonal boron nitride
Authors:
Lin-Ke Xie,
Wei Liu,
Kaiyu Huang,
Nai-Jie Guo,
Jun-You Liu,
Yu-Hang Ma,
Ya-Qi Wu,
Yi-Tao Wang,
Zhao-an Wang,
Xiao-Dong Zeng,
Jia-Ming Ren,
Chun Ao,
Shuo Deng,
Haifei Lu,
Jian-Shun Tang,
Chuan-Feng Li,
Guang-Can Guo
Abstract:
The negatively charged boron vacancy in two-dimensional hexagonal boron nitride has emerged as a promising candidate for quantum sensing. The coherence time of these defect spins, on which coherent quantum sensing relies, is limited by spin-phonon interactions, while the underlying physical mechanism of the corresponding high-temperature behavior is still not fully understood. Here, we probe the single- and double-quantum relaxation rates of this center over the temperature range from 293 to 393 K. The results show that both relaxation rates increase with increasing temperature, and that the double-quantum relaxation rate increases especially rapidly. At high temperature (above 400 K), the double-quantum relaxation rate is much greater than the single-quantum relaxation rate, and may dominate the decoherence channel of spin-phonon interactions. Using a theoretical model of second-order spin-phonon interactions, we attribute the high-temperature spin relaxation rates to interactions with a higher-energy effective phonon mode, which aids further understanding and guides high-temperature sensing applications.
Submitted 17 June, 2025;
originally announced June 2025.
-
ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
Authors:
Jinyan Yuan,
Bangbang Yang,
Keke Wang,
Panwang Pan,
Lin Ma,
Xuehai Zhang,
Xiao Liu,
Zhaopeng Cui,
Yuewen Ma
Abstract:
Automatic creation of 3D scenes for immersive VR presence has been a significant research focus for decades. However, existing methods often rely on either high-poly mesh modeling with post-hoc simplification or massive 3D Gaussians, resulting in a complex pipeline or limited visual realism. In this paper, we demonstrate that such exhaustive modeling is unnecessary for achieving compelling immersive experience. We introduce ImmerseGen, a novel agent-guided framework for compact and photorealistic world modeling. ImmerseGen represents scenes as hierarchical compositions of lightweight geometric proxies, i.e., simplified terrain and billboard meshes, and generates photorealistic appearance by synthesizing RGBA textures onto these proxies. Specifically, we propose terrain-conditioned texturing for user-centric base world synthesis, and RGBA asset texturing for midground and foreground scenery. This reformulation offers several advantages: (i) it simplifies modeling by enabling agents to guide generative models in producing coherent textures that integrate seamlessly with the scene; (ii) it bypasses complex geometry creation and decimation by directly synthesizing photorealistic textures on proxies, preserving visual quality without degradation; (iii) it enables compact representations suitable for real-time rendering on mobile VR headsets. To automate scene creation from text prompts, we introduce VLM-based modeling agents enhanced with semantic grid-based analysis for improved spatial reasoning and accurate asset placement. ImmerseGen further enriches scenes with dynamic effects and ambient audio to support multisensory immersion. Experiments on scene generation and live VR showcases demonstrate that ImmerseGen achieves superior photorealism, spatial coherence and rendering efficiency compared to prior methods. Project webpage: https://immersegen.github.io.
Submitted 18 June, 2025; v1 submitted 17 June, 2025;
originally announced June 2025.
-
FEWSim: A Visual Analytic Framework for Exploring the Nexus of Food-Energy-Water Simulations
Authors:
Fan Lei,
David A. Sampson,
Jiayi Hong,
Yuxin Ma,
Giuseppe Mascaro,
Dave White,
Rimjhim Agarwal,
Ross Maciejewski
Abstract:
The interdependencies of food, energy, and water (FEW) systems create a nexus opportunity to explore the strengths and vulnerabilities of individual and cross-sector interactions within FEW systems. However, the variables quantifying nexus interactions are hard to observe, which hinders the cross-sector analysis. To overcome such challenges, we present FEWSim, a visual analytics framework designed to support domain experts in exploring and interpreting simulation results from a coupled FEW model. FEWSim employs a three-layer asynchronous architecture: the model layer integrates food, energy, and water models to simulate the FEW nexus; the middleware layer manages scenario configuration and execution; and the visualization layer provides interactive visual exploration of simulated time-series results across FEW sectors. The visualization layer further facilitates the exploration across multiple scenarios and evaluates scenario differences in performance using sustainability indices of the FEW nexus. We demonstrate the utility of FEWSim through a case study for the Phoenix Active Management Area (AMA) in Arizona.
Submitted 16 June, 2025;
originally announced June 2025.
-
MeerKAT HI observations of Low Surface Brightness/Ultradiffuse Galaxy Candidates Projected around Two Southern Loose Groups
Authors:
Chandreyee Sengupta,
Tom C. Scott,
Hao Chen,
Hyein Yoon,
Yogesh Chandola,
Mengtian Li,
Gyula I. G. Józsa,
O. Ivy Wong,
Yin-Zhe Ma,
Patricio Lagos,
Ruta Kale,
Denis Tramonte
Abstract:
A large catalogue of low surface brightness galaxies (LSBGs) from the Dark Energy Survey showed significant clustering around nearby galaxy groups and clusters. Using the HIPASS survey, we tried to determine the redshift of a sub-sample of these LSBGs and determine whether they were members of the groups they were projected near, but this was hampered by HIPASS's high spectral rms. This letter reports on MeerKAT H I observations to determine the redshifts of 52 LSBG candidates projected in the vicinity of two groups from our previous HIPASS study. The main goal is to investigate and ascertain whether these LSBGs are genuine group members. H I was detected with MeerKAT and redshifts were determined for only five of the 52 candidates within a velocity range of $\pm$ 2500 km/s of their respective group velocities. All five H I detections were blue LSBGs and two of them were confirmed to be ultradiffuse galaxies (UDGs). Both these UDGs were group members, while the other three detections were either foreground or background galaxies. In this letter we explore scenarios that can explain the 90% non-detection. MeerKAT's excellent sensitivity allows us to conclude that the majority of the non-detected candidates, particularly the blue galaxies, are not group members but lie at higher redshifts. However, this still leaves open the question of why Tanoglidis LSBG candidates, in particular the red ones, appear to be clustered in projection around nearby groups.
Submitted 16 June, 2025;
originally announced June 2025.
-
ExtendAttack: Attacking Servers of LRMs via Extending Reasoning
Authors:
Zhenhao Zhu,
Yue Liu,
Yingwei Ma,
Hongcheng Gao,
Nuo Chen,
Yanpei Guo,
Wenjie Qu,
Huiying Xu,
Xinzhong Zhu,
Jiaheng Zhang
Abstract:
Large Reasoning Models (LRMs) have demonstrated promising performance in complex tasks. However, their resource-consuming reasoning processes may be exploited by attackers to maliciously occupy server resources, leading to a crash, similar to a DDoS attack in cybersecurity. To this end, we propose a novel attack method on LRMs termed ExtendAttack, which maliciously occupies server resources by stealthily extending the reasoning processes of LRMs. Concretely, we systematically obfuscate characters within a benign prompt, transforming them into a complex, poly-base ASCII representation. This compels the model to perform a series of computationally intensive decoding sub-tasks that are deeply embedded within the semantic structure of the query itself. Extensive experiments demonstrate the effectiveness of our proposed ExtendAttack. Remarkably, it increases the length of the model's response by over 2.5 times for the o3 model on the HumanEval benchmark. In addition, it preserves the original meaning of the query and achieves comparable answer accuracy, demonstrating its stealthiness.
Submitted 16 June, 2025;
originally announced June 2025.
-
Catalogue of chiral phonon materials
Authors:
Yue Yang,
Zhenyu Xiao,
Yu Mao,
Zhanghuan Li,
Zhenyang Wang,
Tianqi Deng,
Yanhao Tang,
Zhi-Da Song,
Yuan Li,
Huiqiu Yuan,
Ming Shi,
Yuanfeng Xu
Abstract:
Chiral phonons, circularly polarized lattice vibrations carrying intrinsic angular momentum, offer unprecedented opportunities for controlling heat flow, manipulating quantum states through spin-phonon coupling, and realizing exotic transport phenomena. Despite their fundamental importance, a universal framework for identifying and classifying these elusive excitations has remained out of reach. Here, we address this challenge by establishing a comprehensive symmetry-based theory that systematically classifies the helicity and the velocity-angular momentum tensor underlying phonon magnetization in thermal transport across all 230 crystallographic space groups. Our approach, grounded in fundamental representations of phononic angular momentum, reveals three distinct classes of crystals: achiral crystals with vanishing angular momentum, chiral crystals with s-wave helicity, and achiral crystals exhibiting higher-order helicity patterns beyond the s-wave. By performing high-throughput computations and symmetry analysis of the dynamical matrices for 11614 crystalline compounds, we identified 2738 materials exhibiting chiral phonon modes and shortlisted the 170 most promising candidates for future experimental investigation. These results are compiled into an open-access Chiral Phonon Materials Database website, enabling rapid screening for materials with desired chiral phonon properties. Our theoretical framework transcends phonons--it provides a universal paradigm for classifying chiral excitations in crystalline lattices, from magnons to electronic quasiparticles.
Submitted 16 June, 2025;
originally announced June 2025.
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Authors:
MiniMax,
Aili Chen,
Aonian Li,
Bangwei Gong,
Binyang Jiang,
Bo Fei,
Bo Yang,
Boji Shan,
Changqing Yu,
Chao Wang,
Cheng Zhu,
Chengjun Xiao,
Chengyu Du,
Chi Zhang,
Chu Qiao,
Chunhao Zhang,
Chunhui Du,
Congchao Guo,
Da Chen,
Deming Ding,
Dianjun Sun,
Dong Li,
Enwei Jiao,
Haigang Zhou
, et al. (103 additional authors not shown)
Abstract:
We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems including sandbox-based, real-world software engineering environments. In addition to M1's inherent efficiency advantage for RL training, we propose CISPO, a novel RL algorithm to further enhance RL efficiency. CISPO clips importance sampling weights rather than token updates, outperforming other competitive RL variants. Combining hybrid-attention and CISPO enables MiniMax-M1's full RL training on 512 H800 GPUs to complete in only three weeks, with a rental cost of just $534,700. We release two versions of MiniMax-M1 models with 40K and 80K thinking budgets respectively, where the 40K model represents an intermediate phase of the 80K training. Experiments on standard benchmarks show that our models are comparable or superior to strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex software engineering, tool utilization, and long-context tasks. We publicly release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1.
Submitted 16 June, 2025;
originally announced June 2025.
-
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
Authors:
Yu Yang,
Alan Liang,
Jianbiao Mei,
Yukai Ma,
Yong Liu,
Gim Hee Lee
Abstract:
Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, the generation of large-scale 3D scenes that require spatial coherence remains underexplored. In this paper, we propose X-Scene, a novel framework for large-scale driving scene generation that achieves both geometric intricacy and appearance fidelity, while offering flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level conditions such as user-provided or text-driven layout for detailed scene composition and high-level semantic guidance such as user-intent and LLM-enriched text prompts for efficient customization. To enhance geometrical and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and the corresponding multiview images, while ensuring alignment between modalities. Additionally, we extend the generated local region into a large-scale scene through consistency-aware scene outpainting, which extrapolates new occupancy and images conditioned on the previously generated area, enhancing spatial continuity and preserving visual coherence. The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as scene exploration. Comprehensive experiments demonstrate that X-Scene significantly advances controllability and fidelity for large-scale driving scene generation, empowering data generation and simulation for autonomous driving.
Submitted 16 June, 2025;
originally announced June 2025.