-
Joint DOA and Attitude Sensing Based on Tri-Polarized Continuous Aperture Array
Authors:
Haonan Si,
Zhaolin Wang,
Xiansheng Guo,
Jin Zhang,
Yuanwei Liu
Abstract:
This paper investigates joint direction-of-arrival (DOA) and attitude sensing using tri-polarized continuous aperture arrays (CAPAs). By employing electromagnetic (EM) information theory, the spatially continuous received signals in tri-polarized CAPA are modeled, thereby enabling accurate DOA and attitude estimation. To facilitate subspace decomposition for continuous operators, an equivalent continuous-discrete transformation technique is developed. Moreover, both self- and cross-covariances of tri-polarized signals are exploited to construct a tri-polarized spectrum, significantly enhancing DOA estimation performance. Theoretical analyses reveal that the identifiability of attitude information fundamentally depends on the availability of prior target snapshots. Accordingly, two attitude estimation algorithms are proposed: one capable of estimating partial attitude information without prior knowledge, and the other achieving full attitude estimation when such knowledge is available. Numerical results demonstrate the feasibility and superiority of the proposed framework.
Submitted 2 October, 2025;
originally announced October 2025.
-
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Authors:
Minhui Zhu,
Minyang Tian,
Xiaocheng Yang,
Tianci Zhou,
Penghao Zhu,
Eli Chertkov,
Shengyan Liu,
Yufeng Du,
Lifan Yuan,
Ziming Ji,
Indranil Das,
Junyi Cao,
Yufeng Du,
Jinchen He,
Yifan Su,
Jiabin Yu,
Yikun Jiang,
Yujie Zhang,
Chang Liu,
Ze-Min Huang,
Weizhen Jia,
Xinan Chen,
Peixue Wu,
Yunkai Wang,
Juntai Zhou
, et al. (40 additional authors not shown)
Abstract:
While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "critical point"), the first benchmark designed to test LLMs on unpublished, research-level reasoning tasks that broadly cover modern physics research areas, including condensed matter, quantum physics, atomic, molecular & optical physics, astrophysics, high energy physics, mathematical physics, statistical physics, nuclear physics, nonlinear dynamics, fluid dynamics and biophysics. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects at the entry level, which are also decomposed into 190 simpler checkpoint tasks for more fine-grained insights. All problems are newly created by 50+ active physics researchers based on their own research. Every problem is hand-curated to admit a guess-resistant and machine-verifiable answer and is evaluated by an automated grading pipeline heavily customized for advanced physics-specific output formats. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges: the best average accuracy among base models is only 4.0%, achieved by GPT-5 (high), moderately rising to around 10% when equipped with coding tools. Through the realistic yet standardized evaluation offered by CritPt, we highlight a large disconnect between current model capabilities and realistic physics research demands, offering a foundation to guide the development of scientifically grounded AI tools.
Submitted 30 September, 2025; v1 submitted 30 September, 2025;
originally announced September 2025.
-
Dolphin v1.0 Technical Report
Authors:
Taohan Weng,
Chi Zhang,
Chaoran Yan,
Siya Liu,
Xiaoyang Liu,
Yalun Wu,
Boyang Wang,
Boyan Wang,
Jiren Ren,
Kaiwen Yan,
Jinze Yu,
Kaibing Hu,
Henan Liu,
Haoyun Zheng,
Zhenyu Liu,
Duo Zhang,
Xiaoqing Guo,
Anjie Le,
Hongcheng Guo
Abstract:
Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, nearly twice that of the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.
Submitted 30 September, 2025; v1 submitted 30 September, 2025;
originally announced September 2025.
-
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
Authors:
Shijie Zhang,
Guohao Sun,
Kevin Zhang,
Xiang Guo,
Rujun Guo
Abstract:
Recently, online Reinforcement Learning with Verifiable Rewards (RLVR) has become a key paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically treat all training samples uniformly, overlooking the vast differences in problem difficulty relative to the model's current capabilities. This uniform training strategy leads to inefficient exploration of problems the model has already mastered, while providing no effective guidance on the problems that challenge its abilities the most, limiting both learning efficiency and upper-bound performance. To address this, we propose CLPO (Curriculum-guided Learning for Policy Optimization), a novel algorithm that creates a dynamic pedagogical feedback loop within the policy optimization process. The core of CLPO leverages the model's own rollout performance to conduct real-time difficulty assessment, thereby constructing an Online Curriculum. This curriculum then guides an Adaptive Problem Restructuring mechanism, where the model acts as its own teacher: it diversifies medium-difficulty problems to promote generalization and simplifies challenging problems to make them more attainable. Our approach transforms the static training procedure into a dynamic process that co-evolves with the model's capabilities. Experiments show that CLPO achieves state-of-the-art performance across eight challenging mathematical and general reasoning benchmarks, with an average pass@1 improvement of 6.96% over other methods, demonstrating its potential for more efficiently training more capable reasoning models.
Submitted 29 September, 2025;
originally announced September 2025.
-
Observation of a resonance-like structure near the $π^+π^-$ mass threshold in $ψ(3686) \rightarrow π^{+}π^{-}J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (677 additional authors not shown)
Abstract:
Based on the $(2712.4\pm14.4)\times 10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we present a high-precision study of the $π^+π^-$ mass spectrum in $ψ(3686)\rightarrowπ^{+}π^{-}J/ψ$ decays. A clear resonance-like structure is observed near the $π^+π^-$ mass threshold for the first time. A fit with a Breit-Wigner function yields a mass of $285.6\pm 2.5~{\rm MeV}/c^2$ and a width of $16.3\pm 0.9~{\rm MeV}$ with a statistical significance exceeding 10$σ$. To interpret the data, we incorporate final-state interactions (FSI) within two theoretical frameworks: chiral perturbation theory (ChPT) and QCD multipole expansion (QCDME). ChPT describes the spectrum above 0.3 GeV/$c^2$ but fails to reproduce the threshold enhancement. In contrast, the QCDME model, assuming the $ψ(3686)$ is an admixture of S- and D-wave charmonium, reproduces the data well. The pronounced dip near 0.3 GeV/$c^2$ offers new insight into the interplay between chiral dynamics and low-energy QCD.
Submitted 28 September, 2025;
originally announced September 2025.
-
Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
Authors:
Ziheng Cheng,
Xin Guo,
Yufei Zhang
Abstract:
The theory of discrete-time reinforcement learning (RL) has advanced rapidly over the past decades. Although primarily designed for discrete environments, many real-world RL applications are inherently continuous and complex. A major challenge in extending discrete-time algorithms to continuous-time settings is their sensitivity to time discretization, often leading to poor stability and slow convergence. In this paper, we investigate deterministic policy gradient methods for continuous-time RL. We derive a continuous-time policy gradient formula based on an analogue of the advantage function and establish its martingale characterization. This theoretical foundation leads to our proposed algorithm, CT-DDPG, which enables stable learning with deterministic policies in continuous-time environments. Numerical experiments show that the proposed CT-DDPG algorithm offers improved stability and faster convergence compared to existing discrete-time and continuous-time methods, across a wide range of control tasks with varying time discretizations and noise levels.
Submitted 28 September, 2025;
originally announced September 2025.
-
FLAME: A Serving System Optimized for Large-Scale Generative Recommendation with Efficiency
Authors:
Xianwen Guo,
Bin Huang,
Xiaomeng Wu,
Guanlin Wu,
Fangjian Li,
Shijia Wang,
Qiang Xiao,
Chuanjiang Luo,
Yong Li
Abstract:
Generative recommendation (GR) models possess greater scaling power compared to traditional deep learning recommendation models (DLRMs), yet they also impose a tremendous increase in computational burden. Measured in FLOPs, a typical GR model's workload sits in the $10^9 \sim 10^{11}$ range, roughly four orders of magnitude higher than traditional DLRMs. Delivering accurate results in a few tens of milliseconds while processing billions of such requests per day puts extreme demands on the performance of the online serving system. Therefore, for industry practitioners, the alluring gains of GR models are tempered by the formidable challenge of online deployment at scale in production services. In this work, we introduce a comprehensive online serving system tailored For Large-scale GenerAtive RecoMmendation with Efficiency (FLAME). Specifically, we leverage CPU-GPU heterogeneous hardware to decouple feature pre-processing and model computation. We encapsulate several memory optimization features as the Proximal Data Accelerator (PDA) module to make full use of limited bandwidth and storage resources, which achieves a 1.9x throughput gain and a 1.7x latency reduction. We implement the Fused Kernel Engine (FKE) module based on the functionality and interface of NVIDIA TensorRT to boost model computation, delivering a further speedup of 4.6x-6.1x and a throughput gain of 4.7x-6.3x. In addition, we design the Dynamic Stream Orchestrator (DSO) module to coordinate concurrent requests, yielding a 1.3x improvement in throughput and a 2.3x speed-up under non-uniform distribution of upstream candidates. Comprehensive evaluations demonstrate that FLAME effectively supports large-scale online deployment of GR models and achieves remarkable improvements in system performance.
Submitted 17 September, 2025;
originally announced September 2025.
-
Modeling Tennis In-Match Momentum Using Probability Method
Authors:
Jackson Graves,
Daniel X. Guo,
Ridge Shepherd,
Alexander Young
Abstract:
This paper investigates the Tennis Momentum Model (TMM), which aims to enhance the understanding of match dynamics by integrating key factors such as efficiency, historical scoring probabilities, and real-time scoring data. The model is designed to explore how momentum affects player performance throughout a match and how it might influence overall match outcomes. By leveraging this model, players and coaches could gain valuable insights that may help them adjust their strategies in response to shifting momentum during a match.
To validate the model, it was tested on two tennis matches, revealing its effectiveness in capturing shifts in momentum and correlating these shifts with scoring events. The results showed that the TMM accurately depicted the flow of momentum during matches, highlighting how shifts in momentum are directly linked to changes in scoring as the match progresses.
Submitted 11 September, 2025;
originally announced September 2025.
-
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
Authors:
Ruoyu Chen,
Xiaoqing Guo,
Kangwei Liu,
Siyuan Liang,
Shiming Liu,
Qunli Zhang,
Hua Zhang,
Xiaochun Cao
Abstract:
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in aligning visual inputs with natural language outputs. Yet, the extent to which generated tokens depend on visual modalities remains poorly understood, limiting interpretability and reliability. In this work, we present EAGLE, a lightweight black-box framework for explaining autoregressive token generation in MLLMs. EAGLE attributes any selected tokens to compact perceptual regions while quantifying the relative influence of language priors and perceptual evidence. The framework introduces an objective function that unifies sufficiency (insight score) and indispensability (necessity score), optimized via greedy search over sparsified image regions for faithful and efficient attribution. Beyond spatial attribution, EAGLE performs modality-aware analysis that disentangles what tokens rely on, providing fine-grained interpretability of model decisions. Extensive experiments across open-source MLLMs show that EAGLE consistently outperforms existing methods in faithfulness, localization, and hallucination diagnosis, while requiring substantially less GPU memory. These results highlight its effectiveness and practicality for advancing the interpretability of MLLMs. The code is available at https://github.com/RuoyuChen10/EAGLE.
Submitted 26 September, 2025;
originally announced September 2025.
-
MultiSoundGen: Video-to-Audio Generation for Multi-Event Scenarios via SlowFast Contrastive Audio-Visual Pretraining and Direct Preference Optimization
Authors:
Jianxuan Yang,
Xiaoran Yang,
Lipan Zhang,
Xinyue Guo,
Zhao Wang,
Gongping Huang
Abstract:
Current video-to-audio (V2A) methods struggle in complex multi-event scenarios (video scenarios involving multiple sound sources, sound events, or transitions) due to two critical limitations. First, existing methods face challenges in precisely aligning intricate semantic information together with rapid dynamic features. Second, foundational training lacks quantitative preference optimization for semantic-temporal alignment and audio quality. As a result, it fails to enhance integrated generation quality in cluttered multi-event scenes. To address these core limitations, this study proposes a novel V2A framework: MultiSoundGen. It introduces direct preference optimization (DPO) into the V2A domain, leveraging audio-visual pretraining (AVP) to enhance performance in complex multi-event scenarios. Our contributions include two key innovations. The first is SlowFast Contrastive AVP (SF-CAVP), a pioneering AVP model with a unified dual-stream architecture. SF-CAVP explicitly aligns core semantic representations and rapid dynamic features of audio-visual data to handle multi-event complexity. Second, we integrate the DPO method into the V2A task and propose AVP-Ranked Preference Optimization (AVP-RPO). It uses SF-CAVP as a reward model to quantify and prioritize critical semantic-temporal matches while enhancing audio quality. Experiments demonstrate that MultiSoundGen achieves state-of-the-art (SOTA) performance in multi-event scenarios, delivering comprehensive gains across distribution matching, audio quality, semantic alignment, and temporal synchronization. The complete code and dataset will be released soon.
Submitted 24 September, 2025;
originally announced September 2025.
-
Distribution of non-Gaussian states in a deployed telecommunication fiber channel
Authors:
Casper A. Breum,
Xueshi Guo,
Mikkel V. Larsen,
Shigehito Miki,
Hirotaka Terai,
Ulrik L. Andersen,
Jonas S. Neergaard-Nielsen
Abstract:
Optical non-Gaussian states hold great promise as a pivotal resource for advanced optical quantum information processing and fault-tolerant long-distance quantum communication. Establishing their faithful transmission in a real-world communication channel, therefore, marks an important milestone. In this study, we experimentally demonstrate the distribution of such non-Gaussian states in a functioning telecommunication channel that connects separate buildings within the DTU campus premises. We send photon-subtracted squeezed states, exhibiting pronounced Wigner negativity, through 300 m of deployed optical fibers to a distant building. Using quantum homodyne tomography, we fully characterize the states upon arrival. Our results show the survival of the Wigner function negativity after transmission when correcting for detection losses, indicating that the established link can potentially facilitate the violation of Bell's inequality and enable quantum steering. This achievement not only validates the practical feasibility of distributing non-Gaussian states in real-world settings, but also provides an exciting impetus towards realizing fully coherent quantum networks for high-dimensional, continuous-variable quantum information processing.
Submitted 22 September, 2025;
originally announced September 2025.
-
mmExpert: Integrating Large Language Models for Comprehensive mmWave Data Synthesis and Understanding
Authors:
Yifan Yan,
Shuai Yang,
Xiuzhen Guo,
Xiangguang Wang,
Wei Chow,
Yuanchao Shu,
Shibo He
Abstract:
Millimeter-wave (mmWave) sensing technology holds significant value in human-centric applications, yet the high costs associated with data acquisition and annotation limit its widespread adoption in our daily lives. Concurrently, the rapid evolution of large language models (LLMs) has opened up opportunities for addressing complex human needs. This paper presents mmExpert, an innovative mmWave understanding framework consisting of a data generation flywheel that leverages LLMs to automate the generation of synthetic mmWave radar datasets for specific application scenarios, thereby training models capable of zero-shot generalization in real-world environments. Extensive experiments demonstrate that the data synthesized by mmExpert significantly enhances the performance of downstream models and facilitates the successful deployment of large models for mmWave understanding.
Submitted 20 September, 2025;
originally announced September 2025.
-
Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
Authors:
Xingang Guo,
Yaxin Li,
Xiangyi Kong,
Yilan Jiang,
Xiayu Zhao,
Zhihua Gong,
Yufan Zhang,
Daixuan Li,
Tianle Sang,
Beixiao Zhu,
Gregory Jun,
Yingbing Huang,
Yiqi Liu,
Yuqi Xue,
Rahul Dev Kundu,
Qi Jian Lim,
Yizhou Zhao,
Luke Alexander Granger,
Mohamed Badr Younis,
Darioush Keivan,
Nippun Sabharwal,
Shreyanka Sinha,
Prakhar Agarwal,
Kojo Vandyck,
Hanlin Mai
, et al. (40 additional authors not shown)
Abstract:
Today, industry pioneers dream of developing general-purpose AI engineers capable of designing and building humanity's most ambitious projects, from starships that will carry us to distant worlds to Dyson spheres that harness stellar energy. Yet engineering design represents a fundamentally different challenge for large language models (LLMs) compared to traditional textbook-style problem solving or factual question answering. Real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce ENGDESIGN, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains: Operating System Design, Computer Architecture Design, Control System Design, Mechanical Systems, Structural Design, Digital Hardware Design, Analog Integrated Circuit Design, Robotics, and Signal Processing. Unlike existing benchmarks that focus on factual recall or question answering, ENGDESIGN uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented designs. Each task in ENGDESIGN represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. We pioneer a simulation-based evaluation paradigm where LLM-generated designs undergo rigorous testing through executable, domain-specific simulations, from circuit SPICE simulations to structural finite element analysis, and from control system validation to robotic motion planning.
Submitted 1 July, 2025;
originally announced September 2025.
-
Minimal Semantic Sufficiency Meets Unsupervised Domain Generalization
Authors:
Tan Pan,
Kaiyu Guo,
Dongli Xu,
Zhaorui Tan,
Chen Jiang,
Deshu Chen,
Xin Guo,
Brian C. Lovell,
Limei Han,
Yuan Cheng,
Mahsa Baktashmotlagh
Abstract:
The generalization ability of deep learning has been extensively studied in supervised settings, yet it remains less explored in unsupervised scenarios. Recently, the Unsupervised Domain Generalization (UDG) task has been proposed to enhance the generalization of models trained with prevalent unsupervised learning techniques, such as Self-Supervised Learning (SSL). UDG confronts the challenge of distinguishing semantics from variations without category labels. Although some recent methods have employed domain labels to tackle this issue, such domain labels are often unavailable in real-world contexts. In this paper, we address these limitations by formalizing UDG as the task of learning a Minimal Sufficient Semantic Representation: a representation that (i) preserves all semantic information shared across augmented views (sufficiency), and (ii) maximally removes information irrelevant to semantics (minimality). We theoretically ground these objectives from the perspective of information theory, demonstrating that optimizing representations to achieve sufficiency and minimality directly reduces out-of-distribution risk. Practically, we implement this optimization through Minimal-Sufficient UDG (MS-UDG), a learnable model that integrates (a) an InfoNCE-based objective to achieve sufficiency and (b) two complementary components to promote minimality: a novel semantic-variation disentanglement loss and a reconstruction-based mechanism for capturing adequate variation. Empirically, MS-UDG sets a new state of the art on popular unsupervised domain-generalization benchmarks, consistently outperforming existing SSL and UDG methods, without category or domain labels during representation learning.
Submitted 24 September, 2025; v1 submitted 19 September, 2025;
originally announced September 2025.
-
Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs
Authors:
Junhong Lin,
Song Wang,
Xiaojie Guo,
Julian Shun,
Yada Zhu
Abstract:
Large language models (LLMs) excel at many language understanding tasks but struggle to reason over knowledge that evolves. To address this, recent work has explored augmenting LLMs with knowledge graphs (KGs) to provide structured, up-to-date information. However, many existing approaches assume a static snapshot of the KG and overlook the temporal dynamics and factual inconsistencies inherent in real-world data. To address the challenge of reasoning over temporally shifting knowledge, we propose EvoReasoner, a temporal-aware multi-hop reasoning algorithm that performs global-local entity grounding, multi-route decomposition, and temporally grounded scoring. To ensure that the underlying KG remains accurate and up-to-date, we introduce EvoKG, a noise-tolerant KG evolution module that incrementally updates the KG from unstructured documents through confidence-based contradiction resolution and temporal trend tracking. We evaluate our approach on temporal QA benchmarks and a novel end-to-end setting where the KG is dynamically updated from raw documents. Our method outperforms both prompting-based and KG-enhanced baselines, effectively narrowing the gap between small and large LLMs on dynamic question answering. Notably, an 8B-parameter model using our approach matches the performance of a 671B model prompted seven months later. These results highlight the importance of combining temporal reasoning with KG evolution for robust and up-to-date LLM performance. Our code is publicly available at github.com/junhongmit/TREK.
Submitted 18 September, 2025;
originally announced September 2025.
-
First Observation of $Λ$ Hyperon Transverse Polarization in $ψ(3686)\toΛ\barΛ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (687 additional authors not shown)
Abstract:
Based on $(448.1\pm2.9)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, we present the first observation of spin transverse polarization of $Λ$ and $\barΛ$ hyperons produced coherently in the decay $ψ(3686)\toΛ(\to pπ^-)\barΛ(\to\bar pπ^+)$. The relative phase between the electric and magnetic hadronic form factors is measured to be $ΔΦ=(21.0\pm3.7_{\rm stat.}\pm0.8_{\rm syst.})^{\circ}$. The angular distribution parameter $α_ψ=0.83\pm0.02_{\rm stat.}\pm0.01_{\rm syst.}$ is determined with a precision improved by a factor of 3.7 compared to the previous measurement. The relative phase between the $S$- and $D$-wave amplitudes for $Λ\barΛ$ is observed, and the effective interaction radius is determined to be $0.0450\pm0.0026_{\rm stat.}\pm0.0012_{\rm syst.}$ fm. These results provide new insights into the strong interaction mechanisms and the internal structure of baryons.
Submitted 18 September, 2025;
originally announced September 2025.
-
Shift-Left Techniques in Electronic Design Automation: A Survey
Authors:
Xinyue Wu,
Zixuan Li,
Fan Hu,
Ting Lin,
Xiaotian Zhao,
Runxi Wang,
Xinfei Guo
Abstract:
The chip design process involves numerous steps, beginning with defining product requirements and progressing through architectural planning, system-level design, and the physical layout of individual circuit blocks. As the enablers of large-scale chip development, Electronic Design Automation (EDA) tools play a vital role in helping designers achieve high-quality results. The Shift-Left methodology introduces a pathway toward creating digital twins and fusing multiple design steps, thereby transitioning traditionally sequential, physically-aware processes into virtual design environments. This shift allows designers to establish stronger correlations earlier and optimize designs more effectively. However, challenges remain, especially in accurately replicating downstream behaviors and determining the right scope and timing for adoption. These challenges, in turn, have revealed new opportunities for EDA vendors, physical designers, and logic designers alike. As the industry advances toward intelligent EDA tools and techniques, it is timely to reflect on the Shift-Left progress made and the challenges that remain. The rise of AI techniques and the momentum of open-source design flows have significantly strengthened prediction and modeling capabilities, making data-driven methods increasingly relevant to the EDA community. This, in turn, enhances the "Shift-Left" features embedded in current tools. In this paper, we present a comprehensive survey of existing and emerging paradigms in Shift-Left research within EDA and the broader design ecosystem. Our goal is to provide a unique perspective on the state of the field and its future directions. Relevant papers are organized at https://github.com/iCAS-SJTU/Shift-Left-EDA-Papers.
Submitted 17 September, 2025;
originally announced September 2025.
-
SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems
Authors:
Xifeng Yao,
Dongyu Lang,
Wu Zhang,
Xintong Guo,
Huarui Xie,
Yinhao Ni,
Ping Liu,
Guang Shen,
Yi Bai,
Dandan Tu,
Changzheng Zhang
Abstract:
Significant advancements have been made in the capabilities of code large language models, leading to their rapid adoption and application across a wide range of domains. However, their further advancement is often constrained by the scarcity of real-world coding problems. To bridge this gap, we propose a novel framework for synthesizing code problems that emulate authentic real-world scenarios. This framework systematically integrates domain knowledge, domain skills, and coding skills, all of which are meticulously extracted from real-world programming-related datasets, including Stack Overflow and Kaggle. The extracted elements serve as the foundational building blocks for constructing code problems. To align the generated problems with practical applications, application scenarios are also mined from the aforementioned datasets. These scenarios are then utilized to construct a scenario-centric graph that interconnects domain knowledge, domain skills, and coding skills. Based on this structured representation, a sampling strategy on the graph is designed, which effectively controls the complexity and diversity of the generated code problems and reflects real-world challenges. Experimental results demonstrate that the proposed method consistently achieves superior performance over state-of-the-art open-source large language models of varying sizes and functionalities, including both coders and general-purpose models, across a diverse set of real-world benchmarks.
Submitted 16 September, 2025;
originally announced September 2025.
-
Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
Authors:
Colin Hong,
Xu Guo,
Anand Chaanan Singh,
Esha Choukse,
Dmitrii Ustiugov
Abstract:
Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, the order-of-magnitude computational overhead limits its broad deployment. Prior attempts to accelerate SC mainly rely on model-based confidence scores or heuristics with limited empirical support. For the first time, we theoretically and empirically analyze the inefficiencies of SC and reveal actionable opportunities for improvement. Building on these insights, we propose Slim-SC, a step-wise pruning strategy that identifies and removes redundant chains using inter-chain similarity at the thought level. Experiments on three STEM reasoning datasets and two recent LLM architectures show that Slim-SC reduces inference latency and KVC usage by up to 45% and 26%, respectively, with R1-Distill, while maintaining or improving accuracy, thus offering a simple yet efficient TTS alternative for SC.
Submitted 17 September, 2025;
originally announced September 2025.
-
Tuning Coupled Toroidic and Polar Orders in a Bilayer Antiferromagnet
Authors:
Chuangtang Wang,
Xiaoyu Guo,
Zixin Zhai,
Meixin Cheng,
Sang-Wook Cheong,
Adam W. Tsen,
Bing Lv,
Liuyan Zhao
Abstract:
Magnetic toroidal order features a loop-like arrangement of magnetic dipole moments, thus breaking both spatial inversion (P) and time-reversal (T) symmetries while preserving their combined PT symmetry. This PT symmetry enables a linear magnetoelectric effect, allowing the coupling between magnetic toroidicity and electric polarity. However, the detection and control of two-dimensional (2D) magnetic toroidal order and the investigation of its linear magnetoelectric response remain largely unexplored. Here, using bilayer CrSBr as a platform, which hosts an in-plane layer-antiferromagnetic (AFM) order and simultaneously exhibits a magnetic toroidal order, we show compelling evidence for tuning this 2D magnetic toroidicity and its induced electric polarity through magnetic-field-dependent second harmonic generation (SHG). Under an out-of-plane magnetic field, we decompose the SHG signal into a time-reversal-odd component that scales with the magnetic toroidal moment and a time-reversal-even component that is proportional to the electric polarization. When sweeping the magnetic field from positive to negative values, we observe that the magnetic toroidicity retains its sign but diminishes in magnitude at higher fields while the electric polarity flips its sign and increases in strength at increasing fields below a critical threshold. When applying an in-plane electric field along the Néel vector direction, together with an out-of-plane field, we find that the magnetic toroidal and electric polar domains are moved in a locked fashion. These findings underscore the promise of 2D magnetic toroidal order in realizing giant linear magnetoelectric effects, opening exciting possibilities for next-generation electronic, magnetic, optical, and photonic devices enabled by 2D magnetoelectrics.
Submitted 16 September, 2025;
originally announced September 2025.
-
StereoCarla: A High-Fidelity Driving Dataset for Generalizable Stereo
Authors:
Xianda Guo,
Chenming Zhang,
Ruilin Wang,
Youmin Zhang,
Wenzhao Zheng,
Matteo Poggi,
Hao Zhao,
Qin Zou,
Long Chen
Abstract:
Stereo matching plays a crucial role in enabling depth perception for autonomous driving and robotics. While recent years have witnessed remarkable progress in stereo matching algorithms, largely driven by learning-based methods and synthetic datasets, the generalization performance of these models remains constrained by the limited diversity of existing training data. To address these challenges, we present StereoCarla, a high-fidelity synthetic stereo dataset specifically designed for autonomous driving scenarios. Built on the CARLA simulator, StereoCarla incorporates a wide range of camera configurations, including diverse baselines, viewpoints, and sensor placements, as well as varied environmental conditions such as lighting changes, weather effects, and road geometries. We conduct comprehensive cross-domain experiments across four standard evaluation datasets (KITTI2012, KITTI2015, Middlebury, ETH3D) and demonstrate that models trained on StereoCarla outperform those trained on 11 existing stereo datasets in terms of generalization accuracy across multiple benchmarks. Furthermore, when integrated into multi-dataset training, StereoCarla contributes substantial improvements to generalization accuracy, highlighting its compatibility and scalability. This dataset provides a valuable benchmark for developing and evaluating stereo algorithms under realistic, diverse, and controllable settings, facilitating more robust depth perception systems for autonomous vehicles. Code is available at https://github.com/XiandaGuo/OpenStereo, and data are available at https://xiandaguo.net/StereoCarla.
Submitted 16 September, 2025;
originally announced September 2025.
-
OASIS: A Deep Learning Framework for Universal Spectroscopic Analysis Driven by Novel Loss Functions
Authors:
Chris Young,
Juejing Liu,
Marie L. Mortensen,
Yifu Feng,
Elizabeth Li,
Zheming Wang,
Xiaofeng Guo,
Kevin M. Rosso,
Xin Zhang
Abstract:
The proliferation of spectroscopic data across various scientific and engineering fields necessitates automated processing. We introduce OASIS (Omni-purpose Analysis of Spectra via Intelligent Systems), a machine learning (ML) framework for technique-independent, automated spectral analysis, encompassing denoising, baseline correction, and comprehensive peak parameter (location, intensity, FWHM) retrieval without human intervention. OASIS achieves its versatility through models trained on a strategically designed synthetic dataset incorporating features from numerous spectroscopy techniques. Critically, the development of innovative, task-specific loss functions, such as the vicinity peak response (ViPeR) for peak localization, enabled the creation of compact yet highly accurate models from this dataset, validated with experimental data from Raman, UV-vis, and fluorescence spectroscopy. OASIS demonstrates significant potential for applications including in situ experiments, high-throughput optimization, and online monitoring. This study underscores the optimization of the loss function as a key resource-efficient strategy to develop high-performance ML models.
Submitted 14 September, 2025;
originally announced September 2025.
-
TUNI: Real-time RGB-T Semantic Segmentation with Unified Multi-Modal Feature Extraction and Cross-Modal Feature Fusion
Authors:
Xiaodong Guo,
Tong Liu,
Yike Li,
Zi'ang Lin,
Zhihong Deng
Abstract:
RGB-thermal (RGB-T) semantic segmentation improves the environmental perception of autonomous platforms in challenging conditions. Prevailing models employ encoders pre-trained on RGB images to extract features from both RGB and infrared inputs, and design additional modules to achieve cross-modal feature fusion. This results in limited thermal feature extraction and suboptimal cross-modal fusion, while the redundant encoders further compromise the model's real-time efficiency. To address the above issues, we propose TUNI, with an RGB-T encoder consisting of multiple stacked blocks that simultaneously perform multi-modal feature extraction and cross-modal fusion. By leveraging large-scale pre-training with RGB and pseudo-thermal data, the RGB-T encoder learns to integrate feature extraction and fusion in a unified manner. By slimming down the thermal branch, the encoder achieves a more compact architecture. Moreover, we introduce an RGB-T local module to strengthen the encoder's capacity for cross-modal local feature fusion. The RGB-T local module employs adaptive cosine similarity to selectively emphasize salient consistent and distinct local features across RGB-T modalities. Experimental results show that TUNI achieves competitive performance with state-of-the-art models on FMB, PST900 and CART, with fewer parameters and lower computational cost. Meanwhile, it achieves an inference speed of 27 FPS on a Jetson Orin NX, demonstrating its real-time capability in deployment. Code is available at https://github.com/xiaodonguo/TUNI.
Submitted 12 September, 2025;
originally announced September 2025.
-
Optimal Inference of the Mean Outcome under Optimal Treatment Regime
Authors:
Shuoxun Xu,
Xinzhou Guo
Abstract:
When an optimal treatment regime (OTR) is considered, we need to evaluate the OTR in a valid and efficient way. The classical inference applied to the mean outcome under OTR, assuming the OTR is the same as the estimated OTR, might be biased when the regularity assumption that OTR is unique is violated. Although several methods have been proposed to allow nonregularity in such inference, its optimality is unclear due to challenges in deriving semiparametric efficiency bounds under potential nonregularity. In this paper, we address the bias issue via adaptive smoothing over the estimated OTR and develop a valid inference procedure on the mean outcome under OTR regardless of whether regularity is satisfied. We establish the optimality of the proposed method by deriving a lower bound of the asymptotic variance for the robust asymptotically linear unbiased estimator to the mean outcome under OTR and showing that our proposed estimator achieves the variance lower bound. The considered estimator class is general and the derived variance lower bound paves a novel way to establish efficiency optimality theories for OTR in a more general scenario allowing nonregularity. The merit of the proposed method is demonstrated by re-analyzing the ACTG 175 trial.
Submitted 11 September, 2025;
originally announced September 2025.
-
Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
Authors:
Haoran Wu,
Can Xiao,
Jiayi Nie,
Xuan Guo,
Binglei Lou,
Jeffrey T. H. Wong,
Zhiwen Mo,
Cheng Zhang,
Przemyslaw Forys,
Wayne Luk,
Hongxiang Fan,
Jianyi Cheng,
Timothy M. Jones,
Rika Antonova,
Robert Mullins,
Aaron Zhao
Abstract:
LLMs now form the backbone of AI agents for a diverse array of applications, including tool use, command-line agents, and web or computer use agents. These agentic LLM inference tasks are fundamentally different from chatbot-focused inference -- they often have much larger context lengths to capture complex, prolonged inputs, such as entire webpage DOMs or complicated tool call trajectories. This, in turn, generates significant off-chip memory traffic for the underlying hardware at the inference stage and causes the workload to be constrained by two memory walls, namely the bandwidth and capacity memory walls, preventing the on-chip compute units from achieving high utilization.
In this paper, we introduce PLENA, a hardware-software co-designed system that applies three core optimization pathways to tackle these challenges. PLENA includes an efficient hardware implementation of compute and memory units supporting an asymmetric quantization scheme. PLENA also features a novel flattened systolic array architecture that has native support for FlashAttention to tackle these memory walls in the scenario of inference serving for long-context LLMs. Additionally, PLENA is developed with a complete stack, including a custom ISA, a compiler, a cycle-emulated simulator, and an automated design space exploration flow. The simulated results show that PLENA achieves up to 8.5x higher utilization than existing accelerators, and delivers 2.24x higher throughput than the A100 GPU and 3.85x higher throughput than the TPU v6e, under the same multiplier count and memory settings. The full PLENA system will also be open-sourced.
Submitted 24 September, 2025; v1 submitted 11 September, 2025;
originally announced September 2025.
-
Intertwined polar, chiral, and ferro-rotational orders in a rotation-only insulator
Authors:
Weizhe Zhang,
June Ho Yeo,
Xiaoyu Guo,
Tony Chiang,
Nishkarsh Agarwal,
John T. Heron,
Kai Sun,
Junjie Yang,
Sang-Wook Cheong,
Youngjun Ahn,
Liuyan Zhao
Abstract:
Intertwined orders refer to strongly coupled and mutually dependent orders that coexist in correlated electron systems, often underpinning key physical properties of the host materials. Among them, polar, chiral, and ferro-rotational orders have been theoretically known to form a closed set of intertwined orders. However, experimental investigation into their mutual coupling and physical consequences has remained elusive. In this work, we employ the polar-chiral insulator Ni$_3$TeO$_6$ as a platform and utilize a multimodal optical approach to directly probe and reveal the intertwining among polarity, chirality, and ferro-rotational order. We demonstrate how their coupling governs the formation of domains and dictates the nature of domain walls. Within the domains, we identify spatial inversion symmetry as the operation connecting two domain states of opposite polarity and chirality, with a common ferro-rotational state serving as the prerequisite for these interlocked configurations. At the domain walls, we observe a pronounced enhancement of in-plane polarization accompanied by a suppression of chirality. By combining with Ginzburg-Landau theory within the framework of a pre-existing ferro-rotational background, we uncover the emergence of mixed Néel- and Bloch-type domain walls. Our findings highlight the critical role of intertwined orders in defining domain and domain wall characteristics and open pathways for domain switching and domain wall control via intertwined order parameters.
Submitted 10 September, 2025;
originally announced September 2025.
-
Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference
Authors:
Xiyu Guo,
Shan Wang,
Chunfang Ji,
Xuefeng Zhao,
Wenhao Xi,
Yaoyao Liu,
Qinglan Li,
Chao Deng,
Junlan Feng
Abstract:
The rapid advancement of large language models (LLMs) and domain-specific AI agents has greatly expanded the ecosystem of AI-powered services. User queries, however, are highly diverse and often span multiple domains and task types, resulting in a complex and heterogeneous landscape. This diversity presents a fundamental routing challenge: how to accurately direct each query to an appropriate execution unit while optimizing both performance and efficiency. To address this, we propose MoMA (Mixture of Models and Agents), a generalized routing framework that integrates both LLM and agent-based routing. Built upon a deep understanding of model and agent capabilities, MoMA effectively handles diverse queries through precise intent recognition and adaptive routing strategies, achieving an optimal balance between efficiency and cost. Specifically, we construct a detailed training dataset to profile the capabilities of various LLMs under different routing model structures, identifying the most suitable tasks for each LLM. During inference, queries are dynamically routed to the LLM with the best cost-performance efficiency. We also introduce an efficient agent selection strategy based on a context-aware state machine and dynamic masking. Experimental results demonstrate that the MoMA router offers superior cost-efficiency and scalability compared to existing approaches.
Submitted 10 September, 2025; v1 submitted 9 September, 2025;
originally announced September 2025.
-
Double Machine Learning for Estimating Time-Varying Delayed and Instantaneous Effects Using Digital Phenotypes
Authors:
Xingche Guo,
Zexi Cai,
Yuanjia Wang,
Donglin Zeng
Abstract:
Mobile health (mHealth) leverages digital technologies, such as mobile phones, to capture objective, frequent, and real-world digital phenotypes from individuals, enabling the delivery of tailored interventions to accommodate substantial between-subject and temporal heterogeneity. However, evaluating heterogeneous treatment effects from digital phenotype data is challenging due to the dynamic nature of treatments and the presence of delayed effects that extend beyond immediate responses. Additionally, modeling observational data is complicated by confounding factors. To address these challenges, we propose a double machine learning (DML) method designed to estimate both time-varying instantaneous and delayed treatment effects using digital phenotypes. Our approach uses a sequential procedure to estimate the treatment effects based on a DML estimator to ensure Neyman orthogonality. We establish the asymptotic normality of the proposed estimator. Extensive simulation studies validate the finite-sample performance of our approach, demonstrating the advantages of DML and the decomposition of treatment effects. We apply our method to an mHealth study on Parkinson's disease (PD), where we find that the treatment is significantly more effective for younger PD patients and maintains greater stability over time for individuals with low motor fluctuations. These findings demonstrate the utility of our proposed method in advancing precision medicine in mHealth studies.
Submitted 8 September, 2025;
originally announced September 2025.
-
UniSearch: Rethinking Search System with a Unified Generative Architecture
Authors:
Jiahui Chen,
Xiaoze Jiang,
Zhibo Wang,
Quanzhi Zhu,
Junyao Zhao,
Feng Hu,
Kang Pan,
Ao Xie,
Maohua Pei,
Zhiheng Qin,
Hongjing Zhang,
Zhixin Zhai,
Xiaobo Guo,
Runbin Zhou,
Kefeng Wang,
Mingyang Geng,
Cheng Chen,
Jingshan Lv,
Yupeng Huang,
Xiao Liang,
Han Li
Abstract:
Modern search systems play a crucial role in facilitating information acquisition. Traditional search engines typically rely on a cascaded architecture, where results are retrieved through recall, pre-ranking, and ranking stages. The complexity of designing and maintaining multiple modules makes it difficult to achieve holistic performance gains. Recent advances in generative recommendation have motivated the exploration of unified generative search as an alternative. However, existing approaches are not genuinely end-to-end: they typically train an item encoder to tokenize candidates first and then optimize a generator separately, leading to objective inconsistency and limited generalization. To address these limitations, we propose UniSearch, a unified generative search framework for Kuaishou Search. UniSearch replaces the cascaded pipeline with an end-to-end architecture that integrates a Search Generator and a Video Encoder. The Generator produces semantic identifiers of relevant items given a user query, while the Video Encoder learns latent item embeddings and provides their tokenized representations. A unified training framework jointly optimizes both components, enabling mutual enhancement and improving representation quality and generation accuracy. Furthermore, we introduce Search Preference Optimization (SPO), which leverages a reward model and real user feedback to better align generation with user preferences. Extensive experiments on industrial-scale datasets, together with online A/B testing in both short-video and live search scenarios, demonstrate the strong effectiveness and deployment potential of UniSearch. Notably, its deployment in live search yields the largest single-experiment improvement in recent years of our product's history, highlighting its practical value for real-world applications.
Submitted 10 September, 2025; v1 submitted 8 September, 2025;
originally announced September 2025.
-
SynthDrive: Scalable Real2Sim2Real Sensor Simulation Pipeline for High-Fidelity Asset Generation and Driving Data Synthesis
Authors:
Zhengqing Chen,
Ruohong Mei,
Xiaoyang Guo,
Qingjie Wang,
Yubin Hu,
Wei Yin,
Weiqiang Ren,
Qian Zhang
Abstract:
In the field of autonomous driving, sensor simulation is essential for generating rare and diverse scenarios that are difficult to capture in real-world environments. Current solutions fall into two categories: 1) CG-based methods, such as CARLA, which lack diversity and struggle to scale to the vast array of rare cases required for robust perception training; and 2) learning-based approaches, such as NeuSim, which are limited to specific object categories (vehicles) and require extensive multi-sensor data, hindering their applicability to generic objects. To address these limitations, we propose a scalable real2sim2real system that leverages 3D generation to automate asset mining, generation, and rare-case data synthesis.
Submitted 8 September, 2025;
originally announced September 2025.
-
MeanFlow-Accelerated Multimodal Video-to-Audio Synthesis via One-Step Generation
Authors:
Xiaoran Yang,
Jianxuan Yang,
Xinyue Guo,
Haoyu Wang,
Ningning Pan,
Gongping Huang
Abstract:
A key challenge in synthesizing audio from silent videos is the inherent trade-off between synthesis quality and inference efficiency in existing methods. For instance, flow-matching-based models rely on modeling instantaneous velocity and therefore inherently require an iterative sampling process, leading to slow inference speeds. To address this efficiency bottleneck, we introduce a MeanFlow-accelerated model that characterizes flow fields using average velocity, enabling one-step generation and thereby significantly accelerating multimodal video-to-audio (VTA) synthesis while preserving audio quality, semantic alignment, and temporal synchronization. Furthermore, a scalar rescaling mechanism is employed to balance conditional and unconditional predictions when classifier-free guidance (CFG) is applied, effectively mitigating CFG-induced distortions in one-step generation. Since the audio synthesis network is jointly trained with multimodal conditions, we further evaluate it on the text-to-audio (TTA) synthesis task. Experimental results demonstrate that incorporating MeanFlow into the network significantly improves inference speed without compromising perceptual quality on both VTA and TTA synthesis tasks.
Submitted 8 September, 2025;
originally announced September 2025.
-
Theory of Localized States in Quasiperiodic Lattices
Authors:
Jin-Rong Chen,
Xin-Yu Guo,
Shi-Ping Ding,
Tian-Le Wu,
Miao Liang,
Jin-Hua Gao,
X. C. Xie
Abstract:
The physics of localized states in quasiperiodic lattices has been extensively studied for decades, but still lacks a comprehensive theoretical framework. Recently, we developed an incommensurate energy band (IEB) theory, which extends the concept of energy bands to quasiperiodic systems lacking translational symmetry, thereby achieving a breakthrough in elucidating extended states. Here, we demonstrate that, due to the inherent duality between momentum and real space, the IEB theory also offers a comprehensive framework for elucidating localized states. Specifically, via a so-called spiral (module) mapping, the energy spectrum of localized states can be represented as a function defined on a compact circular manifold, akin to the Brillouin zone, whose form resembles conventional energy bands. These localized state energy bands (LSEBs) fully characterize all the properties of the localized states. Moreover, we show that quasiperiodic systems with mobility edges exhibit a unique hybrid band structure: the IEB for extended states (momentum space) and LSEB for localized states (real space), separated by mobility edges. Our theory thus establishes a comprehensive framework for analyzing the localized states in quasiperiodic lattices.
Submitted 7 September, 2025;
originally announced September 2025.
-
Preparation and measurement of an $\rm ^{37}$Ar source for liquid xenon detector calibration
Authors:
Xu-Nan Guo,
Chang Cai,
Fei Gao,
Yang Lei,
Kai-Hang Li,
Chun-Lei Su,
Ze-Peng Wu,
Xiang Xiao,
Ling-Feng Xie,
Yi-Fei Zhao,
Xiao-Peng Zhou
Abstract:
We present the preparation and measurement of the radioactive isotope $\rm ^{37}Ar$, which was produced using thermal neutrons from a reactor, as a calibration source for liquid xenon time projection chambers. $\rm ^{37}Ar$ is a low-energy calibration source with a half-life of 35.01 days, making it suitable for calibration in the low-energy region of liquid xenon dark-matter experiments. Radioactive isotope $\rm ^{37}Ar$ was produced by irradiating $\rm ^{36}Ar$ with thermal neutrons. It was subsequently measured in a gaseous xenon time projection chamber (GXe TPC) to validate its radioactivity. Our results demonstrate that $\rm ^{37}Ar$ is an effective and viable calibration source that offers precise calibration capabilities in the low-energy domain of xenon-based detectors.
Submitted 5 September, 2025;
originally announced September 2025.
-
Bright siren without electromagnetic counterpart by LISA-Taiji-TianQin network
Authors:
Yejing Zhan,
David Izquierdo-Villalba,
Xiao Guo,
Qing Yang,
Daniele Spinoso,
Fa-Yin Wang
Abstract:
Gravitational waves (GWs) with electromagnetic counterparts (EMc) offer a novel approach to measure the Hubble constant ($H_0$), known as bright sirens, enabling $H_0$ measurements by combining GW-derived distances with EM-derived redshifts. Host galaxy identification is essential for redshift determination but remains challenging due to poor GW sky localization and uncertainties in EMc models. To overcome these limitations, we exploit the ultra-high-precision localization ($ΔΩ_s \sim 10^{-4} \, \text{deg}^2$) with a space-based GW detector network (LISA-Taiji-TianQin), which permits unique host identification solely from GW signals. We integrate five massive black hole binary (MBHB) population models and two galaxy number density models to compute the redshift horizon for host galaxy identification and evaluate $H_0$ constraints. We find that (1) the network enhances localization by several orders of magnitude compared to single detectors; (2) the identification horizon reaches $z\sim 1.2$ for specific MBHBs in the most accurate localization case; (3) the population model choice critically impacts the outcomes: the most refined population models yield an independent EMc identification rate of 0.6-1 $\text{yr}^{-1}$ with $H_0$ constraints of $< 1\%$ fractional uncertainty, whereas the less refined models lead to a rate of $<0.1\,\text{yr}^{-1}$ and $1-2\%$ uncertainty on $H_0$.
Submitted 4 September, 2025;
originally announced September 2025.
-
OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction
Authors:
Bu Jin,
Songen Gu,
Xiaotao Hu,
Yupeng Zheng,
Xiaoyang Guo,
Qian Zhang,
Xiaoxiao Long,
Wei Yin
Abstract:
In this paper, we propose OccTENS, a generative occupancy world model that enables controllable, high-fidelity long-term occupancy generation while maintaining computational efficiency. Different from visual generation, the occupancy world model must capture the fine-grained 3D geometry and dynamic evolution of the 3D scenes, posing great challenges for the generative models. Recent approaches based on autoregression (AR) have demonstrated the potential to predict vehicle movement and future occupancy scenes simultaneously from historical observations, but they typically suffer from inefficiency, temporal degradation in long-term generation, and lack of controllability. To holistically address these issues, we reformulate the occupancy world model as a temporal next-scale prediction (TENS) task, which decomposes the temporal sequence modeling problem into the modeling of spatial scale-by-scale generation and temporal scene-by-scene prediction. With a TensFormer, OccTENS can effectively manage the temporal causality and spatial relationships of occupancy sequences in a flexible and scalable way. To enhance the pose controllability, we further propose a holistic pose aggregation strategy, which features a unified sequence modeling for occupancy and ego-motion. Experiments show that OccTENS outperforms the state-of-the-art method with both higher occupancy quality and faster inference time.
Submitted 4 September, 2025;
originally announced September 2025.
-
PTA Frequency Band Individual Gravitational Wave Sources and Dark Energy Detection Based on Cosmological Simulation
Authors:
Qing Yang,
Gu-yue Zhang,
Yi Huang,
Xiao Guo
Abstract:
Nanohertz gravitational waves (GWs) from supermassive binary black holes (SMBBHs), detectable via pulsar timing arrays (PTAs), offer a novel avenue to constrain dark energy. Based on cosmological simulations and semi-analytic galaxy formation models, this study explores the detectability of individual nanohertz SMBBH sources using next-generation PTAs and their potential for constraining dark energy under an optimistic scenario considering only the presence of white noise. By constructing light-cone SMBBH populations across hardening timescales ($τ_H = 0.1/5/10$ Gyr) and computing signal-to-noise ratios (SNR), we find advanced PTAs can resolve $10^2$--$10^3$ sources with SNR $> 8$ (primarily at $z < 1$ with chirp masses of $10^8$--$10^{10}M_{\odot}$). If electromagnetic counterparts can be identified, optimal configurations ($σ_t = 50$ ns, $N_p = 1000$, $T_{\text{obs}} = 30$ yr with $τ_H \leq 5$ Gyr) could constrain the dark energy equation-of-state (EoS) parameter $w$ to $Δw \sim 0.023$--$0.048$, where the constraints exhibit only weak dependence on $τ_H$ within $0.1$--$5$ Gyr. If only $10\%$ of GW sources have detectable electromagnetic counterparts, constraints weaken to $Δw = 0.075$ ($τ_H = 0.1$ Gyr) and $Δw = 0.162$ ($τ_H = 5$ Gyr) under the most optimal parameter configuration. Moreover, conservative PTAs ($N_p = 500$, $σ_t = 100$--$200$ ns) with an additional $30$ years of data accumulation could double resolvable source counts and improve $Δw$ precision by $\sim 40\%$.
Submitted 3 September, 2025;
originally announced September 2025.
-
OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search
Authors:
Ben Chen,
Xian Guo,
Siyuan Wang,
Zihan Liang,
Yue Lv,
Yufei Ma,
Xinlong Xiao,
Bowen Xue,
Xuxin Zhang,
Ying Yang,
Huangyu Dai,
Xing Xu,
Tong Zhao,
Mingcan Peng,
Xiaoyang Zheng,
Chao Wang,
Qihang Zhao,
Zhixin Zhai,
Yang Zhao,
Bochao Liu,
Jingshan Lv,
Xiao Liang,
Yuqing Ding,
Jing Chen,
Chenyi Lei
, et al. (3 additional authors not shown)
Abstract:
Traditional e-commerce search systems employ multi-stage cascading architectures (MCA) that progressively filter items through recall, pre-ranking, and ranking stages. While effective at balancing computational efficiency with business conversion, these systems suffer from fragmented computation and optimization objective collisions across stages, which ultimately limit their performance ceiling. To address these issues, we propose OneSearch, the first industrially deployed end-to-end generative framework for e-commerce search. This framework introduces three key innovations: (1) a Keyword-enhanced Hierarchical Quantization Encoding (KHQE) module, to preserve both hierarchical semantics and distinctive item attributes while maintaining strong query-item relevance constraints; (2) a multi-view user behavior sequence injection strategy that constructs behavior-driven user IDs and incorporates both explicit short-term and implicit long-term sequences to model user preferences comprehensively; and (3) a Preference-Aware Reward System (PARS) featuring multi-stage supervised fine-tuning and adaptive reward-weighted ranking to capture fine-grained user preferences. Extensive offline evaluations on large-scale industry datasets demonstrate OneSearch's superior performance for high-quality recall and ranking. Rigorous online A/B tests confirm its ability to enhance relevance in the same exposure position, achieving statistically significant improvements: +1.67% item CTR, +2.40% buyer, and +3.22% order volume. Furthermore, OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users and generating tens of millions of PVs daily.
Submitted 30 September, 2025; v1 submitted 3 September, 2025;
originally announced September 2025.
-
DroneSR: Rethinking Few-shot Thermal Image Super-Resolution from Drone-based Perspective
Authors:
Zhipeng Weng,
Xiaopeng Liu,
Ce Liu,
Xingyuan Guo,
Yukai Shi,
Liang Lin
Abstract:
Although large-scale models achieve significant improvements in performance, the overfitting challenge still frequently undermines their generalization ability. In image super-resolution tasks, diffusion models, as representative generative models, typically adopt large-scale architectures. However, few-shot drone-captured infrared training data frequently induces severe overfitting in large-scale architectures. To address this key challenge, we propose a new Gaussian quantization representation learning method oriented to diffusion models that alleviates overfitting and enhances robustness. At the same time, an effective monitoring mechanism tracks large-scale architectures during training to detect signs of overfitting. By introducing Gaussian quantization representation learning, our method effectively reduces overfitting while maintaining architecture complexity. On this basis, we construct a multi-source drone-based infrared image benchmark dataset for detection and use it to emphasize overfitting issues of large-scale architectures in few-sample, diverse drone-based image reconstruction scenarios. To verify the efficacy of the method in mitigating overfitting, experiments are conducted on the constructed benchmark. Experimental results demonstrate that our method outperforms existing super-resolution approaches and significantly mitigates overfitting of large-scale architectures under complex conditions. The code and DroneSR dataset will be available at: https://github.com/wengzp1/GARLSR.
Submitted 1 September, 2025;
originally announced September 2025.
-
Integrated photonic neuromorphic computing: device, architecture, chip, algorithm
Authors:
Shuiying Xiang,
Chengyang Yu,
Yizhi Wang,
Xintao Zeng,
Yuna Zhang,
Dianzhuang Zheng,
Xinran Niu,
Haowen Zhao,
Hanxu Zhou,
Yanan Han,
Xingxing Guo,
Yahui Zhang,
Yue Hao
Abstract:
Artificial intelligence (AI) has experienced explosive growth in recent years. Large models have been widely applied in various fields, including natural language processing, image generation, and complex decision-making systems, revolutionizing technological paradigms across multiple industries. Nevertheless, the substantial data processing demands during model training and inference create a computing power bottleneck. Traditional electronic chips based on the von Neumann architecture struggle to meet the growing demands for computing power and power efficiency amid the continuous development of AI. Photonic neuromorphic computing, an emerging solution in the post-Moore era, exhibits significant development potential. Leveraging the high-speed and large-bandwidth characteristics of photons in signal transmission, as well as the low power consumption of optical devices, photonic integrated computing chips have the potential to overcome the memory wall and power wall issues of electronic chips. In recent years, remarkable advancements have been made in photonic neuromorphic computing. This article presents a systematic review of the latest research achievements. It focuses on fundamental principles and novel neuromorphic photonic devices, such as photonic neurons and photonic synapses. Additionally, it comprehensively summarizes the network architectures and photonic integrated neuromorphic chips, as well as the optimization algorithms of photonic neural networks. Finally, in light of the current status and challenges of this field, this article discusses in depth the future development trends of photonic neuromorphic computing in the directions of device integration, algorithm collaborative optimization, and application scenario expansion, providing a reference for subsequent research in the field.
Submitted 1 September, 2025;
originally announced September 2025.
-
The Most Luminous Known Fast Blue Optical Transient AT 2024wpp: Unprecedented Evolution and Properties in the Ultraviolet to the Near-Infrared
Authors:
Natalie LeBaron,
Raffaella Margutti,
Ryan Chornock,
A. J. Nayana,
Olivia Aspegren,
Wenbin Lu,
Brian Metzger,
Daniel Kasen,
Thomas Brink,
Sergio Campana,
Paolo D'Avanzo,
Jakob Faber,
Matteo Ferro,
Alex Filippenko,
Ryan Foley,
Xinze Guo,
Erica Hammerstein,
Saurabh Jha,
Charles Kilpatrick,
Giulia Migliori,
Dan Milisavljevic,
Kishore Patra,
Huei Sears,
Jonathan Swift,
Samaporn Tinyanont
, et al. (23 additional authors not shown)
Abstract:
We present an extensive photometric and spectroscopic ultraviolet-optical-infrared campaign on the luminous fast blue optical transient (LFBOT) AT 2024wpp over the first ~100 d. AT 2024wpp is the most luminous LFBOT discovered to date, with $L_{\rm{pk}}\approx(2-4)\times10^{45}$ erg s$^{-1}$ (5-10 times that of the prototypical AT 2018cow). This extreme luminosity enabled the acquisition of the most detailed LFBOT UV light curve thus far. In the first ~45 d, AT 2024wpp radiated $>10^{51}$ erg, surpassing AT 2018cow by an order of magnitude and requiring a power source beyond the radioactive $^{56}$Ni decay of traditional supernovae. Like AT 2018cow, the UV-optical spectrum of AT 2024wpp is dominated by a persistently blue thermal continuum throughout our monitoring, with blackbody parameters at peak of T>30,000 K and $R_{\rm{BB}}/t\approx0.2-0.3c$. A temperature of $\gtrsim$20,000 K is maintained thereafter without evidence for cooling. We interpret the featureless spectra as a consequence of continuous energy injection from a central source of high-energy emission which maintains high ejecta ionization. After 35 d, faint (equivalent width <10 Å) H and He spectral features with kinematically separate velocity components centered at 0 km s$^{-1}$ and -6400 km s$^{-1}$ emerge, implying spherical symmetry deviations. A near-infrared excess of emission above the optical blackbody emerges between 20-30 d with a power-law spectrum $F_{ν,\rm{NIR}}\propto ν^{-0.3}$ at 30 d. We interpret this distinct emission component as either reprocessing of early UV emission in a dust echo or free-free emission in an extended medium above the optical photosphere. LFBOT asphericity and multiple outflow components (including mildly relativistic ejecta) together with the large radiated energy are naturally realized by super-Eddington accretion disks around neutron stars or black holes and their outflows.
Submitted 31 August, 2025;
originally announced September 2025.
-
Measuring Reasoning Utility in LLMs via Conditional Entropy Reduction
Authors:
Xu Guo
Abstract:
Recent advancements in large language models (LLMs) often rely on generating intermediate reasoning steps to enhance accuracy. However, little work has examined how reasoning utility contributes to the final answer's correctness. Due to the stochastic nature of autoregressive generation, generating more context does not guarantee increased confidence in the answer. If we could predict, during generation, whether a reasoning step will be useful, we could stop early or prune ineffective steps, avoiding distractions in the final decision.
We present an oracle study on the MATH dataset, using Qwen2.5-32B and GPT-4o to generate reasoning chains, and then employing a separate model (Qwen3-8B) to quantify the utility of these chains for final accuracy. Specifically, we measure the model's uncertainty on the answer span Y at each reasoning step using conditional entropy (expected negative log-likelihood over the vocabulary) with context expanding step by step. Our results show a clear pattern: conditional entropy that decreases over steps is strongly associated with correct answers, whereas flat or increasing entropy often results in wrong answers. We also corroborate that incorrect reasoning paths tend to be longer than correct ones, suggesting that longer reasoning does not necessarily yield better outcomes. These findings serve as a foundation to inspire future work on designing efficient reasoning pipelines that detect and avoid unproductive reasoning early.
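As a rough illustration of the quantity described above (not the authors' implementation), the sketch below computes the mean per-token entropy of an answer span from next-token logits, once per reasoning step. The function names, the averaging over answer tokens, and the toy logits are assumptions made for this example; in practice the logits would come from the scoring model conditioned on the context up to each step.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) along the vocabulary axis."""
    z = logits - logits.max(axis=-1, keepdims=True)          # stabilize the softmax
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_p) * log_p).sum(axis=-1)

def answer_entropy(logits):
    """Mean entropy over the answer span; logits has shape (answer_len, vocab)."""
    return float(token_entropy(logits).mean())

def entropy_trend(per_step_logits):
    """Answer-span entropy after each reasoning step (hypothetical helper)."""
    return [answer_entropy(step) for step in per_step_logits]

# Toy data (assumption): the same logits scaled up stand in for a model that
# becomes more confident about the answer as more reasoning steps are added.
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 5))            # 3 answer tokens, vocabulary of 5
steps = [base * scale for scale in (0.0, 1.0, 4.0)]
trend = entropy_trend(steps)
decreasing = all(a > b for a, b in zip(trend, trend[1:]))
print(trend, decreasing)
```

A trend that falls from step to step corresponds to the pattern the abstract associates with correct answers; a flat or rising trend would be the signal for stopping early or pruning a step.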
Submitted 27 August, 2025;
originally announced August 2025.
-
Study of the $χ_{cJ}\rightarrowΛ\barΛη^\prime$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (683 additional authors not shown)
Abstract:
Using a data sample of $(2.712\pm0.014)\times10^{9}$ $ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, we investigate the decays $χ_{cJ} \rightarrow Λ\barΛ η^\prime$ for $J=0,~1,~2$ via the radiative transition $ψ(3686) \rightarrow γχ_{cJ}$. The decays $χ_{c0,2}\rightarrowΛ\barΛη^\prime$ are observed for the first time, with statistical significances of 6.7$\,σ$ and 6.4$\,σ$, respectively. Evidence for the decay $χ_{c1}\rightarrowΛ\barΛη^\prime$ is found with a statistical significance of 3.3$\,σ$. The corresponding branching fractions are measured to be $\mathscr{B}(χ_{c0}\rightarrowΛ\barΛη^\prime)=(7.56\pm1.42\pm0.90)\times10^{-5}$, $\mathscr{B}(χ_{c1}\rightarrowΛ\barΛη^\prime)=(1.54\pm0.51\pm0.16)\times10^{-5}$, and $\mathscr{B}(χ_{c2}\rightarrowΛ\barΛη^\prime)=(3.03\pm0.61\pm0.29)\times10^{-5}$, where the first uncertainties are statistical and the second systematic. No significant excited $Λ$ baryon states or $Λ\barΛ$ near-threshold enhancements are observed.
Submitted 26 August, 2025;
originally announced August 2025.
-
Atomistic Structure of Transient Switching States in Ferroelectric AlScN
Authors:
Jiawei Huang,
Jinyang Li,
Xinyue Guo,
Tongqi Wen,
David J. Srolovitz,
Zhen Chen,
Zuhuang Chen,
Shi Liu
Abstract:
We resolve the microscopic mechanism of polarization switching in wurtzite ferroelectric AlScN by integrating advanced thin-film fabrication, ferroelectric switching dynamics characterizations, high-resolution scanning transmission electron microscopy (STEM), and large-scale molecular dynamics simulations enabled by a deep neural network-based interatomic potential. Contrary to earlier interpretations proposing a transient nonpolar intermediate phase, we demonstrate that the broad transitional regions previously observed in STEM images are projection artifacts resulting from the intrinsic three-dimensional zigzag morphology of 180$^\circ$ domain walls, which are a characteristic form of inversion domain boundary. This is further confirmed by STEM imaging of strategically prepared, partially switched Al$_{0.75}$Sc$_{0.25}$N thin films. Our simulations reveal that switching proceeds through collective, column-by-column atomic displacements, directly explaining the emergence of zigzag-shaped domain walls, and is consistent with the nucleation-limited switching behavior observed in experimental switching dynamic measurements. Furthermore, we show that increasing Sc content systematically lowers domain wall energy and associated nucleation barrier, thereby reducing the switching field in agreement with experimental trends. These findings establish a direct connection between local domain wall structure, switching kinetics, and macroscopic ferroelectric behavior.
Submitted 25 August, 2025;
originally announced August 2025.
-
Quadratic curvature corrections to 5-dimensional Kerr-AdS black hole thermodynamics by background subtraction method
Authors:
Gerui Chen,
Xiyao Guo,
Xin Lan,
Hongbao Zhang,
Wei Zhang
Abstract:
We justify the applicability of the background subtraction method to both Einstein's gravity and its higher derivative corrections in 5-dimensional asymptotically AdS spacetimes, where the corresponding higher derivative corrections to the expression for the ADM mass and angular momentum are also worked out. Then we further apply the background subtraction method to calculate the first order corrected Gibbs free energy by the quadratic curvature terms for the 5-dimensional Kerr-AdS black hole, which is in exact agreement with the previous result obtained by the holographic renormalization method. Such an agreement in turn substantiates the applicability of the background subtraction method.
Submitted 26 August, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization
Authors:
Junyuan Deng,
Heng Li,
Tao Xie,
Weiqiang Ren,
Qian Zhang,
Ping Tan,
Xiaoyang Guo
Abstract:
Scene regression methods, such as VGGT, solve the Structure-from-Motion (SfM) problem by directly regressing camera poses and 3D scene structures from input images. They demonstrate impressive performance in handling images under extreme viewpoint changes. However, these methods struggle to handle a large number of input images. To address this problem, we introduce SAIL-Recon, a feed-forward Transformer for large-scale SfM, by augmenting the scene regression network with visual localization capabilities. Specifically, our method first computes a neural scene representation from a subset of anchor images. The regression network is then fine-tuned to reconstruct all input images conditioned on this neural scene representation. Comprehensive experiments show that our method not only scales efficiently to large-scale scenes, but also achieves state-of-the-art results on both camera pose estimation and novel view synthesis benchmarks, including TUM-RGBD, CO3Dv2, and Tanks & Temples. We will publish our model and code. Code and models are publicly available at: https://hkust-sail.github.io/sail-recon/.
Submitted 25 August, 2025;
originally announced August 2025.
-
Search for CP violation in e+e- -> psi(3770) -> DDbar via D -> KsPi0
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (707 additional authors not shown)
Abstract:
Using a data sample of electron-positron collisions recorded with the BESIII detector at a center-of-mass energy of 3.773~GeV, corresponding to an integrated luminosity of 20.28~fb$^{-1}$, we report the first search for the CP-forbidden process $e^+e^- \to ψ(3770) \to D^0\bar{D}^0 \to (K^0_Sπ^0)(K^0_Sπ^0)$. No significant signal is observed. We set the upper limit on the observed cross section to be 7.37~fb, and the upper limit on the joint branching fraction of the C-odd correlated neutral $D$ pair $\mathcal{B}[(D^0\bar{D}^0)_{\text{C-odd}} \to (K^0_Sπ^0)(K^0_Sπ^0)]$ to be $2.04 \times 10^{-6}$ at the 90\% confidence level.
Submitted 26 August, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
Authors:
Zizhuo Fu,
Xiaotian Guo,
Wenxuan Zeng,
Shuzhang Zhong,
Yadong Zhang,
Peiyu Chen,
Runsheng Wang,
Le Ye,
Meng Li
Abstract:
Large language models (LLMs) have demonstrated remarkable proficiency in a wide range of natural language processing applications. However, the high energy and latency overhead induced by the KV cache limits edge deployment, especially for long contexts. Emerging hybrid bonding (HB) technology has been proposed as a promising alternative to conventional near-memory processing (NMP) architectures, offering improved bandwidth efficiency and lower power consumption while exhibiting characteristics of distributed memory. In this paper, we propose H2EAL, a hybrid bonding-based accelerator with sparse attention algorithm-hardware co-design for efficient LLM inference at the edge. At the algorithm level, we propose a hybrid sparse attention scheme with static and dynamic sparsity for different heads to fully leverage the sparsity with high accuracy. At the hardware level, we co-design the hardware to support hybrid sparse attention and propose memory-compute co-placement to address the distributed memory bottleneck. Since different attention heads exhibit different sparse patterns and the attention structure often mismatches the HB architecture, we further develop a load-balancing scheduler with parallel tiled attention to address workload imbalance and optimize the mapping strategy. Extensive experiments demonstrate that H2EAL achieves a 5.20x to 48.21x speedup and a 6.22x to 73.48x energy-efficiency improvement over a baseline HB implementation, with a negligible average accuracy drop of 0.87% on multiple benchmarks.
Submitted 19 August, 2025;
originally announced August 2025.
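As a rough illustration of the hybrid sparse attention idea in the H2EAL abstract above (static sparsity for some heads, dynamic sparsity for others), the sketch below runs one decoding step in plain Python. The abstract does not specify the sparsity patterns, so everything concrete here is an assumption: the static pattern is taken to be "first few tokens plus a recent window", the dynamic pattern is taken to be top-k by query-key score, and all function names and default sizes are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def static_indices(seq_len, n_sink=4, window=8):
    # Assumed static pattern: the first n_sink tokens plus the last `window` tokens.
    sink = range(min(n_sink, seq_len))
    recent = range(max(seq_len - window, 0), seq_len)
    return sorted(set(sink) | set(recent))

def dynamic_indices(q, keys, top_k=12):
    # Assumed dynamic pattern: the top_k keys with the largest query-key scores.
    scores = [dot(q, k) for k in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])

def attend(q, keys, values, idx):
    # Scaled dot-product attention restricted to the key/value positions in idx.
    w = softmax([dot(q, keys[i]) / math.sqrt(len(q)) for i in idx])
    return [sum(wj * values[i][c] for wj, i in zip(w, idx))
            for c in range(len(values[0]))]

def hybrid_sparse_attention(q, keys, values, head_is_static):
    # q[h] is the query of head h; keys[h] and values[h] are that head's KV cache.
    out = []
    for h, is_static in enumerate(head_is_static):
        if is_static:
            idx = static_indices(len(keys[h]))
        else:
            idx = dynamic_indices(q[h], keys[h])
        out.append(attend(q[h], keys[h], values[h], idx))
    return out
```

When the cache is shorter than both the window and `top_k`, every position is selected and each head reduces to dense attention, which gives a simple sanity check for the sketch.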
-
Intern-S1: A Scientific Multimodal Foundation Model
Authors:
Lei Bai,
Zhongrui Cai,
Yuhang Cao,
Maosong Cao,
Weihan Cao,
Chiyu Chen,
Haojiong Chen,
Kai Chen,
Pengcheng Chen,
Ying Chen,
Yongkang Chen,
Yu Cheng,
Pei Chu,
Tao Chu,
Erfei Cui,
Ganqu Cui,
Long Cui,
Ziyun Cui,
Nianchen Deng,
Ning Ding,
Nanqing Dong,
Peijie Dong,
Shihan Dou,
Sinan Du,
Haodong Duan
, et al. (152 additional authors not shown)
Abstract:
In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely followed fields, with performance quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly behind that in popular areas, which is far from sufficient for transforming scientific research and leaves a substantial gap between open-source and closed-source models in these scientific domains. To mitigate this gap and explore a step further toward Artificial General Intelligence (AGI), we introduce Intern-S1, a specialized generalist equipped with general understanding and reasoning capabilities and with the expertise to analyze data from multiple scientific modalities. Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the post-training stage, Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp, where we propose Mixture-of-Rewards (MoR) to synergize the RL training on more than 1000 tasks simultaneously. Through integrated innovations in algorithms, data, and training systems, Intern-S1 achieved top-tier performance in online RL training. On comprehensive evaluation benchmarks, Intern-S1 demonstrates competitive performance on general reasoning tasks among open-source models and significantly outperforms open-source models in scientific domains, surpassing closed-source state-of-the-art models in professional tasks such as molecular synthesis planning, reaction condition prediction, and thermodynamic stability prediction for crystals. Our models are available at https://huggingface.co/internlm/Intern-S1.
Submitted 24 August, 2025; v1 submitted 21 August, 2025;
originally announced August 2025.
-
DriveSplat: Decoupled Driving Scene Reconstruction with Geometry-enhanced Partitioned Neural Gaussians
Authors:
Cong Wang,
Xianda Guo,
Wenbo Xu,
Wei Tian,
Ruiqi Song,
Chenming Zhang,
Lingxi Li,
Long Chen
Abstract:
In the realm of driving scenarios, the presence of rapidly moving vehicles, pedestrians in motion, and large-scale static backgrounds poses significant challenges for 3D scene reconstruction. Recent methods based on 3D Gaussian Splatting address the motion blur problem by decoupling dynamic and static components within the scene. However, these decoupling strategies overlook background optimization with adequate geometry relationships and rely solely on fitting each training view by adding Gaussians. Therefore, these models exhibit limited robustness in rendering novel views and lack an accurate geometric representation. To address the above issues, we introduce DriveSplat, a high-quality reconstruction method for driving scenarios based on neural Gaussian representations with dynamic-static decoupling. To better accommodate the predominantly linear motion patterns of driving viewpoints, a region-wise voxel initialization scheme is employed, which partitions the scene into near, middle, and far regions to enhance close-range detail representation. Deformable neural Gaussians are introduced to model non-rigid dynamic actors, whose parameters are temporally adjusted by a learnable deformation network. The entire framework is further supervised by depth and normal priors from pre-trained models, improving the accuracy of geometric structures. Our method has been rigorously evaluated on the Waymo and KITTI datasets, demonstrating state-of-the-art performance in novel-view synthesis for driving scenarios.
Submitted 21 September, 2025; v1 submitted 21 August, 2025;
originally announced August 2025.
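The region-wise voxel initialization described in the DriveSplat abstract above (near, middle, and far regions, with more detail close to the viewpoint) can be pictured with a small sketch. This is not the authors' implementation: the distance thresholds, voxel sizes, the choice of measuring distance from a single origin, and the names `REGIONS`, `region_of`, and `voxel_anchors` are all assumptions made for illustration.

```python
import math

# Hypothetical (name, max distance in metres, voxel size in metres) per region.
# Smaller voxels near the viewpoint keep more close-range detail.
REGIONS = [("near", 15.0, 0.1), ("middle", 50.0, 0.5), ("far", math.inf, 2.0)]

def region_of(point, origin=(0.0, 0.0, 0.0)):
    # Return the region name and voxel size for a 3D point, by distance to origin.
    dist = math.dist(point, origin)
    for name, max_dist, voxel in REGIONS:
        if dist <= max_dist:
            return name, voxel

def voxel_anchors(points, origin=(0.0, 0.0, 0.0)):
    # Group points into occupied voxel indices, one set of indices per region.
    anchors = {name: set() for name, _, _ in REGIONS}
    for p in points:
        name, voxel = region_of(p, origin)
        key = tuple(math.floor(c / voxel) for c in p)
        anchors[name].add(key)
    return anchors
```

For example, two points a centimetre apart near the origin fall into the same 0.1 m voxel and yield a single near-region anchor, while a point 30 m away is assigned to the coarser middle region.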
-
Synchronization driven acoustics: The nonlinear scattering of a self-oscillating meta-atom
Authors:
Alexander K. Stoychev,
Xinxin Guo,
Ulrich Kuhl,
Nicolas Noiray
Abstract:
In this study, we demonstrate a self-oscillating acoustic meta-atom functioning as an amplifying transistor, where a steady external flow serves as a control signal to switch between reflective (off-state) and transmissive (on-state) regimes. In the on-state, an acoustic limit cycle synchronizes with incident sound waves. This process governs the energy transfer across the device, with a transmission bandwidth dictated by the synchronization region in parameter space (Arnold tongue). Our experimental measurements reveal a nonlinear dependence on the incident wave amplitude, enabling perturbation filtering therein and stabilizing downstream acoustic power. All experimentally observed phenomena are quantitatively described by a nonlinear Liénard-type oscillator featuring saturable gain and linear loss, where the essential parameters can be estimated by independent measurements. This work may offer a paradigm shift in acoustic metamaterials research by leveraging self-oscillation and synchronization processes. Bridging these key concepts from nonlinear dynamics and complex systems with active metamaterial design in acoustics and related disciplines may establish a broadly applicable framework of field-independent mechanisms for wave manipulation.
Submitted 22 August, 2025; v1 submitted 20 August, 2025;
originally announced August 2025.
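For readers unfamiliar with the model class named in the abstract above, one generic form of a driven Liénard-type oscillator with saturable gain and linear loss is written out below. The abstract does not give the authors' equation, so this form and all symbols are assumptions chosen for illustration: $x$ is the oscillating acoustic variable, $\alpha$ the linear loss, $\beta$ the small-signal gain, $\kappa$ the saturation strength, $\omega_0$ the natural frequency, and $F$, $\omega$ the amplitude and frequency of the incident wave.

$$
\ddot{x} + \left[\alpha - \frac{\beta}{1 + \kappa x^{2}}\right]\dot{x} + \omega_0^{2}\,x = F\cos(\omega t)
$$

In this assumed form, the bracketed damping is negative at small amplitude when $\beta > \alpha$, so the rest state is unstable and oscillations grow until the gain saturates and balances the loss, giving a limit cycle. With $F \neq 0$, that limit cycle can lock to the drive over a range of $\omega$ that widens with $F$, which is the Arnold-tongue picture the abstract refers to.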