-
Wave Energy Is Conserved in a Spatially Varying and Inhomogeneously Moving Medium
Authors:
Zhaohua Wu,
Jie Sun,
Zhe-Min Tan,
Ming Cai,
Yongyun Hu,
Norden E. Huang
Abstract:
Waves are propagating disturbances that redistribute energy across space. Previous studies have shown that for waves propagating through an inhomogeneously moving mean flow, the conserved quantity is wave action rather than wave energy, raising questions about the validity of energy conservation, which is one of the foundational principles of physics. In this study, we prove that wave action conservation is, in fact, an apparent form of wave energy conservation in spatially varying and inhomogeneously moving media, where waves undergo deformation during propagation. We further show that wave action conservation can be derived directly from the law of energy conservation. This result holds universally across all isolated wave systems in varying media, including hydrodynamic and non-hydrodynamic waves.
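For reference, wave action conservation in a moving medium is usually written in the classical Bretherton-Garrett form below. The notation is ours and is not taken from the paper. Here $\hat{\omega}$ is the Doppler-shifted (intrinsic) frequency, $\mathbf{U}$ the mean-flow velocity, $E$ the wave energy density, and $\hat{\mathbf{c}}_g$ the intrinsic group velocity:

$$
\hat{\omega} = \omega - \mathbf{k}\cdot\mathbf{U}, \qquad
\mathcal{A} = \frac{E}{\hat{\omega}}, \qquad
\frac{\partial \mathcal{A}}{\partial t} + \nabla\cdot\left[(\mathbf{U} + \hat{\mathbf{c}}_g)\,\mathcal{A}\right] = 0 .
$$

If $\hat{\omega}$ is constant in space and time, dividing through by it shows that $E$ obeys the same conservation law. The distinction between conserving $\mathcal{A}$ and conserving $E$ therefore comes entirely from the variation of $\hat{\omega}$ in an inhomogeneous, moving medium.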
Submitted 27 April, 2025;
originally announced April 2025.
-
Measurements of branching fractions of $D^0\to K^- 3π^+2π^-$, $D^0\to K^- 2π^+π^-2π^0$ and $D^+\to K^- 3π^+π^-π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (693 additional authors not shown)
Abstract:
Utilizing $7.9\,\rm fb^{-1}$ of $e^+e^-$ collision data taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, we report the measurements of absolute branching fractions of the hadronic decays $D^0\to K^- 3π^+2π^-$, $D^0\to K^- 2π^+π^-2π^0$ and $D^+\to K^- 3π^+π^-π^0$. The $D^0\to K^- 3π^+2π^-$ decay is measured with improved precision, while the latter two decays are observed for the first time, each with a statistical significance greater than $5σ$. The absolute branching fractions of these decays are determined to be ${\mathcal B}(D^0\to K^- 3π^+2π^-)=( 1.35\pm 0.23\pm 0.08 )\times 10^{-4}$, ${\mathcal B}(D^0\to K^- 2π^+π^-2π^0)=( 19.0\pm 1.1\pm 1.5)\times 10^{-4}$, and ${\mathcal B}(D^+\to K^- 3π^+π^-π^0)=( 6.57\pm 0.69\pm 0.33)\times 10^{-4}$, where the first uncertainties are statistical and the second systematic.
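To compare the three results at a glance, one can combine the quoted statistical and systematic uncertainties into a single total uncertainty. The sketch below adds them in quadrature. This assumes the two are independent, which is our simplification rather than a prescription from the paper. The dictionary keys are informal labels.

```python
import math

# Branching fractions from the abstract, in units of 1e-4:
# (central value, statistical uncertainty, systematic uncertainty)
MEASUREMENTS = {
    "D0 -> K- 3pi+ 2pi-": (1.35, 0.23, 0.08),
    "D0 -> K- 2pi+ pi- 2pi0": (19.0, 1.1, 1.5),
    "D+ -> K- 3pi+ pi- pi0": (6.57, 0.69, 0.33),
}


def total_uncertainty(stat: float, syst: float) -> float:
    """Combine two uncertainties in quadrature (assumes they are independent)."""
    return math.hypot(stat, syst)


def relative_uncertainty(value: float, stat: float, syst: float) -> float:
    """Total uncertainty as a fraction of the central value."""
    return total_uncertainty(stat, syst) / value


if __name__ == "__main__":
    for mode, (value, stat, syst) in MEASUREMENTS.items():
        total = total_uncertainty(stat, syst)
        rel = relative_uncertainty(value, stat, syst)
        print(f"{mode}: {value} +/- {total:.2f} (x1e-4), {100 * rel:.0f}% relative")
```

Under this assumption, the relative uncertainties come out at roughly 18%, 10%, and 12%. The first mode is statistics-limited, while for $D^0\to K^- 2π^+π^-2π^0$ the systematic term is the larger of the two.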
Submitted 27 April, 2025;
originally announced April 2025.
-
VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction
Authors:
Ning Wang,
Bingkun Yao,
Jie Zhou,
Yuchen Hu,
Xi Wang,
Nan Guan,
Zhe Jiang
Abstract:
Large Language Models (LLMs) have demonstrated remarkable potential in debugging for various programming languages. However, the application of LLMs to Verilog debugging remains insufficiently explored. Here, we present VeriDebug, an approach that integrates contrastive representation and guided correction capabilities for automated Verilog debugging. Unlike existing methods, VeriDebug employs an embedding-based technique to accurately retrieve internal information, followed by bug-fixing. VeriDebug unifies Verilog bug detection and correction through a shared parameter space. By simultaneously learning bug patterns and fixes, it streamlines debugging via contrastive embedding and guided correction. Empirical results show the efficacy of VeriDebug in enhancing Verilog debugging. Our VeriDebugLoc, Type model achieves 64.7 accuracy in bug fixing (Acc1), a significant improvement over the 11.3 achieved by the existing open-source state of the art. This performance not only outperforms open-source alternatives but also exceeds larger closed-source models like GPT-3.5-turbo (36.6), offering a more accurate alternative to conventional debugging methods.
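The abstract does not state which contrastive objective VeriDebug uses. As a hedged illustration of the general idea, the sketch below implements InfoNCE, a widely used contrastive loss. The loss pulls an anchor embedding toward its positive and pushes it away from negatives, and the same similarity is then used to retrieve the closest candidate. The toy vectors and the function names (`info_nce`, `retrieve`) are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: -log of the softmax weight of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


def retrieve(query, candidates: List[Sequence[float]]) -> int:
    """Index of the candidate embedding most similar to the query."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))
```

Training drives the loss down when the positive (for example, a matching bug description and code location) is more similar to the anchor than the negatives are. At inference time, `retrieve` only needs the learned embeddings.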
Submitted 27 April, 2025;
originally announced April 2025.
-
Search for $η_{1}(1855)$ in $χ_{cJ}\toηηη^{\prime}$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (697 additional authors not shown)
Abstract:
Based on a sample of $2.7\times10^{9}$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, an analysis of the decay $ψ(3686)\toγχ_{cJ}, χ_{cJ}\toηηη^{\prime}$ is performed. The decays $χ_{c1}\toηηη^{\prime}$ and $χ_{c2}\toηηη^{\prime}$ are observed for the first time, and their corresponding branching fractions are determined to be $\mathcal{B}(χ_{c1}\toηηη^{\prime}) = (1.39 \pm 0.13(\text{stat.}) \pm 0.09(\text{sys.})) \times 10^{-4}$ and $\mathcal{B}(χ_{c2}\toηηη^{\prime}) = (4.42 \pm 0.86(\text{stat.}) \pm 0.37(\text{sys.})) \times 10^{-5}$. An upper limit on the branching fraction of $χ_{c0}\toηηη^{\prime}$ is set at $2.64 \times 10^{-5}$ at 90\% confidence level (CL). A partial wave analysis (PWA) of the decay $χ_{c1}\toηηη^{\prime}$ is performed to search for the $1^{-+}$ exotic state $η_1(1855)$. The PWA result indicates that the structure in the $ηη^{\prime}$ mass spectrum is mainly attributed to the $f_0(1500)$, while in the $ηη$ mass spectrum, it is primarily the $0^{++}$ phase space. The upper limit $\mathcal{B}(χ_{c1}\toη_{1}(1855)η) \cdot \mathcal{B}(η_{1}(1855)\toηη^{\prime})< 9.79 \times 10^{-5}$ is set based on the PWA at 90\% CL.
Submitted 26 April, 2025;
originally announced April 2025.
-
Critical Non-Hermitian Edge Modes
Authors:
Kunling Zhou,
Zihe Yang,
Bowen Zeng,
Yong Hu
Abstract:
We unveil a unique critical phenomenon of topological edge modes in non-Hermitian systems, dubbed the critical non-Hermitian edge modes (CNHEM). Specifically, in the thermodynamic limit, the eigenvectors of edge modes jump discontinuously under infinitesimal on-site staggered perturbations. The CNHEM arises from the competition between the introduced on-site staggered potentials and the size-dependent non-reciprocal coupling between edge modes, and is closely connected to the exceptional point (EP). As the system size increases, the coupling between edge modes decreases while its non-reciprocity is enhanced, causing the eigenvectors to gradually collapse toward the EP. However, when the on-site potentials dominate, this weakened coupling helps the eigenvectors stay away from the EP. Such a critical phenomenon is absent in Hermitian systems, where the coupling between edge modes is reciprocal.
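The competition described above can be illustrated with a toy two-level model. This model is our own assumption and is not the paper's lattice Hamiltonian: $H = \begin{pmatrix}\Delta & a\\ b & -\Delta\end{pmatrix}$, where $a \ne b$ are the non-reciprocal couplings between the two edge modes and $\Delta$ is the staggered on-site potential. Taking $a = \varepsilon^2$ and $b = \varepsilon$ (hypothetical choices) makes both couplings shrink as $\varepsilon \to 0$, while their ratio $a/b = \varepsilon$ becomes increasingly non-reciprocal. The overlap of the two normalized eigenvectors tends to 1 (coalescence at an EP) when $\Delta = 0$, but stays near 0 for any fixed $\Delta \ne 0$.

```python
import cmath


def eigvecs_2x2(delta: float, a: float, b: float):
    """Normalized eigenvectors of the toy H = [[delta, a], [b, -delta]].

    For each eigenvalue lam, (a, lam - delta) solves row 1 and (lam + delta, b)
    solves row 2; we keep whichever candidate has the larger norm to avoid
    cancellation. Assumes a, b, delta are not all zero.
    """
    root = cmath.sqrt(delta * delta + a * b)
    vecs = []
    for lam in (root, -root):
        v1 = (a, lam - delta)
        v2 = (lam + delta, b)
        n1 = abs(v1[0]) ** 2 + abs(v1[1]) ** 2
        n2 = abs(v2[0]) ** 2 + abs(v2[1]) ** 2
        v, n = (v1, n1) if n1 >= n2 else (v2, n2)
        norm = n ** 0.5
        vecs.append((v[0] / norm, v[1] / norm))
    return vecs


def overlap(delta: float, a: float, b: float) -> float:
    """|<v+|v->|: 1 means the eigenvectors coalesced (EP), 0 means orthogonal."""
    (p0, p1), (q0, q1) = eigvecs_2x2(delta, a, b)
    return abs(p0.conjugate() * q0 + p1.conjugate() * q1)


if __name__ == "__main__":
    eps = 1e-6  # small couplings, mimicking a large system
    print(overlap(0.0, eps**2, eps))   # close to 1: collapse toward the EP
    print(overlap(1e-3, eps**2, eps))  # close to 0: on-site potential wins
```

The jump between the two printed values, driven only by switching $\Delta$ from zero to a small constant, is the toy analogue of the discontinuity described in the abstract.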
Submitted 26 April, 2025;
originally announced April 2025.
-
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Authors:
Shivam Duggal,
Yushi Hu,
Oscar Michel,
Aniruddha Kembhavi,
William T. Freeman,
Noah A. Smith,
Ranjay Krishna,
Antonio Torralba,
Ali Farhadi,
Wei-Chiu Ma
Abstract:
Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistency of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.
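The abstract does not specify which foundation models Eval3D uses as probes. As a purely hypothetical illustration of a pixel-wise consistency measurement, one could compare surface normals rendered from the generated geometry with normals predicted from the rendered image by a monocular estimator, then report the per-pixel angle between them. The function names and inputs below are our assumptions.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def angular_error_deg(n1: Vec3, n2: Vec3) -> float:
    """Angle in degrees between two surface normals (need not be unit length)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.degrees(math.acos(cos_angle))


def inconsistency_map(rendered: List[List[Vec3]],
                      predicted: List[List[Vec3]]) -> List[List[float]]:
    """Per-pixel angular disagreement between two normal maps of the same view."""
    return [
        [angular_error_deg(r, p) for r, p in zip(row_r, row_p)]
        for row_r, row_p in zip(rendered, predicted)
    ]
```

Keeping the result as a map instead of a single number is what makes the feedback spatial. High-error pixels point at the regions where the two probes disagree.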
Submitted 25 April, 2025;
originally announced April 2025.
-
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Authors:
Xu Ma,
Peize Sun,
Haoyu Ma,
Hao Tang,
Chih-Yao Ma,
Jialiang Wang,
Kunpeng Li,
Xiaoliang Dai,
Yujun Shi,
Xuan Ju,
Yushi Hu,
Artsiom Sanakoyeu,
Felix Juefei-Xu,
Ji Hou,
Junjiao Tian,
Tao Xu,
Tingbo Hou,
Yen-Cheng Liu,
Zecheng He,
Zijian He,
Matt Feiszli,
Peizhao Zhang,
Peter Vajda,
Sam Tsai,
Yun Fu
Abstract:
Autoregressive (AR) models, long dominant in language generation, are increasingly applied to image synthesis but are often considered less competitive than diffusion-based models. A primary limitation is the substantial number of image tokens required by AR models, which constrains both training and inference efficiency, as well as image resolution. To address this, we present Token-Shuffle, a novel yet simple method that reduces the number of image tokens in Transformers. Our key insight is the dimensional redundancy of visual vocabularies in Multimodal Large Language Models (MLLMs), where low-dimensional visual codes from the visual encoder are directly mapped to high-dimensional language vocabularies. Leveraging this, we consider two key operations: token-shuffle, which merges spatially local tokens along the channel dimension to decrease the number of input tokens, and token-unshuffle, which untangles the inferred tokens after the Transformer blocks to restore the spatial arrangement for output. Jointly trained with textual prompts, our strategy requires no additional pretrained text encoder and enables MLLMs to support extremely high-resolution image synthesis in a unified next-token prediction manner while maintaining efficient training and inference. For the first time, we push the boundary of AR text-to-image generation to a resolution of 2048x2048 with gratifying generation performance. On GenAI-benchmark, our 2.7B model achieves an overall score of 0.77 on hard prompts, outperforming the AR model LlamaGen by 0.18 and the diffusion model LDM by 0.15. Exhaustive large-scale human evaluations also demonstrate strong image generation ability in terms of text alignment, visual flaws, and visual appearance. We hope that Token-Shuffle can serve as a foundational design for efficient high-resolution image generation within MLLMs.
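The two operations described above can be sketched as a pure rearrangement of token embeddings. `token_shuffle` concatenates each s x s window of D-dimensional tokens into one token of dimension s*s*D, shrinking the sequence by a factor of s*s. `token_unshuffle` inverts the operation exactly. This is a minimal sketch under our own assumptions: plain nested lists stand in for tensors, and any learned projection that maps the concatenated channels back to the model width is omitted.

```python
from typing import List

Grid = List[List[List[float]]]  # H x W grid of D-dimensional token embeddings


def token_shuffle(grid: Grid, s: int) -> Grid:
    """Merge each s x s window of tokens into one token by channel concatenation."""
    h, w = len(grid), len(grid[0])
    assert h % s == 0 and w % s == 0, "grid size must be divisible by s"
    out = []
    for i in range(0, h, s):
        row = []
        for j in range(0, w, s):
            merged = []
            for di in range(s):
                for dj in range(s):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out


def token_unshuffle(grid: Grid, s: int) -> Grid:
    """Inverse of token_shuffle: split each token back into an s x s window."""
    h, w = len(grid), len(grid[0])
    d = len(grid[0][0]) // (s * s)
    out = [[None] * (w * s) for _ in range(h * s)]
    for i in range(h):
        for j in range(w):
            tok = grid[i][j]
            for k in range(s * s):
                di, dj = divmod(k, s)  # same row-major order as token_shuffle
                out[i * s + di][j * s + dj] = tok[k * d:(k + 1) * d]
    return out
```

With s = 2, a 4 x 4 grid (16 tokens) becomes a 2 x 2 grid (4 tokens) whose embeddings are four times wider. Because the mapping is a lossless permutation of values, unshuffling restores the original grid.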
Submitted 27 April, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
LUIDA: Large-scale Unified Infrastructure for Digital Assessments based on Commercial Metaverse Platform
Authors:
Yong-Hao Hu,
Sotaro Yokoi,
Yuji Hatada,
Yuichi Hiroi,
Takuji Narumi,
Takefumi Hiraki
Abstract:
Online experiments using metaverse platforms have gained significant traction in Human-Computer Interaction and Virtual Reality (VR) research. However, current research workflows are highly fragmented, as researchers must use separate tools for system implementation, participant recruitment, experiment execution, and data collection, reducing consistency and increasing workload. We present LUIDA (Large-scale Unified Infrastructure for Digital Assessments), a metaverse-based framework that integrates these fragmented processes. LUIDA automatically allocates interconnected virtual environments for parallel experiment execution and provides implementation templates adaptable to various VR research domains, requiring minimal metaverse development expertise. Our evaluation included two studies using a prototype built on Cluster, a commercial metaverse platform. First, VR researchers using LUIDA to develop and run experiments reported high usability scores (SUS: 73.75) and moderate workload (NASA-TLX: 24.11) for overall usage, with interviews confirming streamlined workflows compared to traditional laboratory experiments. Second, we conducted three replicated experiments with public Cluster users, each recruiting approximately 200 participants within one week. These experiments produced results that closely matched the original studies, validating the experimental integrity of LUIDA across research domains. After technical refinements, we plan to release LUIDA as an open platform, providing a standardized protocol to improve research efficiency and experimental reproducibility in VR studies.
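For readers unfamiliar with the SUS figure quoted above, the sketch below shows the standard System Usability Scale scoring for a single respondent. Ten items are rated 1 to 5. Odd items contribute (response - 1) and even items contribute (5 - response), and the sum is scaled by 2.5 to a 0 to 100 range. The abstract does not describe LUIDA's scoring pipeline; we assume only that the reported 73.75 is an average of such per-respondent scores.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """System Usability Scale score (0-100) from ten 1-5 Likert responses."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses in the range 1-5")
    total = 0
    for idx, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if idx % 2 == 1 else (5 - r)
    return total * 2.5
```

For example, answering 4 on every odd item and 2 on every even item gives a score of 75.0. Individual scores are always multiples of 2.5, so a non-multiple such as 73.75 can only arise as a mean over respondents.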
Submitted 24 April, 2025;
originally announced April 2025.
-
Relic gravitational waves from cosmological horizon radiation during de Sitter period: as zero-order approximation of inflation
Authors:
Chen-Hao Wu,
Xiao Liang,
Ya-Peng Hu
Abstract:
It is well known that the event horizon of the de Sitter universe can produce particles, and one can obtain sizable Hawking radiation by treating inflationary phases as de Sitter spacetimes with large Hubble rates. In this compact paper, we consider the graviton-emission component of this radiation and assume that these graviton signals can exist in the current universe in the form of gravitational waves. We predict an energy density parameter of $\log_{10}(Ω_{\rm GW} h^2) \sim \mathscr{O}(-25) - \mathscr{O}(-30)$ and an associated peak frequency of $\log_{10}(f_{\rm peak}^0) \sim \mathscr{O}(6)-\mathscr{O}(5)$, depending on the reheating temperature. These signals occupy a frequency band below the ultrahigh-frequency regime and possess a detectable energy density, offering a promising target for future gravitational wave observatories. We believe that the detection of such signals would provide a compelling test of Hawking's radiation theory in a cosmological context.
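The link between large Hubble rates and sizable radiation follows from the standard Gibbons-Hawking temperature of the de Sitter horizon, which grows linearly with $H$. This relation is quoted here for context only; the paper's own spectral expressions are not given in the abstract.

$$
T_{\rm dS} = \frac{\hbar H}{2\pi k_B} \qquad \left(T_{\rm dS} = \frac{H}{2\pi} \text{ in natural units}\right).
$$

A higher inflationary Hubble rate therefore means a hotter horizon and more energetic emitted gravitons.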
Submitted 24 April, 2025;
originally announced April 2025.
-
An extremely soft and weak fast X-ray transient associated with a luminous supernova
Authors:
W. -X. Li,
Z. -P. Zhu,
X. -Z. Zou,
J. -J. Geng,
L. -D. Liu,
Y. -H. Wang,
R. -Z. Li,
D. Xu,
H. Sun,
X. -F. Wang,
Y. -W. Yu,
B. Zhang,
X. -F. Wu,
Y. Yang,
A. V. Filippenko,
X. -W. Liu,
W. -M. Yuan,
D. Aguado,
J. An,
T. An,
D. A. H. Buckley,
A. J. Castro-Tirado,
S. -Y. Fu,
J. P. U. Fynbo,
D. A. Howell
, et al. (80 additional authors not shown)
Abstract:
Long gamma-ray bursts (LGRBs), including their subclasses of low-luminosity GRBs (LL-GRBs) and X-ray flashes (XRFs) characterized by low spectral peak energies, are known to be associated with broad-lined Type Ic supernovae (SNe Ic-BL), which result from the core collapse of massive stars that lose their outer hydrogen and helium envelopes. However, the soft and weak end of the GRB/XRF population remains largely unexplored, due to the limited sensitivity to soft X-ray emission. Here we report the discovery of a fast X-ray transient, EP250108a, detected by the Einstein Probe (EP) in the soft X-ray band at redshift $z = 0.176$ and followed up by extensive multiband observations. EP250108a has an X-ray luminosity similar to that of XRF\,060218, the prototype of XRFs, but it extends GRBs/XRFs down to unprecedentedly soft and weak regimes, with $E_{\rm peak} \lesssim 1.8\,\mathrm{keV}$ and $E_{\rm iso} \lesssim 10^{49}\, \mathrm{erg}$. Meanwhile, EP250108a is found to be associated with SN\,2025kg, one of the most luminous and possibly magnetar-powered SNe Ic-BL detected so far. Modeling of the well-sampled optical light curves favors a mildly relativistic outflow as the origin of this event. This discovery demonstrates that EP, with its unique capability, is opening a new observational window into the diverse outcomes of the deaths of massive stars.
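The redshift enters energy estimates such as $E_{\rm iso}$ through the luminosity distance, via $E_{\rm iso} = 4\pi d_L^2 S/(1+z)$ for a bolometric fluence $S$. The sketch below computes $d_L$ numerically in a flat ΛCDM cosmology. It assumes $H_0 = 70$ km/s/Mpc and $\Omega_m = 0.3$, which are our choices since the abstract does not state the paper's cosmology. Any fluence passed to `isotropic_energy_erg` is a made-up input, not a measurement of EP250108a.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
MPC_CM = 3.0857e24   # one megaparsec in cm


def luminosity_distance_mpc(z: float, h0: float = 70.0, omega_m: float = 0.3,
                            steps: int = 1000) -> float:
    """Luminosity distance in a flat LCDM universe, via Simpson's rule on 1/E(z)."""
    omega_l = 1.0 - omega_m

    def inv_e(zp: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1 + zp) ** 3 + omega_l)

    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    dz = z / steps
    acc = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * inv_e(i * dz)
    comoving = (C_KM_S / h0) * acc * dz / 3
    return (1 + z) * comoving


def isotropic_energy_erg(fluence_erg_cm2: float, z: float, **cosmo) -> float:
    """E_iso = 4 pi d_L^2 S / (1 + z) for a bolometric fluence S in erg/cm^2."""
    d_l_cm = luminosity_distance_mpc(z, **cosmo) * MPC_CM
    return 4 * math.pi * d_l_cm ** 2 * fluence_erg_cm2 / (1 + z)
```

Under these assumptions, $z = 0.176$ corresponds to $d_L$ of roughly 850 Mpc. A hypothetical fluence of $10^{-7}$ erg cm$^{-2}$ would then imply $E_{\rm iso}$ of order $10^{48}$ to $10^{49}$ erg.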
Submitted 23 April, 2025;
originally announced April 2025.
-
MLOps Monitoring at Scale for Digital Platforms
Authors:
Yu Jeffrey Hu,
Jeroen Rombouts,
Ines Wilms
Abstract:
Machine learning models are widely recognized for their strong performance in forecasting. To keep that performance in streaming data settings, they have to be monitored and frequently re-trained. This can be done with machine learning operations (MLOps) techniques under the supervision of an MLOps engineer. However, in digital platform settings where the number of data streams is typically large and unstable, standard monitoring becomes either suboptimal or too labor-intensive for the MLOps engineer. As a consequence, companies often fall back on very simple, worse-performing ML models without monitoring. We solve this problem by adopting a design science approach and introducing a new monitoring framework, the Machine Learning Monitoring Agent (MLMA), that is designed to work at scale for any ML model with reasonable labor cost. A key feature of our framework concerns test-based automated re-training based on a data-adaptive reference loss batch. The MLOps engineer is kept in the loop via key metrics and also acts, proactively or retrospectively, to maintain the performance of the ML model in the production stage. We conduct a large-scale test at a last-mile delivery platform to empirically validate our monitoring framework.
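The abstract does not name the statistical test behind MLMA's "test-based automated re-training". As a hedged sketch of the idea, the code below substitutes a one-sided permutation test. It checks whether recent forecast losses are significantly worse than a reference loss batch, and it refreshes the reference after re-training so that the batch stays data-adaptive. `LossMonitor`, `alpha`, and the permutation count are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def one_sided_permutation_pvalue(reference: Sequence[float], recent: Sequence[float],
                                 n_perm: int = 2000, seed: int = 0) -> float:
    """P-value for H1: recent losses have a higher mean than the reference losses."""
    rng = random.Random(seed)  # fixed seed keeps the decision reproducible
    observed = mean(recent) - mean(reference)
    pooled = list(reference) + list(recent)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[n_ref:]) - mean(pooled[:n_ref]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


class LossMonitor:
    """Hypothetical monitor: flag re-training when recent losses are significantly worse."""

    def __init__(self, reference: Sequence[float], alpha: float = 0.05):
        self.reference: List[float] = list(reference)
        self.alpha = alpha

    def should_retrain(self, recent: Sequence[float]) -> bool:
        return one_sided_permutation_pvalue(self.reference, recent) < self.alpha

    def reset_reference(self, new_losses: Sequence[float]) -> None:
        """After re-training, adopt the new model's losses as the reference batch."""
        self.reference = list(new_losses)
```

Because the test only needs loss values, the same monitor can run unchanged across many data streams. This matches the abstract's goal of working at scale for any ML model, while the engineer reviews only the flagged streams.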
Submitted 23 April, 2025;
originally announced April 2025.
-
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis
Authors:
Xiao Zhang,
Yaoyao Ding,
Yang Hu,
Gennady Pekhimenko
Abstract:
Deep learning (DL) workloads mainly run on accelerators like GPUs. Recent DL quantization techniques demand a new matrix multiplication operator with mixed input data types, further complicating GPU optimization. Prior high-level compilers like Triton lack the expressiveness to implement key optimizations like fine-grained data pipelines and hardware-friendly memory layouts for these operators, while low-level programming models, such as Hidet, Graphene, and CUTLASS, require significant programming efforts. To balance expressiveness with engineering effort, we propose Hexcute, a tile-based programming language that exposes shared memory and register abstractions to enable fine-grained optimization for these operators. Additionally, Hexcute leverages task mapping to schedule the GPU program, and to reduce programming efforts, it automates layout and task mapping synthesis with a novel type-inference-based algorithm. Our evaluation shows that Hexcute generalizes to a wide range of DL operators, achieves 1.7-11.28$\times$ speedup over existing DL compilers for mixed-type operators, and brings up to 2.91$\times$ speedup in the end-to-end evaluation.
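The abstract does not define Hexcute's layout representation. To make "memory layout" concrete, the sketch below uses a convention common in tile-based GPU programming, the (shape, stride) pairs of CUTLASS's CuTe. Under this convention, a logical coordinate maps to the linear offset sum(coord[i] * stride[i]). Choosing such layouts (and task mappings) automatically is what Hexcute's type-inference-based synthesis is described as doing. The helper below is our illustration, not Hexcute's API.

```python
from typing import Sequence


def layout_offset(coord: Sequence[int], shape: Sequence[int],
                  stride: Sequence[int]) -> int:
    """Map a logical coordinate to a linear memory offset: sum(coord[i] * stride[i])."""
    if not (len(coord) == len(shape) == len(stride)):
        raise ValueError("coord, shape and stride must have the same rank")
    for c, n in zip(coord, shape):
        if not 0 <= c < n:
            raise IndexError(f"coordinate {c} out of range for extent {n}")
    return sum(c * d for c, d in zip(coord, stride))


# A 4 x 8 tile stored row-major vs. column-major: same shape, different strides.
ROW_MAJOR = ((4, 8), (8, 1))
COL_MAJOR = ((4, 8), (1, 4))
```

Element (2, 3) sits at offset 19 in the row-major tile but at offset 14 in the column-major one. Which of the two is "hardware-friendly" depends on how threads access the tile, and this is exactly the kind of decision the synthesis aims to automate.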
Submitted 30 April, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Authors:
Shi Qiu,
Shaoyang Guo,
Zhuo-Yang Song,
Yunbo Sun,
Zeyu Cai,
Jiashen Wei,
Tianyu Luo,
Yixuan Yin,
Haoxu Zhang,
Yi Hu,
Chenyang Wang,
Chencheng Tang,
Haoling Chang,
Qi Liu,
Ziheng Zhou,
Tianyu Zhang,
Jingtian Zhang,
Zhangyi Liu,
Minghao Li,
Yuku Zhang,
Boxuan Jing,
Xianqi Yin,
Yutong Ren,
Zizhuo Fu,
Weike Wang
, et al. (27 additional authors not shown)
Abstract:
We introduce PHYBench, a novel, high-quality benchmark designed for evaluating reasoning capabilities of large language models (LLMs) in physical contexts. PHYBench consists of 500 meticulously curated physics problems based on real-world physical scenarios, designed to assess the ability of models to understand and reason about realistic physical processes. Covering mechanics, electromagnetism, thermodynamics, optics, modern physics, and advanced physics, the benchmark spans difficulty levels from high school exercises to undergraduate problems and Physics Olympiad challenges. Additionally, we propose the Expression Edit Distance (EED) Score, a novel evaluation metric based on the edit distance between mathematical expressions, which effectively captures differences in model reasoning processes and results beyond traditional binary scoring methods. We evaluate various LLMs on PHYBench and compare their performance with human experts. Our results reveal that even state-of-the-art reasoning models significantly lag behind human experts, highlighting their limitations and the need for improvement in complex physical reasoning scenarios. Our benchmark results and dataset are publicly available at https://phybench-official.github.io/phybench-demo/.
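The abstract does not detail how the EED Score is computed. The sketch below is a deliberately simplified, token-level stand-in and not the paper's metric. It computes the plain Levenshtein distance between tokenized expressions and applies a made-up linear mapping to a 0 to 100 score. The point is only to show how an edit-distance score gives partial credit where binary scoring gives none. Because it does not simplify expressions, algebraically equivalent answers written differently would be penalized.

```python
import re
from typing import List

TOKEN_RE = re.compile(r"\d+\.?\d*|[A-Za-z_]\w*|\*\*|[-+*/^()=]")


def tokenize(expr: str) -> List[str]:
    """Split an expression into numbers, identifiers, and operators."""
    return TOKEN_RE.findall(expr.replace(" ", ""))


def levenshtein(a: List[str], b: List[str]) -> int:
    """Edit distance (insert/delete/substitute, each cost 1) between token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def expression_similarity(answer: str, reference: str) -> float:
    """Hypothetical score in [0, 100]: 100 if identical, lower with more edits."""
    ref = tokenize(reference)
    dist = levenshtein(tokenize(answer), ref)
    return 100.0 * max(0.0, 1.0 - dist / max(len(ref), 1))
```

For a reference answer `m*g*h/2`, the response `m*g*h` (a missing factor of one half) is 2 edits away and scores about 71. A binary checker would give it 0.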
Submitted 22 April, 2025;
originally announced April 2025.
-
High-performance training and inference for deep equivariant interatomic potentials
Authors:
Chuin Wei Tan,
Marc L. Descoteaux,
Mit Kotak,
Gabriel de Miranda Nascimento,
Seán R. Kavanagh,
Laura Zichi,
Menghang Wang,
Aadit Saluja,
Yizhong R. Hu,
Tess Smidt,
Anders Johansson,
William C. Witt,
Boris Kozinsky,
Albert Musaelian
Abstract:
Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.
Submitted 22 April, 2025;
originally announced April 2025.
-
Universal Approximation with Softmax Attention
Authors:
Jerry Yao-Chieh Hu,
Hude Liu,
Hong-Yu Chen,
Weimin Wu,
Han Liu
Abstract:
We prove that with linear transformations, both (i) two-layer self-attention and (ii) one-layer self-attention followed by a softmax function are universal approximators for continuous sequence-to-sequence functions on compact domains. Our main technique is a new interpolation-based method for analyzing attention's internal mechanism. This leads to our key insight: self-attention is able to approximate a generalized version of ReLU to arbitrary precision, and hence subsumes many known universal approximators. Building on these results, we show that two-layer multi-head attention alone suffices as a sequence-to-sequence universal approximator. In contrast, prior works rely on feed-forward networks to establish universal approximation in Transformers. Furthermore, we extend our techniques to show that (softmax-)attention-only layers are capable of approximating various statistical models in-context. We believe these techniques hold independent interest.
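A toy calculation gives some intuition for why softmax attention can emulate ReLU. This is our illustration and not the paper's interpolation-based construction. Let a scalar query $\beta$ attend over two tokens whose keys and values are both $\{0, x\}$. The output is then $x\,\sigma(\beta x)$, which converges to $\max(0, x)$ as $\beta \to \infty$, with a worst-case error of about $0.28/\beta$.

```python
import math
from typing import Sequence


def softmax_attention(query: float, keys: Sequence[float], values: Sequence[float]) -> float:
    """Single-query, scalar softmax attention: sum_i softmax(query * keys)_i * values_i."""
    scores = [query * k for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def attention_relu(x: float, beta: float) -> float:
    """Attend over two tokens with keys/values {0, x}; equals x * sigmoid(beta * x)."""
    return softmax_attention(beta, keys=[0.0, x], values=[0.0, x])
```

Here $\beta$ acts as an inverse temperature. Sharpening the softmax turns the weighted average into a hard selection of the larger of $0$ and $x$, which is exactly ReLU.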
Submitted 22 April, 2025;
originally announced April 2025.
-
Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback
Authors:
Ning Wang,
Bingkun Yao,
Jie Zhou,
Yuchen Hu,
Xi Wang,
Nan Guan,
Zhe Jiang
Abstract:
Large language models (LLMs) have shown strong performance in Verilog generation from natural language descriptions. However, ensuring the functional correctness of the generated code remains a significant challenge. This paper introduces a method that integrates verification insights from testbenches into the training of Verilog generation LLMs, aligning the training with the fundamental goal of hardware design: functional correctness. The main obstacle in using LLMs for Verilog code generation is the lack of sufficient functional verification data, particularly testbenches paired with design specifications and code. To address this problem, we introduce an automatic testbench generation pipeline that decomposes the process and uses feedback from the Verilog compiler simulator (VCS) to reduce hallucination and ensure correctness. We then use the testbenches to evaluate the generated code and collect it for further training, where verification insights are introduced. Our method applies reinforcement learning (RL), specifically direct preference optimization (DPO), to align Verilog code generation with functional correctness by training on preference pairs based on testbench outcomes. In evaluations on VerilogEval-Machine, VerilogEval-Human, RTLLM v1.1, RTLLM v2, and VerilogEval v2, our approach consistently outperforms state-of-the-art baselines in generating functionally correct Verilog code. We open source all training code, data, and models at https://anonymous.4open.science/r/VeriPrefer-E88B.
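The DPO objective mentioned above has a standard per-pair form. It is $-\log\sigma\big(\beta[(\log\pi(y_w) - \log\pi_{\rm ref}(y_w)) - (\log\pi(y_l) - \log\pi_{\rm ref}(y_l))]\big)$, where $y_w$ is the preferred (chosen) sample and $y_l$ the rejected one. The sketch below implements that loss. It pairs testbench-passing samples with failing ones, and this all-pairs scheme is our hypothetical choice, since the abstract does not say how pairs are formed. Sequence log-probabilities are taken as given inputs.

```python
import math
from typing import List, Tuple


def neg_log_sigmoid(m: float) -> float:
    """-log(sigmoid(m)), computed without overflow for large |m|."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return neg_log_sigmoid(margin)


def build_pairs(samples: List[Tuple[str, bool]]) -> List[Tuple[str, str]]:
    """Pair every testbench-passing sample (chosen) with every failing one (rejected)."""
    passed = [code for code, ok in samples if ok]
    failed = [code for code, ok in samples if not ok]
    return [(c, r) for c in passed for r in failed]
```

When the policy still equals the reference model, every pair has loss $\log 2$. The loss falls below $\log 2$ once the policy raises the relative likelihood of the passing code compared with the failing code.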
Submitted 22 April, 2025;
originally announced April 2025.
-
Affine isoperimetric type inequalities for static convex domains in hyperbolic space
Authors:
Yingxiang Hu,
Haizhong Li,
Yao Wan,
Botong Xu
Abstract:
In this paper, the notion of hyperbolic ellipsoids in hyperbolic space is introduced. Using a natural orthogonal projection from hyperbolic space to Euclidean space, we establish affine isoperimetric type inequalities for static convex domains in hyperbolic space. Moreover, the equality cases of these inequalities are characterized by the hyperbolic ellipsoids.
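For context, the classical Euclidean affine isoperimetric inequality is the prototype for inequalities of this type. The paper's hyperbolic versions are not stated in the abstract and are not reproduced here. For a convex body $K \subset \mathbb{R}^n$ with affine surface area $\Omega(K)$ and volume $V(K)$, and with $\omega_n$ the volume of the Euclidean unit ball, it reads:

$$
\Omega(K)^{n+1} \le n^{n+1}\,\omega_n^{2}\,V(K)^{n-1},
$$

with equality exactly for ellipsoids. As a check, the unit ball has $\Omega = n\omega_n$ and $V = \omega_n$, so both sides equal $(n\omega_n)^{n+1}$. This is the role that hyperbolic ellipsoids play in the equality cases above.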
Submitted 22 April, 2025;
originally announced April 2025.
-
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
Authors:
Cailin Zhuang,
Yaoqi Hu,
Xuanyang Zhang,
Wei Cheng,
Jiacheng Bao,
Shengqi Liu,
Yiying Yang,
Xianfang Zeng,
Gang Yu,
Ming Li
Abstract:
3D Gaussian Splatting (3DGS) excels in photorealistic scene reconstruction but struggles with stylized scenarios (e.g., cartoons, games) due to fragmented textures, semantic misalignment, and limited adaptability to abstract aesthetics. We propose StyleMe3D, a holistic framework for 3DGS style transfer that integrates multi-modal style conditioning, multi-level semantic alignment, and perceptual quality enhancement. Our key insights are: (1) optimizing only RGB attributes preserves geometric integrity during stylization; (2) disentangling low-, medium-, and high-level semantics is critical for coherent style transfer; (3) scalability across isolated objects and complex scenes is essential for practical deployment. StyleMe3D introduces four novel components: Dynamic Style Score Distillation (DSSD), which leverages Stable Diffusion's latent space for semantic alignment; Contrastive Style Descriptor (CSD) for localized, content-aware texture transfer; Simultaneously Optimized Scale (SOS) to decouple style details and structural coherence; and 3D Gaussian Quality Assessment (3DG-QA), a differentiable aesthetic prior trained on human-rated data to suppress artifacts and enhance visual harmony. Evaluated on the NeRF synthetic dataset (objects) and the tandt db dataset (scenes), StyleMe3D outperforms state-of-the-art methods in preserving geometric details (e.g., carvings on sculptures) and ensuring stylistic consistency across scenes (e.g., coherent lighting in landscapes), while maintaining real-time rendering. This work bridges photorealistic 3DGS and artistic stylization, unlocking applications in gaming, virtual worlds, and digital art.
Submitted 21 April, 2025;
originally announced April 2025.
-
A Review on the Applications of Density Functional Theory to the FQH System
Authors:
Yi Yang,
Yayun Hu,
Zi-Xiang Hu
Abstract:
The fractional quantum Hall (FQH) effect remains a captivating area in condensed matter physics, characterized by strongly correlated topological order, fractionalized excitations, and anyonic statistics. Numerical simulations, such as exact diagonalization, density matrix renormalization group, matrix product states, and Monte Carlo methods, are essential for examining the properties of strongly correlated systems. Recently, density functional theory (DFT) has been employed in this field within the framework of composite fermion (CF) theory. In this paper, we assess how DFT addresses major challenges in FQH systems, such as computing ground states and low-energy excitations. We emphasize the critical insights provided by DFT-based methods into the CF model, edge effects, and the nature of fractional charge and magnetoroton excitations. Furthermore, we examine the advantages and limitations of DFT approaches and highlight the interplay between numerical simulations and theoretical models. We finally discuss the future potential of time-dependent DFT (TDDFT) for modeling non-equilibrium dynamics.
Submitted 20 April, 2025;
originally announced April 2025.
-
The Schur complements for $SDD_{1}$ matrices and their application to linear complementarity problems
Authors:
Yang Hu,
Jianzhou Liu,
Wenlong Zeng
Abstract:
In this paper, we propose a new scaling method to study the Schur complements of $SDD_{1}$ matrices. Its core is related to the nonnegativity of the inverse of an $M$-matrix, and it numerically improves the Quotient formula. Based on the Schur complement and a novel norm-splitting approach, we establish an upper bound for the infinity norm of the inverse of $SDD_{1}$ matrices that depends solely on the entries of the original matrix. We apply the new bound to derive an error bound for linear complementarity problems of $B_{1}$-matrices. Additionally, we present new lower and upper bounds for the determinant of $SDD_{1}$ matrices. Numerical experiments validate the effectiveness and superiority of our results.
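The two objects in this abstract, the Schur complement $A/A_{11} = A_{22} - A_{21}A_{11}^{-1}A_{12}$ and $\|A^{-1}\|_\infty$, are easy to compute exactly for small examples. The sketch below uses exact rational arithmetic. For comparison it uses the classical Varah bound for strictly diagonally dominant matrices, $\|A^{-1}\|_\infty \le 1/\min_i(|a_{ii}| - \sum_{j\ne i}|a_{ij}|)$, not the paper's new $SDD_1$ bound, which is not reproduced here. The function names are illustrative.

```python
from fractions import Fraction

def schur_complement(A, k):
    """A/A[:k,:k] via Gaussian elimination on the leading k x k block
    (assumes those pivots are nonzero, e.g. A strictly diagonally dominant)."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    for p in range(k):
        for i in range(p + 1, n):
            f = M[i][p] / M[p][p]
            for j in range(p, n):
                M[i][j] -= f * M[p][j]
    # After k elimination steps the trailing block equals A22 - A21 A11^{-1} A12
    return [row[k:] for row in M[k:]]

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination with partial pivoting
    (raises ZeroDivisionError if A is singular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for p in range(n):
        r = max(range(p, n), key=lambda i: abs(M[i][p]))
        M[p], M[r] = M[r], M[p]
        piv = M[p][p]
        M[p] = [x / piv for x in M[p]]
        for i in range(n):
            if i != p and M[i][p] != 0:
                f = M[i][p]
                M[i] = [a - f * b for a, b in zip(M[i], M[p])]
    return [row[n:] for row in M]

def inf_norm(B):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in B)

def varah_bound(A):
    """Classical upper bound on ||A^{-1}||_inf; valid only if every row gap is > 0."""
    n = len(A)
    gaps = [abs(A[i][i]) - sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n)]
    return 1 / Fraction(min(gaps))
```

For `A = [[4, -1, -1], [-1, 4, -1], [-1, -1, 4]]`, both `inf_norm(inverse(A))` and `varah_bound(A)` equal 1/2, so the classical bound happens to be tight on this example. Bounds such as the paper's are aimed at matrices outside the strictly diagonally dominant class, where Varah's bound does not apply.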
Submitted 19 April, 2025;
originally announced April 2025.
-
A Physics-guided Multimodal Transformer Path to Weather and Climate Sciences
Authors:
Jing Han,
Hanting Chen,
Kai Han,
Xiaomeng Huang,
Yongyun Hu,
Wenjun Xu,
Dacheng Tao,
Ping Zhang
Abstract:
With the rapid development of machine learning in recent years, many problems in meteorology can now be addressed using AI models. In particular, data-driven algorithms have significantly improved accuracy compared to traditional methods. Meteorological data is often transformed into 2D images or 3D videos, which are then fed into AI models for learning. Additionally, these models often incorporate physical signals, such as temperature, pressure, and wind speed, to further enhance accuracy and interpretability. In this paper, we review several representative AI + Weather/Climate algorithms and propose a new paradigm where observational data from different perspectives, each with distinct physical meanings, are treated as multimodal data and integrated via transformers. Furthermore, key weather and climate knowledge can be incorporated through regularization techniques to further strengthen the model's capabilities. This new paradigm is versatile and can address a variety of tasks, offering strong generalizability. We also discuss future directions for improving model accuracy and interpretability.
Submitted 19 April, 2025;
originally announced April 2025.
-
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Authors:
Zhongxi Qiu,
Zhang Zhang,
Yan Hu,
Heng Li,
Jiang Liu
Abstract:
This paper explores optimal data selection strategies for Reinforcement Learning with Verified Rewards (RLVR) training in the medical domain. While RLVR has shown exceptional potential for enhancing reasoning capabilities in large language models, most prior implementations have focused on mathematics and logical puzzles, with limited exploration of domain-specific applications such as medicine. We investigate four distinct data sampling strategies from MedQA-USMLE: random sampling (baseline), and filtering using the Phi-4, Gemma-3-27b-it, and Gemma-3-12b-it models. Using Gemma-3-12b-it as our base model and implementing Group Relative Policy Optimization (GRPO), we evaluate performance across multiple benchmarks, including MMLU, GSM8K, MMLU-Pro, and CMMLU. Our findings demonstrate that models trained on filtered data generally outperform those trained on randomly selected samples. Notably, training on self-filtered samples (using Gemma-3-12b-it for filtering) achieved superior performance in medical domains but showed reduced robustness across different benchmarks, while filtering with larger models from the same series yielded better overall robustness. These results provide valuable insights into effective data organization strategies for RLVR in specialized domains and highlight the importance of thoughtful data selection in achieving optimal performance. The code is available in our repository (https://github.com/Qsingle/open-medical-r1).
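GRPO replaces a learned value baseline with a group-relative one. Several responses are sampled per prompt, and each reward is standardized against its own group. The sketch below assumes a binary verifiable reward for multiple-choice answers. The `verify_choice` helper and its answer-extraction rule are hypothetical, and both population and sample standard deviations appear in practice. The sketch also shows a general GRPO property that is not a claim from the paper: a group where every response gets the same reward has zero advantages and contributes no learning signal.

```python
import re
import statistics

def verify_choice(response, gold):
    """Hypothetical verifiable reward: 1.0 if the last standalone A-E letter
    in the response equals the gold option, else 0.0."""
    letters = re.findall(r"\b([A-E])\b", response)
    return 1.0 if letters and letters[-1] == gold else 0.0

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, scoring four sampled answers against gold option `C` and passing the rewards to `group_relative_advantages` gives positive advantages to the correct answers and negative advantages to the rest.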
Submitted 16 April, 2025;
originally announced April 2025.
-
Task Matters: Investigating Human Questioning Behavior in Different Household Service for Learning by Asking Robots
Authors:
Yuanda Hu,
Hou Jiani,
Zhang Junyu,
Yate Ge,
Xiaohua Sun,
Weiwei Guo
Abstract:
Learning by Asking (LBA) enables robots to identify knowledge gaps during task execution and acquire the missing information by asking targeted questions. However, different tasks often require different types of questions, and how to adapt questioning strategies accordingly remains underexplored. This paper investigates human questioning behavior in two representative household service tasks: a Goal-Oriented task (refrigerator organization) and a Process-Oriented task (cocktail mixing). Through a human-human study involving 28 participants, we analyze the questions asked using a structured framework that encodes each question along three dimensions: acquired knowledge, cognitive process, and question form. Our results reveal that participants adapt both question types and their temporal ordering based on task structure. Goal-Oriented tasks elicited early inquiries about user preferences, while Process-Oriented tasks led to ongoing, parallel questioning of procedural steps and preferences. These findings offer actionable insights for developing task-sensitive questioning strategies in LBA-enabled robots for more effective and personalized human-robot collaboration.
Submitted 10 April, 2025;
originally announced April 2025.
-
Search for $J/ψ\rightarrow K^{0}_{S}K^{0}_{S}$ and $ψ(3686)\rightarrow K^{0}_{S}K^{0}_{S}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Using data samples of $(10087\pm 44)\times10^{6}$ $J/ψ$ events and $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, we search for the CP-violating decays $J/ψ\rightarrow K^{0}_{S}K^{0}_{S}$ and $ψ(3686)\rightarrow K^{0}_{S}K^{0}_{S}$. No significant signals are observed over the expected background yields. The upper limits on their branching fractions are set as $\mathcal{B}(J/ψ\rightarrow K^{0}_{S}K^{0}_{S}) <4.7\times 10^{-9}$ and $\mathcal{B}(ψ(3686)\rightarrow K^{0}_{S}K^{0}_{S}) <1.1\times 10^{-8}$ at the 90% confidence level. These results improve the previous limits by a factor of three for $J/ψ\rightarrow K^{0}_{S} K^{0}_{S}$ and by two orders of magnitude for $ψ(3686)\rightarrow K^{0}_{S} K^{0}_{S}$.
Submitted 18 April, 2025;
originally announced April 2025.
-
Realizing exceptional points by Floquet dissipative couplings in thermal atoms
Authors:
Zimo Zhang,
Fengbo Zhang,
Zhongxiao Xu,
Ying Hu,
Han Bao,
Heng Shen
Abstract:
Exceptional degeneracies and generically complex spectra of non-Hermitian systems are at the heart of numerous phenomena absent in the Hermitian realm. Recently, it was suggested that Floquet dissipative coupling in the space-time domain may provide a novel mechanism to drive intriguing spectral topology with no static analogues, though its experimental investigation in quantum systems remains elusive. We demonstrate such Floquet dissipative coupling in an ensemble of thermal atoms interacting with two spatially separated optical beams, and observe an anomalous anti-parity-time symmetry phase transition at an exceptional point far from the phase-transition threshold of the static counterpart. Our protocol sets the stage for Floquet engineering of non-Hermitian topological spectra, and for engineering new quantum phases that cannot exist in static systems.
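For reference, the static counterpart mentioned above has a simple closed form in a textbook anti-parity-time-symmetric two-mode model with dissipative coupling. Assume $H = \begin{pmatrix}\delta - i\gamma & -i\kappa\\ -i\kappa & -\delta - i\gamma\end{pmatrix}$, where $\delta$ is the detuning, $\gamma$ the common decay rate, and $\kappa$ the dissipative coupling. Its eigenvalues are $-i\gamma \pm \sqrt{\delta^2 - \kappa^2}$, so the static exceptional point sits at $|\delta| = \kappa$. The symbols and parametrization are assumptions for illustration. The Floquet-modulated coupling in the experiment, which moves the transition away from this threshold, is not modeled here.

```python
import cmath

def static_apt_eigenvalues(delta, kappa, gamma):
    """Eigenvalues of H = [[delta - i*gamma, -i*kappa], [-i*kappa, -delta - i*gamma]],
    namely -i*gamma +/- sqrt(delta**2 - kappa**2)."""
    root = cmath.sqrt(delta ** 2 - kappa ** 2)
    return (-1j * gamma + root, -1j * gamma - root)
```

For $|\delta| > \kappa$ the two eigenvalues differ in their real parts (frequency splitting). For $|\delta| < \kappa$ they differ only in their imaginary parts (linewidth splitting). At $|\delta| = \kappa$ they coalesce.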
Submitted 18 April, 2025;
originally announced April 2025.
-
Search for $1^{-+}$ charmonium-like hybrid via $e^{+}e^{-}\rightarrow γη^{(\prime)} η_{c}$ at center-of-mass energies between 4.258 and 4.681 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (696 additional authors not shown)
Abstract:
Using $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 10.6 fb$^{-1}$ collected at center-of-mass energies between 4.258 and 4.681 GeV with the BESIII detector at the BEPCII collider, we search for the $1^{- +}$ charmonium-like hybrid via $e^{+}e^{-}\rightarrowγηη_{c}$ and $e^{+}e^{-}\rightarrowγη^{\prime}η_{c}$ decays for the first time. No significant signal is observed, and upper limits on the Born cross sections of both processes are set at the 90% confidence level.
Submitted 18 April, 2025;
originally announced April 2025.
-
A Model-Based Approach to Imitation Learning through Multi-Step Predictions
Authors:
Haldun Balim,
Yang Hu,
Yuyang Zhang,
Na Li
Abstract:
Imitation learning is a widely used approach for training agents to replicate expert behavior in complex decision-making tasks. However, existing methods often struggle with compounding errors and limited generalization, due to the inherent challenge of error correction and the distribution shift between training and deployment. In this paper, we present a novel model-based imitation learning framework inspired by model predictive control, which addresses these limitations by integrating predictive modeling through multi-step state predictions. Our method outperforms traditional behavior cloning on numerical benchmarks, demonstrating superior robustness to distribution shift and to measurement noise, both in the available data and during execution. Furthermore, we provide theoretical guarantees on the sample complexity and error bounds of our method, offering insights into its convergence properties.
Submitted 17 April, 2025;
originally announced April 2025.
-
Adaptive Modeling of Correlated Noise in Space-Based Gravitational Wave Detectors
Authors:
Ya-Nan Li,
Yi-Ming Hu,
En-Kun Li
Abstract:
Accurately estimating the statistical properties of noise is important in space-based gravitational wave data analysis. Traditional methods often assume uncorrelated noise or impose restrictive parametric forms on cross-channel correlations, which can lead to biased estimates when the instrumental noise is complex. This paper introduces a spline-based framework with trans-dimensional Bayesian inference to reconstruct the full noise covariance matrix, including frequency-dependent auto- and cross-power spectral densities, without prior assumptions on noise shapes. The developed software $\mathtt{NOISAR}$ can recover the features of the noise power spectrum curves with a relative error of $\leq 10\%$ for both the auto- and cross-spectra.
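As a rough illustration of the kind of model and metric involved, the sketch below interpolates log-PSD linearly in log-frequency through a set of knots and scores the result by its maximum relative error against a reference curve. This is not $\mathtt{NOISAR}$. The paper's spline type, its trans-dimensional sampling over knot number and placement, and its exact error definition are not reproduced. The function names and the power-law test curve are hypothetical. Cross-spectra can be complex and change sign, so they would need a linear-scale model rather than this log-scale one.

```python
import bisect
import math

def loglog_spline(knots_f, knots_psd):
    """Piecewise-linear model of log10(PSD) versus log10(f) through the given knots;
    values outside the knot range are extrapolated from the end segments."""
    xs = [math.log10(f) for f in knots_f]
    ys = [math.log10(p) for p in knots_psd]

    def psd(f):
        x = math.log10(f)
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return 10 ** (ys[i] + t * (ys[i + 1] - ys[i]))

    return psd

def max_relative_error(estimate, truth, freqs):
    """max_f |estimate(f) - truth(f)| / |truth(f)| over the given frequencies."""
    return max(abs(estimate(f) - truth(f)) / abs(truth(f)) for f in freqs)
```

A pure power law is a straight line in log-log space, so this model reproduces it essentially exactly. Curved spectra need more knots, which is the trade-off a trans-dimensional sampler explores automatically.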
Submitted 17 April, 2025;
originally announced April 2025.
-
A tight consecutive measurement theorem and its applications
Authors:
Chen-Xun Weng,
Minglong Qin,
Yanglin Hu,
Marco Tomamichel
Abstract:
In many cryptographic tasks, we encounter scenarios where information about two incompatible observables must be retrieved. A natural approach is to perform consecutive measurements, raising a key question: How does the information gained from the first measurement compare to that from both? The consecutive measurement theorem provides a general relation between these quantities and has been used in quantum proofs of knowledge and nonlocal games. However, its previous formulations are often too loose to yield meaningful bounds, especially in quantum nonlocal games. Here, we establish a tight version of the theorem and apply it to improve the best-known bounds on the quantum value of $\text{CHSH}_q(p)$ games and their parallel repetition. We also present a novel application of the theorem to obtain a tighter trade-off bound in quantum oblivious transfer for certain regimes. These results enhance the theoretical tools for analyzing quantum advantage and have concrete implications for nonlocal games and quantum cryptographic protocols.
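For orientation, the standard binary CHSH game (uniform inputs $x, y \in \{0,1\}$, win iff $a \oplus b = x \wedge y$) has classical value $3/4$ and optimal quantum value $\cos^2(\pi/8) \approx 0.854$ (Tsirelson's bound). The sketch below computes the classical value by brute force over deterministic strategies. Shared randomness only mixes deterministic strategies, so it cannot do better. The $\text{CHSH}_q(p)$ family studied in the paper, and its quantum-value bounds, are not reproduced here.

```python
import itertools
import math

def chsh_classical_value():
    """Best winning probability of the standard CHSH game over all
    deterministic strategies a = alice[x], b = bob[y]."""
    best = 0.0
    for a0, a1, b0, b1 in itertools.product((0, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        wins = sum((alice[x] ^ bob[y]) == (x & y) for x in (0, 1) for y in (0, 1))
        best = max(best, wins / 4)
    return best

# Optimal quantum winning probability for standard CHSH (Tsirelson's bound)
TSIRELSON = math.cos(math.pi / 8) ** 2
```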
Submitted 17 April, 2025;
originally announced April 2025.
-
BitNet b1.58 2B4T Technical Report
Authors:
Shuming Ma,
Hongyu Wang,
Shaohan Huang,
Xingxing Zhang,
Ying Hu,
Ting Song,
Yan Xia,
Furu Wei
Abstract:
We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability. Our results demonstrate that BitNet b1.58 2B4T achieves performance on par with leading open-weight, full-precision LLMs of similar size, while offering significant advantages in computational efficiency, including substantially reduced memory footprint, energy consumption, and decoding latency. To facilitate further research and adoption, the model weights are released via Hugging Face along with open-source inference implementations for both GPU and CPU architectures.
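The "1.58" refers to ternary weights in $\{-1, 0, +1\}$, which carry $\log_2 3 \approx 1.585$ bits each. The sketch below follows the absmean quantizer reported for BitNet b1.58 in earlier work: scale by the mean absolute weight, then round and clip. It also computes an idealized weights-only memory estimate. Real packed storage, activation quantization, and any higher-precision layers differ, so the figure is not a measurement of the released model. Function names are illustrative.

```python
import math

def absmean_ternary(weights, eps=1e-5):
    """Quantize weights to {-1, 0, +1}: divide by the mean absolute value
    (gamma), round to the nearest integer, and clip to [-1, 1]."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma

BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585 bits of information per weight

def weight_memory_gb(n_params, bits_per_weight):
    """Idealized weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9
```

At 2 billion parameters, 16-bit weights need about 4.0 GB, while the information-theoretic ternary minimum is about 0.40 GB.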
Submitted 24 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions
Authors:
Yifei Dong,
Fengyi Wu,
Sanjian Zhang,
Guangyu Chen,
Yuzhi Hu,
Masumi Yano,
Jingdong Sun,
Siyu Huang,
Feng Liu,
Qi Dai,
Zhi-Qi Cheng
Abstract:
Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and related tasks, yet they also introduce critical security challenges. This survey provides a wide-ranging examination of the anti-UAV domain, centering on three core objectives (classification, detection, and tracking) while detailing emerging methodologies such as diffusion-based data synthesis, multi-modal fusion, vision-language modeling, self-supervised learning, and reinforcement learning. We systematically evaluate state-of-the-art solutions across both single-modality and multi-sensor pipelines (spanning RGB, infrared, audio, radar, and RF) and discuss large-scale as well as adversarially oriented benchmarks. Our analysis reveals persistent gaps in real-time performance, stealth detection, and swarm-based scenarios, underscoring pressing needs for robust, adaptive anti-UAV systems. By highlighting open research directions, we aim to foster innovation and guide the development of next-generation defense strategies in an era marked by the extensive use of UAVs.
Submitted 17 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Authors:
Haiming Wang,
Mert Unsal,
Xiaohan Lin,
Mantas Baksys,
Junqi Liu,
Marco Dos Santos,
Flood Sung,
Marina Vinyes,
Zhenzhe Ying,
Zekai Zhu,
Jianqiao Lu,
Hugues de Saxcé,
Bolton Bailey,
Chendong Song,
Chenjun Xiao,
Dehao Zhang,
Ebony Zhang,
Frederick Pu,
Han Zhu,
Jiawei Liu,
Jonas Bayer,
Julien Michel,
Longhui Yu,
Léo Dreyfus-Schmidt,
Lewis Tunstall
, et al. (15 additional authors not shown)
Abstract:
We introduce Kimina-Prover Preview, a large language model that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving, as showcased in this preview release. Trained from Qwen2.5-72B with a large-scale reinforcement learning pipeline, Kimina-Prover demonstrates strong performance in Lean 4 proof generation by employing a structured reasoning pattern we term \textit{formal reasoning pattern}. This approach allows the model to emulate human problem-solving strategies in Lean, iteratively generating and refining proof steps. Kimina-Prover sets a new state of the art on the miniF2F benchmark, reaching 80.7% with pass@8192. Beyond improved benchmark performance, our work yields several key insights: (1) Kimina-Prover exhibits high sample efficiency, delivering strong results even with minimal sampling (pass@1) and scaling effectively with computational budget, which stems from its unique reasoning pattern and RL training; (2) we demonstrate clear performance scaling with model size, a trend previously unobserved for neural theorem provers in formal mathematics; (3) the learned reasoning style, distinct from traditional search algorithms, shows potential to bridge the gap between formal verification and informal mathematical intuition. We open-source distilled versions of Kimina-Prover with 1.5B and 7B parameters.
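Metrics such as pass@1 and pass@8192 are commonly computed with the unbiased estimator introduced with the HumanEval benchmark (Chen et al., 2021). It gives the probability that at least one of $k$ samples, drawn without replacement from $n$ attempts of which $c$ are correct, succeeds. The abstract does not say how its pass@k figures were computed, so treat this as background rather than the authors' evaluation code.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k) for n attempts with c correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct attempt
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

When $k = n$, the estimator reduces to whether any of the $n$ attempts succeeded. This is why pass@8192 requires 8192 samples per problem.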
Submitted 15 April, 2025;
originally announced April 2025.
-
An Inexact Variable Metric Proximal Gradient-subgradient Algorithm for a Class of Fractional Optimization Problems
Authors:
Lei Yang,
Xiangrui Kong,
Min Zhang,
Yaohua Hu
Abstract:
In this paper, we study a class of fractional optimization problems, in which the numerator of the objective is the sum of a convex function and a differentiable function with a Lipschitz continuous gradient, while the denominator is a nonsmooth convex function. This model has broad applicability and encompasses several important optimization problems in the literature. To address these problems, we propose an inexact variable metric proximal gradient-subgradient algorithm (iVPGSA), which, to our knowledge, is the first inexact proximal algorithm specifically designed for solving this type of fractional problem. By incorporating a variable metric proximal term and allowing inexact solutions to the subproblem under a flexible error criterion, the proposed algorithm adapts to a broader range of problems while achieving favorable computational efficiency. Under mild assumptions, we establish that any accumulation point of the sequence generated by the iVPGSA is a critical point of the target problem. Moreover, we develop an improved Kurdyka-Łojasiewicz (KL)-based analysis framework to prove the global convergence of the entire sequence and characterize its convergence rate, \textit{without} requiring a strict sufficient descent property. Our results offer detailed insights into how the KL exponent and inexactness influence the convergence rate. The proposed analysis framework also has the potential to serve as a theoretical tool for studying the convergence rates of a wide range of inexact algorithms beyond the iVPGSA. Finally, we conduct numerical experiments on the $\ell_1/\ell_2$ Lasso problem and the constrained $\ell_1/\ell_2$ sparse optimization problem to show the superior performance of the iVPGSA in comparison to existing algorithms.
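For orientation, a common template for minimizing $F(x) = (f(x) + h(x))/g(x)$ with $f$ convex, $h$ smooth, and $g$ convex is an exact, fixed-metric proximal gradient-subgradient step. Set $c_k = F(x_k)$, pick $y_k \in \partial g(x_k)$, and update $x_{k+1} = \mathrm{prox}_{\alpha f}\big(x_k - \alpha(\nabla h(x_k) - c_k y_k)\big)$. The sketch below applies this step to a hypothetical toy instance with $f = \lambda\|x\|_1$, $h = \tfrac12\|Ax - b\|^2$, and $g = \|x\|_2$. It is not the iVPGSA. The variable metric, the inexact subproblem criterion, and the paper's exact $\ell_1/\ell_2$ Lasso formulation are all omitted, and the step size is not tuned to any convergence condition.

```python
import math

def soft_threshold(v, tau):
    """prox of tau*||.||_1: shrink each entry toward zero by tau."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def pgsa_step(x, A, b, lam, alpha):
    """One exact proximal gradient-subgradient step for the toy objective
    F(x) = (lam*||x||_1 + 0.5*||Ax - b||^2) / ||x||_2, with x != 0.
    Returns the new iterate and c = F(x) at the old iterate."""
    r = [ai - bi for ai, bi in zip(matvec(A, x), b)]            # residual Ax - b
    norm2 = math.sqrt(sum(xi * xi for xi in x))
    c = (lam * sum(abs(xi) for xi in x) + 0.5 * sum(ri * ri for ri in r)) / norm2
    grad_h = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]  # A^T r
    y = [xi / norm2 for xi in x]                                  # gradient of ||x||_2 at x != 0
    z = [xi - alpha * (gj - c * yj) for xi, gj, yj in zip(x, grad_h, y)]
    return soft_threshold(z, alpha * lam), c
```

The $+\alpha c_k y_k$ term is what distinguishes this from a plain proximal gradient step. It linearizes the denominator at the current ratio value, in the spirit of Dinkelbach-type methods.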
Submitted 15 April, 2025;
originally announced April 2025.
-
Dynamic Compressing Prompts for Efficient Inference of Large Language Models
Authors:
Jinwu Hu,
Wei Zhang,
Yufeng Wang,
Yu Hu,
Bin Xiao,
Mingkui Tan,
Qing Du
Abstract:
Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can hinder performance because of the limited context windows of LLMs. While prompt compression is a straightforward solution, existing methods struggle to retain essential information, adapt to context changes, and remain effective across different tasks. To tackle these issues, we propose a task-agnostic method called Dynamic Compressing Prompts (LLM-DCP). Our method reduces the number of prompt tokens while preserving performance as much as possible. We model prompt compression as a Markov Decision Process (MDP), enabling the DCP-Agent to sequentially remove redundant tokens by adapting to dynamic contexts and retaining crucial content. We develop a reward function for training the DCP-Agent that balances the compression rate, the quality of the LLM output, and the retention of key information. This allows prompt tokens to be reduced without needing an external black-box LLM. Inspired by the progressive difficulty adjustment in curriculum learning, we introduce a Hierarchical Prompt Compression (HPC) training strategy that gradually increases the compression difficulty, enabling the DCP-Agent to learn an effective compression method that maintains information integrity. Experiments demonstrate that our method outperforms state-of-the-art techniques, especially at higher compression rates. The code for our approach will be available at https://github.com/Fhujinwu/DCP.
Submitted 15 April, 2025;
originally announced April 2025.
-
Precise measurement of the form factors in $D^0\rightarrow K^*(892)^-μ^+ν_μ$ and test of lepton universality with $D^0\rightarrow K^*(892)^-\ell^+ν_{\ell}$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (696 additional authors not shown)
Abstract:
We report a study of the semileptonic decay $D^0 \rightarrow \bar{K}^0π^-μ^+ν_μ$ based on a sample of $7.9~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at a center-of-mass energy of 3.773~GeV with the BESIII detector at the BEPCII collider. The branching fraction of the decay is measured for the first time to be $\mathcal{B}(D^0\rightarrow \bar{K}^0π^-μ^+ν_μ) = (1.373 \pm 0.020_{\rm stat} \pm 0.023_{\rm syst})\%$, where the first uncertainty is statistical and the second is systematic. Based on the investigation of the decay dynamics, we find that the decay is dominated by the $K^{*}(892)^-$ resonance, with the branching fraction measured to be $\mathcal{B}(D^0\rightarrow K^{*}(892)^-μ^+ν_μ) = (1.948 \pm 0.033_{\rm stat} \pm 0.036_{\rm syst})\%$. We also determine the hadronic form factors for the $D^0\rightarrow K^{*}(892)^-μ^+ν_μ$ decay to be $r_{V} = V(0)/A_1(0) = 1.46 \pm 0.11_{\rm stat} \pm 0.04_{\rm syst}$, $r_{2} = A_2(0)/A_1(0) = 0.71 \pm 0.08_{\rm stat} \pm 0.03_{\rm syst}$, and $A_1(0)=0.609 \pm 0.008_{\rm stat} \pm 0.008_{\rm syst}$, where $V(0)$ is the vector form factor and $A_{1,2}(0)$ are the axial form factors evaluated at $q^2=0$. $A_1(0)$ is measured for the first time in the $D^0\rightarrow K^{*}(892)^-μ^+ν_μ$ decay. Averaging with the form-factor parameters we reported previously in the $D^0\rightarrow K^*(892)^-(\rightarrow \bar{K}^0π^-)e^+ν_{e}$ and $D^0\rightarrow K^*(892)^-(\rightarrow K^-π^0)μ^+ν_μ$ decays, we obtain $r_{V}=1.456\pm0.040_{\rm stat}\pm0.016_{\rm syst}$, $r_{2}=0.715\pm0.031_{\rm stat}\pm0.014_{\rm syst}$, and $A_1(0)=0.614\pm0.005_{\rm stat}\pm0.004_{\rm syst}$. This is the most precise determination to date of the form-factor parameters in the $D\rightarrow K^*(892)$ transition, providing the most stringent test of various theoretical models.
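The ratio definitions $r_V = V(0)/A_1(0)$ and $r_2 = A_2(0)/A_1(0)$ can be inverted to give approximate absolute form factors. Using only the central values of the averaged parameters quoted above:

$$V(0) = r_V\,A_1(0) \approx 1.456 \times 0.614 \approx 0.894, \qquad A_2(0) = r_2\,A_1(0) \approx 0.715 \times 0.614 \approx 0.439.$$

Uncertainties are deliberately not propagated. The correlations between $r_V$, $r_2$, and $A_1(0)$ are not given here, so treating them as independent could misstate the errors.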
Submitted 15 April, 2025;
originally announced April 2025.
-
CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation
Authors:
Jing Chen,
Zhihua Wei,
Wei Zhang,
Yingying Hu,
Qiong Zhang
Abstract:
Large language models (LLMs) hold great promise for assisting clinical interviews due to their fluent interactive capabilities and extensive medical knowledge. However, the lack of high-quality interview dialogue data and of widely accepted evaluation methods has significantly impeded progress. We therefore propose CliniChat, a framework that integrates multi-source knowledge to enable LLMs to simulate real-world clinical interviews. It consists of two modules, Clini-Recon and Clini-Eval, responsible for reconstructing and evaluating interview dialogues, respectively. By incorporating three sources of knowledge, Clini-Recon transforms clinical notes into systematic, professional, and empathetic interview dialogues. Clini-Eval combines a comprehensive evaluation metric system with a two-phase automatic evaluation approach, enabling LLMs to assess interview performance as experts do. We contribute MedQA-Dialog, a high-quality synthetic interview dialogue dataset, and CliniChatGLM, a model specialized for clinical interviews. Experimental results demonstrate that CliniChatGLM's interview capabilities are comprehensively improved, particularly in history-taking, achieving state-of-the-art performance.
Submitted 14 April, 2025;
originally announced April 2025.
-
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Authors:
Yifan Yang,
Shujie Liu,
Jinyu Li,
Yuxuan Hu,
Haibin Wu,
Hui Wang,
Jianwei Yu,
Lingwei Meng,
Haiyang Sun,
Yanqing Liu,
Yan Lu,
Kai Yu,
Xie Chen
Abstract:
Recent zero-shot text-to-speech (TTS) systems face a common dilemma: autoregressive (AR) models suffer from slow generation and lack duration controllability, while non-autoregressive (NAR) models lack temporal modeling and typically require complex designs. In this paper, we introduce a novel pseudo-autoregressive (PAR) codec language modeling approach that unifies AR and NAR modeling. Combining explicit temporal modeling from AR with parallel generation from NAR, PAR generates dynamic-length spans at fixed time steps. Building on PAR, we propose PALLE, a two-stage TTS system that leverages PAR for initial generation followed by NAR refinement. In the first stage, PAR progressively generates speech tokens along the time dimension, with each step predicting all positions in parallel but retaining only the left-most span. In the second stage, low-confidence tokens are iteratively refined in parallel, leveraging global contextual information. Experiments demonstrate that PALLE, trained on LibriTTS, outperforms state-of-the-art systems trained on large-scale data, including F5-TTS, E2-TTS, and MaskGCT, on the LibriSpeech test-clean set in terms of speech quality, speaker similarity, and intelligibility, while achieving inference up to ten times faster. Audio samples are available at https://anonymous-palle.github.io.
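The two-stage decoding loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not PALLE's implementation: the span length is fixed and the total length is known in advance (PALLE generates dynamic-length spans), and `predict` / `repredict` are hypothetical stand-ins for the codec language model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model interfaces, for illustration only:
# predict(prefix, total_len) -> [(token, confidence)] for every position from
#   len(prefix) to total_len - 1, produced in one parallel pass.
Predict = Callable[[List[int], int], List[Tuple[int, float]]]
# repredict(tokens, positions) -> {position: (token, confidence)}
Repredict = Callable[[List[int], List[int]], Dict[int, Tuple[int, float]]]

def par_generate(predict: Predict, total_len: int, span: int):
    """Stage 1: predict all remaining positions in parallel at each step,
    but commit only the left-most `span` of them."""
    tokens: List[int] = []
    confs: List[float] = []
    while len(tokens) < total_len:
        preds = predict(tokens, total_len)
        if not preds:
            raise ValueError("predict returned no positions")
        for tok, conf in preds[:span]:
            tokens.append(tok)
            confs.append(conf)
    return tokens, confs

def nar_refine(tokens, confs, repredict: Repredict, rounds: int, frac: float):
    """Stage 2: repeatedly re-predict the lowest-confidence positions in parallel."""
    tokens, confs = list(tokens), list(confs)
    for _ in range(rounds):
        n = max(1, int(len(tokens) * frac))
        low = sorted(range(len(tokens)), key=lambda i: confs[i])[:n]
        updates = repredict(tokens, low)
        for i in low:
            tokens[i], confs[i] = updates[i]
    return tokens, confs

# Toy deterministic "model" so the loop can be run end to end.
def toy_predict(prefix, total_len):
    return [(pos % 7, 0.5 + 0.1 * (pos % 3)) for pos in range(len(prefix), total_len)]

def toy_repredict(tokens, positions):
    return {i: (tokens[i], 0.99) for i in positions}

toks, cs = par_generate(toy_predict, total_len=10, span=3)
toks2, cs2 = nar_refine(toks, cs, toy_repredict, rounds=1, frac=0.4)
print(toks2, [round(c, 2) for c in cs2])
```

Stage 1 needs only ceil(total_len / span) model calls instead of one per token, which is where the speed-up over token-by-token AR decoding comes from in this sketch.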
Submitted 14 April, 2025;
originally announced April 2025.
-
InstructEngine: Instruction-driven Text-to-Image Alignment
Authors:
Xingyu Lu,
Yuhang Hu,
YiFan Zhang,
Kaiyu Jiang,
Changyi Liu,
Tianke Zhang,
Jinpeng Wang,
Chun Yuan,
Bin Wen,
Fan Yang,
Tingting Gao,
Di Zhang
Abstract:
Reinforcement Learning from Human/AI Feedback (RLHF/RLAIF) has been extensively utilized for preference alignment of text-to-image models. Existing methods face limitations in terms of both data and algorithm. For training data, most approaches rely on manually annotated preference data, either by directly fine-tuning the generators or by training reward models to provide training signals. However, the high annotation cost makes them difficult to scale up, and reward models consume extra computation without guaranteeing accuracy. From an algorithmic perspective, most methods neglect the value of text and take only the image feedback as a comparative signal, which is inefficient and sparse. To alleviate these drawbacks, we propose the InstructEngine framework. Regarding annotation cost, we first construct a taxonomy for text-to-image generation, then develop an automated data construction pipeline based on it. Leveraging advanced large multimodal models and human-defined rules, we generate 25K text-image preference pairs. Finally, we introduce a cross-validation alignment method, which improves data efficiency by organizing semantically analogous samples into mutually comparable pairs. Evaluations on DrawBench demonstrate that InstructEngine improves the performance of SD v1.5 and SDXL by 10.53% and 5.30%, respectively, outperforming state-of-the-art baselines, with an ablation study confirming the benefits of all of InstructEngine's components. A win rate of over 50% in human reviews also shows that InstructEngine better aligns with human preferences.
Submitted 21 April, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials
Authors:
Jingyun Yang,
Ruoyan Avery Yin,
Chi Jiang,
Yuepeng Hu,
Xiaokai Zhu,
Xingjian Hu,
Sutharsika Kumar,
Xiao Wang,
Xiaohua Zhai,
Keran Rong,
Yunyue Zhu,
Tianyi Zhang,
Zongyou Yin,
Jing Kong,
Neil Zhenqiang Gong,
Zhichu Ren,
Haozhe Wang
Abstract:
Characterization of atomic-scale materials traditionally requires human experts with months to years of specialized training. Even for trained human operators, accurate and reliable characterization remains challenging when examining newly discovered materials such as two-dimensional (2D) structures. This bottleneck drives demand for fully autonomous experimentation systems capable of comprehending research objectives without requiring large training datasets. In this work, we present ATOMIC (Autonomous Technology for Optical Microscopy & Intelligent Characterization), an end-to-end framework that integrates foundation models to enable fully autonomous, zero-shot characterization of 2D materials. Our system combines a vision foundation model (the Segment Anything Model), large language models (ChatGPT), unsupervised clustering, and topological analysis to automate microscope control, sample scanning, image segmentation, and intelligent analysis through prompt engineering, eliminating the need for additional training. When analyzing typical MoS2 samples, our approach achieves 99.7% segmentation accuracy for single-layer identification, equivalent to that of human experts. In addition, the integrated model is able to detect grain boundary slits that are challenging to identify with the human eye. Furthermore, the system retains robust accuracy under variable conditions, including defocus, color temperature fluctuations, and exposure variations. It is applicable to a broad spectrum of common 2D materials (including graphene, MoS2, WSe2, and SnSe), regardless of whether they were fabricated via chemical vapor deposition or mechanical exfoliation. This work demonstrates the use of foundation models to achieve autonomous analysis, establishing a scalable and data-efficient characterization paradigm that fundamentally transforms the approach to nanoscale materials research.
Submitted 14 April, 2025;
originally announced April 2025.
-
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
Authors:
Zhaopeng Feng,
Shaosheng Cao,
Jiahan Ren,
Jiayuan Su,
Ruizhe Chen,
Yan Zhang,
Zhe Xu,
Yao Hu,
Jian Wu,
Zuozhu Liu
Abstract:
Large-scale reinforcement learning (RL) methods have proven highly effective in enhancing the reasoning abilities of large language models (LLMs), particularly for tasks with verifiable solutions such as mathematics and coding. However, applying this idea to machine translation (MT), where outputs are flexibly formatted and difficult to evaluate automatically with explicit rules, remains underexplored. In this work, we introduce MT-R1-Zero, the first open-source adaptation of the R1-Zero RL framework for MT without supervised fine-tuning or cold-start. We propose a rule-metric mixed reward mechanism to guide LLMs towards improved translation quality via emergent reasoning. On the WMT 24 English-Chinese benchmark, our MT-R1-Zero-3B-Mix achieves competitive performance, surpassing TowerInstruct-7B-v0.2 by an average of 1.26 points. Meanwhile, our MT-R1-Zero-7B-Mix attains a high average score of 62.25 across all metrics, placing it on par with advanced proprietary models such as GPT-4o and Claude-3.5-Sonnet, while the MT-R1-Zero-7B-Sem variant achieves state-of-the-art scores on semantic metrics. Moreover, our models exhibit strong generalization to out-of-distribution MT tasks, robustly supporting multilingual and low-resource settings. Extensive analysis of model behavior across different initializations and reward metrics offers pioneering insight into the critical role of reward design, LLM adaptability, training dynamics, and emergent reasoning patterns within the R1-Zero paradigm for MT. Our code is available at https://github.com/fzp0424/MT-R1-Zero.
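A "rule-metric mixed" reward can be read as a rule-based gate followed by a metric-based score. The sketch below is one hypothetical reading, not MT-R1-Zero's reward: the `<think>`/`<translate>` tag format, the -1.0 format penalty, and the unigram-F1 metric are all illustrative assumptions (a real system would plug in an established MT metric instead of the toy `unigram_f1`).

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical output format: reasoning in <think>...</think>,
# then the final translation in <translate>...</translate>.
FORMAT = re.compile(r"^<think>(.+?)</think>\s*<translate>(.+?)</translate>$", re.DOTALL)

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy stand-in for a real MT metric: unigram overlap F1."""
    h, r = hyp.split(), ref.split()
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def mixed_reward(output: str, reference: str,
                 metric: Callable[[str, str], float] = unigram_f1,
                 format_penalty: float = -1.0) -> float:
    """Rule gate first (the output must follow the format), then a metric score
    computed on the extracted translation only."""
    m = FORMAT.match(output.strip())
    if m is None:
        return format_penalty
    return metric(m.group(2).strip(), reference)

print(mixed_reward("the cat", "the cat"))  # -1.0: format rule violated
print(mixed_reward("<think>simple</think><translate>the cat</translate>", "the cat"))  # 1.0
```

Gating on format before scoring keeps the policy from being rewarded for a good translation it never commits to in the expected slot, while the metric term supplies a graded signal where explicit rules cannot judge quality.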
Submitted 14 April, 2025;
originally announced April 2025.
-
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Authors:
Xinnong Zhang,
Jiayu Lin,
Xinyi Mou,
Shiyue Yang,
Xiawei Liu,
Libo Sun,
Hanjia Lyu,
Yihang Yang,
Weihong Qi,
Yue Chen,
Guanying Li,
Ling Yan,
Yao Hu,
Siming Chen,
Yu Wang,
Xuanjing Huang,
Jiebo Luo,
Shiping Tang,
Libo Wu,
Baohua Zhou,
Zhongyu Wei
Abstract:
Social simulation is transforming traditional social science research by modeling human behavior through interactions between virtual individuals and their environments. With recent advances in large language models (LLMs), this approach has shown growing potential in capturing individual differences and predicting group behaviors. However, existing methods face alignment challenges related to the environment, target users, interaction mechanisms, and behavioral patterns. To this end, we introduce SocioVerse, an LLM-agent-driven world model for social simulation. Our framework features four powerful alignment components and a user pool of 10 million real individuals. To validate its effectiveness, we conducted large-scale simulation experiments across three distinct domains: politics, news, and economics. Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness through standardized procedures and minimal manual adjustments.
Submitted 23 April, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
$\mathbb{Z}_N$ generalizations of three-dimensional stabilizer codes
Authors:
Chanbeen Lee,
Yaozong Hu,
Gil Young Cho,
Haruki Watanabe
Abstract:
In this work, we generalize several three-dimensional $\mathbb{Z}_2$ stabilizer models (including the X-cube model, the three-dimensional toric code, and Haah's code) to their $\mathbb{Z}_N$ counterparts. Under periodic boundary conditions, we analyze their ground state degeneracies and topological excitations, and uncover behaviors that strongly depend on system size. For the X-cube model, we identify excitations whose mobility is restricted under local operations but relaxed under nonlocal ones derived from global topology. These excitations, previously confined to open boundaries in the $\mathbb{Z}_2$ model, now appear even under periodic boundaries. In the toric code, we observe nontrivial braiding between string and point excitations despite the absence of ground state degeneracy, indicating long-range entanglement independent of topological degeneracy. Again, this effect extends from open to periodic boundaries in the generalized models. For Haah's code, we find new excitations, fracton tripoles and monopoles, that remain globally constrained, along with a relaxation of immobility giving rise to lineons and planons. These results reveal new forms of topological order and suggest a broader framework for understanding fracton phases beyond the conventional $\mathbb{Z}_2$ setting.
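For background: a $\mathbb{Z}_N$ generalization of a qubit stabilizer model conventionally replaces the Pauli $X$ and $Z$ operators with the shift and clock operators on an $N$-level system, which satisfy $ZX = \omega XZ$ with $\omega = e^{2\pi i/N}$ and $X^N = Z^N = 1$. The abstract does not state the paper's conventions, so treating them as these standard operators is an assumption. The sketch below builds the matrices in plain Python and checks the algebra; for $N = 2$ it reduces to the familiar anticommutation $ZX = -XZ$.

```python
import cmath
from typing import List

Matrix = List[List[complex]]

def shift(n: int) -> Matrix:
    """Generalized Pauli X: |j> -> |j+1 mod n>."""
    return [[1 if r == (c + 1) % n else 0 for c in range(n)] for r in range(n)]

def clock(n: int) -> Matrix:
    """Generalized Pauli Z: |j> -> omega**j |j>, with omega = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return [[omega ** r if r == c else 0 for c in range(n)] for r in range(n)]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def scale(s: complex, a: Matrix) -> Matrix:
    return [[s * x for x in row] for row in a]

def power(a: Matrix, p: int) -> Matrix:
    n = len(a)
    out: Matrix = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        out = matmul(out, a)
    return out

def close(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def check_zn_algebra(n: int) -> bool:
    """Check Z X = omega X Z and X^n = Z^n = identity."""
    x, z = shift(n), clock(n)
    omega = cmath.exp(2j * cmath.pi / n)
    ident = power(x, 0)
    return (close(matmul(z, x), scale(omega, matmul(x, z)))
            and close(power(x, n), ident)
            and close(power(z, n), ident))

for n in (2, 3, 5):
    print(n, check_zn_algebra(n))
```

Because $X$ and $Z$ now have order $N$ rather than 2, products of stabilizers can satisfy constraints modulo $N$, which is one generic route by which properties such as degeneracies can become sensitive to how system size relates to $N$.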
Submitted 13 April, 2025;
originally announced April 2025.
-
OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Authors:
Juntao Zhao,
Qi Lu,
Wei Jia,
Borui Wan,
Lei Zuo,
Junda Feng,
Jianyu Jiang,
Yangrui Chen,
Shuaishuai Cao,
Jialing He,
Kaihua Jiang,
Yuanzhe Hu,
Yanghua Peng,
Haibin Lin,
Xin Liu,
Chuan Wu
Abstract:
Modern frameworks for training large foundation models (LFMs) employ data loaders in a data-parallel paradigm. While this design offers implementation simplicity, it introduces two fundamental challenges. First, due to the quadratic computational complexity of the attention operator, the non-uniform sample distribution over data-parallel ranks leads to a significant workload imbalance among loaders, which degrades training efficiency. This paradigm also impedes the implementation of data mixing algorithms (e.g., curriculum learning) over different datasets. Second, to acquire a broad range of capabilities, LFM training ingests data from diverse sources, each with distinct file access states. Colocating massive datasets within loader instances can easily exceed local pod memory capacity. Additionally, heavy sources with higher transformation latency require larger worker pools, further exacerbating memory consumption.
We present OVERLORD, an industrial-grade distributed data loading architecture with three innovations: (1) a centralized and declarative data plane, which facilitates elastic data orchestration strategies such as long-short context, multimodal, and curriculum learning; (2) disaggregated multisource preprocessing through role-specific actors, i.e., Source Loaders and Data Constructors, with autoscaling of Source Loaders to handle heterogeneous and evolving source preprocessing costs; (3) Shadow Loaders with differential checkpointing for uninterrupted fault recovery. Deployed on production clusters scaling to thousands of GPUs, OVERLORD achieves (1) a 4.5x improvement in end-to-end training throughput and (2) a reduction in CPU memory usage of at least 3.6x, with further improvements to be added in later experiments.
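The first challenge above (quadratic attention cost turning an even split of samples into an uneven split of work) can be made concrete with a small sketch. It assumes a pure length-squared cost model and compares naive index-based assignment with a standard longest-processing-time greedy heuristic. This illustrates the problem only; it is not OVERLORD's data plane or scheduling policy, and the sequence lengths are toy values.

```python
import heapq
from typing import List, Sequence

def attention_cost(seq_len: int) -> int:
    """Assumed cost model: attention work grows with the square of sequence length."""
    return seq_len * seq_len

def round_robin_loads(lengths: Sequence[int], ranks: int) -> List[int]:
    """Per-rank cost when samples are dealt out by index, ignoring their cost."""
    loads = [0] * ranks
    for i, length in enumerate(lengths):
        loads[i % ranks] += attention_cost(length)
    return loads

def greedy_balanced_loads(lengths: Sequence[int], ranks: int) -> List[int]:
    """Longest-processing-time greedy: place the most expensive remaining sample
    on the currently least-loaded rank (min-heap keyed by load)."""
    heap = [(0, r) for r in range(ranks)]
    heapq.heapify(heap)
    loads = [0] * ranks
    for length in sorted(lengths, key=attention_cost, reverse=True):
        load, r = heapq.heappop(heap)
        load += attention_cost(length)
        loads[r] = load
        heapq.heappush(heap, (load, r))
    return loads

lengths = [8192, 512, 512, 512, 4096, 4096, 512, 512]  # toy sequence lengths
print(round_robin_loads(lengths, 2))      # [84410368, 17563648]
print(greedy_balanced_loads(lengths, 2))  # [67108864, 34865152]
```

Both assignments give each rank four samples, yet the slowest rank's work differs substantially. Even the greedy result stays uneven here because one long sample dominates, which is why balancing decisions benefit from a global view of the batch rather than per-loader choices.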
Submitted 13 April, 2025;
originally announced April 2025.
-
Nash Social Welfare with Submodular Valuations: Approximation Algorithms and Integrality Gaps
Authors:
Xiaohui Bei,
Yuda Feng,
Yang Hu,
Shi Li,
Ruilong Zhang
Abstract:
We study the problem of allocating items to agents such that the (un)weighted Nash social welfare (NSW) is maximized under submodular valuations. The best-known results for unweighted and weighted problems are the $(4+ε)$ approximation given by Garg, Husic, Li, Vega, and Vondrak~\cite{stoc/GargHLVV23} and the $(233+ε)$ approximation given by Feng, Hu, Li, and Zhang~\cite{stoc/FHLZ25}, respectively.
For the weighted NSW problem, we present a $(5.18+ε)$-approximation algorithm, significantly improving the previous approximation ratio and simplifying the analysis. Our algorithm is based on the same configuration LP as in~\cite{stoc/FHLZ25}, but with a modified rounding algorithm. For the unweighted NSW problem, we show through a more careful analysis that the local-search-based algorithm in~\cite{stoc/GargHLVV23} achieves a $(3.914+ε)$-approximation.
On the negative side, we prove that the configuration LP for weighted NSW with submodular valuations has an integrality gap of at least $2^{\ln 2}-ε\approx 1.617 - ε$, which is slightly larger than the current best-known $e/(e-1)-ε\approx 1.582-ε$ hardness of approximation~\cite{talg/GargKK23}. For the additive valuation case, we show an integrality gap of $(e^{1/e}-ε)$, which proves that the ratio of $(e^{1/e}+ε)$~\cite{icalp/FengLi24} is tight for algorithms based on the configuration LP. For unweighted NSW with additive valuations, we show a gap of $(2^{1/4}-ε) \approx 1.189-ε$, slightly larger than the current best-known $\sqrt{8/7} \approx 1.069$-hardness for the problem~\cite{mor/Garg0M24}.
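For reference, the objective is the standard weighted Nash social welfare of an allocation $(S_1, \dots, S_n)$, $\mathrm{NSW} = \big(\prod_i v_i(S_i)^{w_i}\big)^{1/\sum_i w_i}$, which reduces to the geometric mean of the agents' values when all weights are equal. The sketch evaluates this objective (in log space, for stability) for given agent values, and numerically checks the constants quoted above. It does not implement the paper's LP rounding or local search; the function name is hypothetical.

```python
import math
from typing import Optional, Sequence

def nash_social_welfare(values: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Weighted NSW: (prod_i v_i ** w_i) ** (1 / sum_i w_i).

    `values[i]` is agent i's valuation of its bundle. With equal weights this
    is the geometric mean. Computed in log space for numerical stability."""
    if weights is None:
        weights = [1.0] * len(values)
    if any(v <= 0 for v in values):
        return 0.0  # some agent receives zero value, so the product is zero
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for w, v in zip(weights, values)) / total)

# Numerical values of the constants quoted in the abstract.
constants = {
    "2^(ln 2)": 2 ** math.log(2),
    "e/(e-1)": math.e / (math.e - 1),
    "e^(1/e)": math.e ** (1 / math.e),
    "2^(1/4)": 2 ** 0.25,
    "sqrt(8/7)": math.sqrt(8 / 7),
}
for name, value in constants.items():
    print(f"{name:>10}: {value:.3f}")

print(nash_social_welfare([4, 9]))          # unweighted: geometric mean
print(nash_social_welfare([2, 8], [1, 2]))  # weighted: (2 * 8**2) ** (1/3)
```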
Submitted 13 April, 2025;
originally announced April 2025.
-
Predicting the critical behavior of complex dynamic systems via learning the governing mechanisms
Authors:
Xiangrong Wang,
Dan Lu,
Zongze Wu,
Weina Xu,
Hongru Hou,
Yanqing Hu,
Yamir Moreno
Abstract:
Critical points separate distinct dynamical regimes of complex systems, often delimiting functional or macroscopic phases in which the system operates. However, the long-term prediction of critical regimes and behaviors is challenging given the narrow set of parameters from which they emerge. Here, we propose a framework to learn the rules that govern the dynamic processes of a system. The learned governing rules further refine and guide the representation learning of neural networks from a series of dynamic graphs. This combination enables knowledge-based prediction of the critical behaviors of dynamical networked systems. We evaluate the performance of our framework in predicting two typical critical behaviors in spreading dynamics on various synthetic and real-world networks. Our results show that governing rules can be learned effectively and significantly improve prediction accuracy. Our framework illustrates how learning the underlying mechanism can enhance the representational power of deep neural networks, with the aim of guiding applications that predict complex behaviors driven by learnable physical rules.
Submitted 13 April, 2025;
originally announced April 2025.
-
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Authors:
Haotian Xu,
Yue Hu,
Chen Gao,
Zhengqiu Zhu,
Yong Zhao,
Yong Li,
Quanjun Yin
Abstract:
Language-goal aerial navigation is a critical challenge in embodied AI, requiring UAVs to localize targets in complex environments such as urban blocks based on textual specifications. Existing methods, often adapted from indoor navigation, struggle to scale due to a limited field of view, semantic ambiguity among objects, and a lack of structured spatial reasoning. In this work, we propose GeoNav, a geospatially aware multimodal agent that enables long-range navigation. GeoNav operates in three phases (landmark navigation, target search, and precise localization), mimicking human coarse-to-fine spatial strategies. To support such reasoning, it dynamically builds two types of spatial memory. The first is a global but schematic cognitive map, which fuses prior textual geographic knowledge and embodied visual cues into a top-down, annotated form for fast navigation to the landmark region. The second is a local but fine-grained scene graph representing hierarchical spatial relationships between blocks, landmarks, and objects, which is used for precise target localization. On top of this structured representation, GeoNav employs a spatially aware, multimodal chain-of-thought prompting mechanism that equips multimodal large language models with efficient and interpretable decision-making across stages. On the CityNav urban navigation benchmark, GeoNav surpasses the current state of the art by up to 12.53% in success rate and significantly improves navigation efficiency, even in hard-level tasks. Ablation studies highlight the importance of each module, showing how geospatial representations and coarse-to-fine reasoning enhance UAV navigation.
Submitted 11 May, 2025; v1 submitted 13 April, 2025;
originally announced April 2025.
-
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Authors:
Weixiang Zhao,
Jiahe Guo,
Yulin Hu,
Yang Deng,
An Zhang,
Xingyu Sui,
Xinyang Han,
Yanyan Zhao,
Bing Qin,
Tat-Seng Chua,
Ting Liu
Abstract:
Despite extensive efforts in safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks. Activation steering offers a training-free defense method but relies on fixed steering coefficients, resulting in suboptimal protection and increased false rejections of benign inputs. To address this, we propose AdaSteer, an adaptive activation steering method that dynamically adjusts model behavior based on input characteristics. We identify two key properties: Rejection Law (R-Law), which shows that stronger steering is needed for jailbreak inputs opposing the rejection direction, and Harmfulness Law (H-Law), which differentiates adversarial and benign inputs. AdaSteer steers input representations along both the Rejection Direction (RD) and Harmfulness Direction (HD), with adaptive coefficients learned via logistic regression, ensuring robust jailbreak defense while preserving benign input handling. Experiments on LLaMA-3.1, Gemma-2, and Qwen2.5 show that AdaSteer outperforms baseline methods across multiple jailbreak attacks with minimal impact on utility. Our results highlight the potential of interpretable model internals for real-time, flexible safety enforcement in LLMs.
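The core idea, steering a hidden state along two directions with strengths that depend on the input, can be sketched as follows. This is a hypothetical reading for illustration, not AdaSteer's implementation: each coefficient is modeled as a scaled logistic function of the hidden state's projection onto its direction, with parameters `(w, b, scale)` assumed to be already fitted by logistic regression. The directions, parameter values, and function names are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def adaptive_coefficient(h, direction, w: float, b: float, scale: float) -> float:
    """Hypothetical logistic coefficient: a sigmoid of the projection of h onto
    `direction`, multiplied by a (possibly negative) scale."""
    return scale * sigmoid(w * dot(h, direction) + b)

def adaptive_steer(h, d_rd, d_hd, rd_params, hd_params) -> Tuple[Vector, float, float]:
    """Steer h along a rejection direction (d_rd) and a harmfulness direction
    (d_hd) with input-dependent strengths alpha and beta."""
    alpha = adaptive_coefficient(h, d_rd, *rd_params)
    beta = adaptive_coefficient(h, d_hd, *hd_params)
    steered = [x + alpha * r + beta * s for x, r, s in zip(h, d_rd, d_hd)]
    return steered, alpha, beta

# Toy 2-D example: a negative w makes steering stronger for inputs whose
# projection opposes the rejection direction (in the spirit of the R-Law).
d_rd, d_hd = [1.0, 0.0], [0.0, 1.0]
rd_params, hd_params = (-4.0, 0.0, 2.0), (0.0, 0.0, 0.0)  # HD steering disabled here
_, alpha_opposing, _ = adaptive_steer([-1.0, 0.0], d_rd, d_hd, rd_params, hd_params)
_, alpha_aligned, _ = adaptive_steer([1.0, 0.0], d_rd, d_hd, rd_params, hd_params)
print(round(alpha_opposing, 3), round(alpha_aligned, 3))
```

Compared with a fixed steering coefficient, the input-dependent strength lets inputs already aligned with refusal receive little extra push, which is how adaptive steering aims to reduce false rejections of benign inputs.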
Submitted 13 April, 2025;
originally announced April 2025.
-
Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
Authors:
Yi Hu,
Jinhang Zuo,
Eddie Zhang,
Bob Iannucci,
Carlee Joe-Wong
Abstract:
Recent advancements in machine learning (ML) have enabled its deployment on resource-constrained edge devices, fostering innovative applications such as intelligent environmental sensing. However, these devices, particularly microcontrollers (MCUs), face substantial challenges due to limited memory, limited computing capabilities, and the absence of dedicated floating-point units (FPUs). These constraints hinder the deployment of complex ML models, especially those requiring lifelong learning capabilities. To address these challenges, we propose Tin-Tin, an integer-based on-device training framework designed specifically for low-power MCUs. Tin-Tin introduces novel integer rescaling techniques to manage dynamic ranges efficiently and to facilitate weight updates using integer data types. Unlike existing methods optimized for devices with FPUs, GPUs, or FPGAs, Tin-Tin addresses the unique demands of tiny MCUs, prioritizing energy efficiency and optimized memory utilization. We validate the effectiveness of Tin-Tin through end-to-end application examples on real-world tiny devices, demonstrating its potential to support energy-efficient and sustainable ML applications on edge platforms.
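To make "integer rescaling to manage dynamic ranges" concrete, the sketch below shows a generic power-of-two fixed-point scheme that needs no FPU: wide integer values (for example int32 gradient accumulators) are brought into int8 range with a rounding right shift, and a weight update uses a learning rate of $2^{-k}$ so that it also reduces to a shift. This is a common textbook approach assumed for illustration; it is not claimed to be Tin-Tin's rescaling technique, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def saturate(v: int, bits: int = 8) -> int:
    """Clamp v to the signed `bits`-bit range, e.g. [-128, 127] for int8."""
    hi = (1 << (bits - 1)) - 1
    return max(-hi - 1, min(hi, v))

def round_shift(v: int, shift: int) -> int:
    """Divide by 2**shift with round-half-up, using only integer operations.
    Python's >> is an arithmetic (floor) shift, so negative values work too."""
    if shift == 0:
        return v
    return (v + (1 << (shift - 1))) >> shift

def required_shift(values: Sequence[int], bits: int = 8) -> int:
    """Smallest right shift that brings the largest magnitude into range."""
    hi = (1 << (bits - 1)) - 1
    peak = max((abs(v) for v in values), default=0)
    shift = 0
    while (peak >> shift) > hi:
        shift += 1
    return shift

def rescale(values: Sequence[int], bits: int = 8) -> Tuple[List[int], int]:
    """Map wide integers to `bits`-bit integers. Returns the rescaled values and
    the power-of-two exponent removed (value ~= rescaled * 2**shift)."""
    s = required_shift(values, bits)
    return [saturate(round_shift(v, s), bits) for v in values], s

def int_sgd_step(weights: Sequence[int], grads: Sequence[int],
                 lr_shift: int, bits: int = 8) -> List[int]:
    """Integer SGD with learning rate 2**-lr_shift: w <- saturate(w - round(g / 2**lr_shift))."""
    return [saturate(w - round_shift(g, lr_shift), bits) for w, g in zip(weights, grads)]

print(rescale([1023, -1000, 5]))                                     # ([127, -125, 1], 3)
print(int_sgd_step([10, -20, 127], [64, -64, -1000], lr_shift=4))   # [6, -16, 127]
```

Saturating after rounding matters: 1023 / 8 rounds up to 128, which would overflow int8 without the clamp.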
Submitted 12 April, 2025;
originally announced April 2025.
-
Capillary Christoffel-Minkowski problem
Authors:
Yingxiang Hu,
Mohammad N. Ivaki,
Julian Scheuer
Abstract:
The result of Guan and Ma (Invent. Math. 151 (2003)) states that if $φ^{-1/k} : \mathbb{S}^n \to (0,\infty)$ is spherically convex, then $φ$ arises as the $σ_k$ curvature (the $k$-th elementary symmetric function of the principal radii of curvature) of a strictly convex hypersurface. In this paper, we establish an analogous result in the capillary setting in the half-space for $θ\in(0,π/2)$: if $φ^{-1/k} : \mathcal{C}_θ \to (0,\infty)$ is a capillary function and spherically convex, then $φ$ is the $σ_k$ curvature of a strictly convex capillary hypersurface.
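The $σ_k$ curvature in the statement above is the $k$-th elementary symmetric function of the principal radii of curvature $r_1, \dots, r_n$, that is, $σ_k = \sum_{i_1 < \dots < i_k} r_{i_1} \cdots r_{i_k}$, regarded as a function on the sphere via the Gauss map. The small sketch below only evaluates this definition at a single point with given radii; it says nothing about the existence theory. A useful sanity check: on a round sphere of radius $R$ in $\mathbb{R}^{n+1}$, all $n$ radii equal $R$, so $σ_k = \binom{n}{k} R^k$.

```python
import math
from typing import Sequence

def sigma_k(radii: Sequence[float], k: int) -> float:
    """k-th elementary symmetric function of the principal radii of curvature:
    the sum, over all k-element subsets, of the product of their radii.
    Uses the O(n*k) recurrence e_j <- e_j + r * e_{j-1}."""
    e = [1.0] + [0.0] * k
    for r in radii:
        for j in range(k, 0, -1):  # descending, so each radius enters a subset once
            e[j] += e[j - 1] * r
    return e[k]

print(sigma_k([1.0, 2.0, 3.0], 2))                  # 1*2 + 1*3 + 2*3 = 11.0
n, k, R = 3, 2, 2.0
print(sigma_k([R] * n, k), math.comb(n, k) * R ** k)  # round sphere: both 12.0
```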
Submitted 12 April, 2025;
originally announced April 2025.
-
SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
Authors:
Qingyuan Wang,
Rui Song,
Jiaojiao Li,
Kerui Cheng,
David Ferstl,
Yinlin Hu
Abstract:
We introduce SCFlow2, a plug-and-play refinement framework for 6D object pose estimation. Most recent 6D object pose methods rely on refinement to obtain accurate results. However, most existing refinement methods either suffer from noise in establishing correspondences or rely on retraining for novel objects. SCFlow2 builds on the SCFlow model, which was designed for refinement with shape constraints, but formulates the additional depth information as a regularization in the iterations via 3D scene flow for RGBD frames. The key design of SCFlow2 is the introduction of geometry constraints into the training of a recurrent matching network, by combining the rigid-motion embeddings in 3D scene flow with the 3D shape prior of the target. We train SCFlow2 on a combination of the Objaverse, GSO, and ShapeNet datasets, and evaluate on BOP datasets with novel objects. After applying our method as a post-processing step, most state-of-the-art methods produce significantly better results, without any retraining or fine-tuning. The source code is available at https://scflow2.github.io.
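The rigid-motion constraint behind 3D scene flow for a single object can be stated simply: if the object moves by one rotation $R$ and translation $t$, every surface point $p$ has flow $f(p) = Rp + t - p$, and the displaced points preserve all pairwise distances. The sketch below illustrates only that geometric constraint in plain Python. SCFlow2 learns rigid-motion embeddings inside a recurrent matching network; nothing here reproduces that network, and the helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vec3 = List[float]
Mat3 = List[List[float]]

def rot_z(theta: float) -> Mat3:
    """Rotation by `theta` radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(r: Mat3, t: Sequence[float], p: Sequence[float]) -> Vec3:
    """Apply the rigid motion p -> R p + t."""
    return [sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def rigid_scene_flow(r: Mat3, t: Sequence[float], points) -> List[Vec3]:
    """Per-point 3D flow induced by a single rigid motion: f(p) = R p + t - p."""
    return [[q - x for q, x in zip(transform(r, t, p), p)] for p in points]

def is_rigid_flow(points, flow, tol: float = 1e-6) -> bool:
    """A flow field is consistent with one rigid motion only if adding it to the
    points preserves every pairwise distance (a necessary condition)."""
    moved = [[x + f for x, f in zip(p, fl)] for p, fl in zip(points, flow)]
    return all(abs(math.dist(moved[i], moved[j]) - math.dist(points[i], points[j])) < tol
               for i, j in combinations(range(len(points)), 2))

R, t = rot_z(math.pi / 2), [1.0, 0.0, 0.0]
pts = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [0.5, -1.0, 2.0]]
flow = rigid_scene_flow(R, t, pts)
print([[round(v, 6) for v in f] for f in flow])
print(is_rigid_flow(pts, flow))                                  # True
print(is_rigid_flow(pts, [[0.0, 0.0, 0.0]] * 2 + [[1.0, 0.0, 0.0]]))  # False
```

A flow estimated independently per pixel generally violates this distance check, which is why constraining it to a single rigid motion plus a shape prior can suppress noisy correspondences.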
Submitted 12 April, 2025;
originally announced April 2025.