Showing 101–150 of 43,810 results for author: Huang

  1. arXiv:2506.14165  [pdf, ps, other]

    eess.SP

    A Comprehensive Survey on Underwater Acoustic Target Positioning and Tracking: Progress, Challenges, and Perspectives

    Authors: Zhong Yang, Zhengqiu Zhu, Yong Zhao, Yonglin Tian, Changjun Fan, Runkang Guo, Wenhao Lu, Jingwei Ge, Bin Chen, Yin Zhang, Guohua Wu, Rui Wang, Gyorgy Eigner, Guangquan Cheng, Jincai Huang, Zhong Liu, Jun Zhang, Imre J. Rudas, Fei-Yue Wang

    Abstract: Underwater target tracking technology plays a pivotal role in marine resource exploration, environmental monitoring, and national defense security. Given that acoustic waves represent an effective medium for long-distance transmission in aquatic environments, underwater acoustic target tracking has become a prominent research area of underwater communications and networking. Existing literature re…

    Submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2506.14158  [pdf, ps, other]

    cs.CL cs.AI

    S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models

    Authors: Tao He, Guang Huang, Yu Yang, Tianshi Xu, Sicheng Zhao, Guiguang Ding, Pengyang Wang, Feng Tian

    Abstract: Large language models (LLMs) exhibit remarkable reasoning capabilities across diverse downstream tasks. However, their autoregressive nature leads to substantial inference latency, posing challenges for real-time applications. Speculative sampling mitigates this issue by introducing a drafting phase followed by a parallel validation phase, enabling faster token generation and verification. Existin…

    Submitted 16 June, 2025; originally announced June 2025.

  3. arXiv:2506.14157  [pdf, ps, other]

    cs.CL

    DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization

    Authors: Chengyu Huang, Tanya Goyal

    Abstract: Recent research has attempted to associate preference optimization (PO) performance with the underlying preference datasets. In this work, our observation is that the differences between the preferred response $y^+$ and dispreferred response $y^-$ influence what LLMs can learn, which may not match the desirable differences to learn. Therefore, we use distance and reward margin to quantify these di…

    Submitted 16 June, 2025; originally announced June 2025.

  4. arXiv:2506.14128  [pdf, ps, other]

    quant-ph

    Tunable Hybrid-Mode Coupler Enabling Strong Interactions between Transmons at Centimeter-Scale Distance

    Authors: Jianwen Xu, Xiang Deng, Wen Zheng, Wenchang Yan, Tao Zhang, Zhenchuan Zhang, Wanli Huang, Xiaoyu Xia, Xudong Liao, Yu Zhang, Jie Zhao, Shaoxiong Li, Xinsheng Tan, Dong Lan, Yang Yu

    Abstract: The transmon, a fabrication-friendly superconducting qubit, remains a leading candidate for scalable quantum computing. Recent advances in tunable couplers have accelerated progress toward high-performance quantum processors. However, extending coherent interactions beyond millimeter scales to enhance quantum connectivity presents a critical challenge. Here, we introduce a hybrid-mode coupler expl…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages, 7 figures

  5. arXiv:2506.14091  [pdf, ps, other]

    math.CA

    A Chebyshev criterion for at most two non-zero limit cycles in Abel equations

    Authors: Jianfeng Huang, Renhao Tian, Yulin Zhao

    Abstract: This paper investigates the Abel equation $\dot{x}=A(t)x^{3}+B(t)x^{2}$ on an interval $[0,T]$. The Smale-Pugh problem asks whether the maximum number of limit cycles of the equation is bounded in terms of a given class of coefficients. We establish for the first time a Chebyshev criterion, providing a positive answer to the problem when this class spanned by an extended Chebyshev system (ET-syste…

    Submitted 16 June, 2025; originally announced June 2025.

  6. arXiv:2506.14070  [pdf, ps, other]

    cs.AI

    Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places

    Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Ilya Ilyankou, Junyuan Liu, Lu Yin, Weiming Huang, Natchapon Jongwiriyanurak

    Abstract: Predicting individuals' next locations is a core task in human mobility modelling, with wide-ranging implications for urban planning, transportation, public policy and personalised mobility services. Traditional approaches largely depend on location embeddings learned from historical mobility patterns, limiting their ability to encode explicit spatial information, integrate rich urban semantic con…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures

  7. arXiv:2506.14028  [pdf, ps, other]

    cs.CL

    MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

    Authors: Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu, Peng Lu, Jerry Huang, Suyuchen Wang, et al. (19 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global finan…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  8. arXiv:2506.14015  [pdf, ps, other]

    cs.CV

    Disentangling 3D from Large Vision-Language Models for Controlled Portrait Generation

    Authors: Nick Yiwen Huang, Akin Caliskan, Berkay Kicanaoglu, James Tompkin, Hyeongwoo Kim

    Abstract: We consider the problem of disentangling 3D from large vision-language models, which we show on generative 3D portraits. This allows free-form text control of appearance attributes like age, hair style, and glasses, and 3D geometry control of face expression and camera pose. In this setting, we assume we use a pre-trained large vision-language model (LVLM; CLIP) to generate from a smaller 2D datas…

    Submitted 16 June, 2025; originally announced June 2025.

  9. arXiv:2506.14009  [pdf, ps, other]

    cs.RO

    GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

    Authors: Qianzhong Chen, Naixiang Gao, Suning Huang, JunEn Low, Timothy Chen, Jiankai Sun, Mac Schwager

    Abstract: Autonomous drones capable of interpreting and executing high-level language instructions in unstructured environments remain a long-standing goal. Yet existing approaches are constrained by their dependence on hand-crafted skills, extensive parameter tuning, or computationally intensive models unsuitable for onboard use. We introduce GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework…

    Submitted 16 June, 2025; originally announced June 2025.

  10. arXiv:2506.13999  [pdf, ps, other]

    physics.ins-det hep-ex

    Development of non amplified Depleted MAPS sensors towards 50 ps timing resolution on charged particles

    Authors: Raimon Casanova, Yavuz Degerli, Yujing Gan, Sebastian Grinstein, Fabrice Guilloux, Tomasz Hemperek, G. Huang, Jean-Pierre Meyer, Philippe Schwemling

    Abstract: The MiniCactus sensors are demonstrator sensors designed in LFoundry LF15A 150 nm technology, intended to study the performance of non amplified High Voltage High Resistivity CMOS sensors for measurement of time of arrival of charged particles. This paper presents the context, design features and some of the first test-beam results obtained with the latest MiniCactus sensor version, MiniCactus V2.…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Presented at the 11th International Workshop on Semiconductor Pixel Detectors for Particles and Imaging, Strasbourg, France, 18-22 November 2024

  11. arXiv:2506.13977  [pdf, ps, other]

    cs.SE cs.CL

    CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios

    Authors: Shiting Huang, Zhen Fang, Zehui Chen, Siyu Yuan, Junjie Ye, Yu Zeng, Lin Chen, Qi Mao, Feng Zhao

    Abstract: The ability of large language models (LLMs) to utilize external tools has enabled them to tackle an increasingly diverse range of tasks. However, as the tasks become more complex and long-horizon, the intricate tool utilization process may trigger various unexpected errors. Therefore, how to effectively handle such errors, including identifying, diagnosing, and recovering from them, has emerged as…

    Submitted 11 June, 2025; originally announced June 2025.

  12. arXiv:2506.13833  [pdf, ps, other]

    cs.SD cs.AI cs.RO eess.AS physics.app-ph

    A Survey on World Models Grounded in Acoustic Physical Information

    Authors: Xiaoliang Chen, Le Chang, Xin Yu, Yunhe Huang, Xianling Tu

    Abstract: This survey provides a comprehensive overview of the emerging field of world models grounded in the foundation of acoustic physical information. It examines the theoretical underpinnings, essential methodological frameworks, and recent technological advancements in leveraging acoustic signals for high-fidelity environmental perception, causal physical reasoning, and predictive simulation of dynami…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 28 pages, 11 equations

    MSC Class: 68T07; 35L05; 78A45 ACM Class: I.2.6; H.5.5; I.2.9

  13. arXiv:2506.13824  [pdf, ps, other]

    cs.SE cs.AI

    MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios

    Authors: Jinyang Huang, Xiachong Feng, Qiguang Chen, Hanjie Zhao, Zihui Cheng, Jiesong Bai, Jingxuan Zhou, Min Li, Libo Qin

    Abstract: Code debugging is a crucial task in software engineering, which attracts increasing attention. While remarkable success has been made in the era of large language models (LLMs), current research still focuses on the simple no-library or single-library setting, ignoring the complex multi-library scenario in real-world applications. To address this limitation, we make the first attempt to introduce…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  14. arXiv:2506.13793  [pdf, ps, other]

    cs.AI

    Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection

    Authors: Zongxian Yang, Jiayu Qian, Zegao Peng, Haoyu Zhang, Zhi-An Huang

    Abstract: Large reasoning models have recently made significant strides in mathematical and code reasoning, yet their success has not transferred smoothly to the medical domain. While multiple factors contribute to this disparity, a critical issue is the inadequate focus on the quality of intermediate reflection steps, which is particularly crucial in high-stakes medical scenarios. To address this challenge…

    Submitted 11 June, 2025; originally announced June 2025.

  15. arXiv:2506.13757  [pdf, ps, other]

    cs.CV

    AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

    Authors: Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, Jiaqi Ma

    Abstract: Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities. However, current VLA models often struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. In this paper, we propose AutoVLA, a novel VLA model that unifies reasoning and actio…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Website link: https://autovla.github.io/

  16. arXiv:2506.13754  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models

    Authors: Edward Li, Zichen Wang, Jiahe Huang, Jeong Joon Park

    Abstract: We present a unified framework for solving partial differential equations (PDEs) using video-inpainting diffusion transformer models. Unlike existing methods that devise specialized strategies for either forward or inverse problems under full or partial observation, our approach unifies these tasks under a single, flexible generative framework. Specifically, we recast PDE-solving as a generalized…

    Submitted 16 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: Project page: https://videopde.github.io/

  17. arXiv:2506.13751  [pdf, ps, other]

    cs.RO cs.AI

    LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

    Authors: Haoru Xue, Xiaoyu Huang, Dantong Niu, Qiayuan Liao, Thomas Kragerud, Jan Tommy Gravdahl, Xue Bin Peng, Guanya Shi, Trevor Darrell, Koushil Screenath, Shankar Sastry

    Abstract: Vision-language-action (VLA) models have demonstrated strong semantic understanding and zero-shot generalization, yet most existing systems assume an accurate low-level controller with hand-crafted action "vocabulary" such as end-effector pose or root velocity. This assumption confines prior work to quasi-static tasks and precludes the agile, whole-body behaviors required by humanoid whole-body co…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: https://ember-lab-berkeley.github.io/LeVERB-Website/

  18. arXiv:2506.13725  [pdf, ps, other]

    cs.RO

    CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding

    Authors: Wenxuan Song, Jiayi Chen, Pengxiang Ding, Yuxin Huang, Han Zhao, Donglin Wang, Haoang Li

    Abstract: In recent years, Vision-Language-Action (VLA) models have become a vital research direction in robotics due to their impressive multimodal understanding and generalization capabilities. Despite the progress, their practical deployment is severely constrained by inference speed bottlenecks, particularly in high-frequency and dexterous manipulation tasks. While recent studies have explored Jacobi de…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 16 pages

  19. arXiv:2506.13724  [pdf, ps, other]

    quant-ph physics.atom-ph

    Leveraging erasure errors in logical qubits with metastable $^{171}$Yb atoms

    Authors: Bichen Zhang, Genyue Liu, Guillaume Bornet, Sebastian P. Horvath, Pai Peng, Shuo Ma, Shilin Huang, Shruti Puri, Jeff D. Thompson

    Abstract: Implementing large-scale quantum algorithms with practical advantage will require fault-tolerance achieved through quantum error correction, but the associated overhead is a significant cost. The overhead can be reduced by engineering physical qubits with fewer errors, and by shaping the residual errors to be more easily correctable. In this work, we demonstrate quantum error correcting codes and…

    Submitted 16 June, 2025; originally announced June 2025.

  20. arXiv:2506.13695  [pdf, ps, other]

    cs.IR

    OneRec Technical Report

    Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang, et al. (40 additional authors not shown)

    Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Authors are listed alphabetically by their first name

  21. arXiv:2506.13671  [pdf, ps, other]

    stat.ME

    Do more observations bring more information in rare events?

    Authors: Danyang Huang, Liyuan Wang, Liping Zhu

    Abstract: It is generally believed that more observations provide more information. However, we observe that in the independence test for rare events, the power of the test is, surprisingly, determined by the number of rare events rather than the total sample size. Moreover, the correlations tend to shrink to zero even as the total sample size increases, as long as the proportion of rare events decreases. W…

    Submitted 16 June, 2025; originally announced June 2025.

  22. arXiv:2506.13612  [pdf, ps, other]

    cs.CR cs.AI cs.DC

    EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning

    Authors: Zhiqiang Li, Haiyong Bao, Menghong Guan, Hao Pan, Cheng Huang, Hong-Ning Dai

    Abstract: Despite federated learning (FL)'s potential in collaborative learning, its performance has deteriorated due to the data heterogeneity of distributed users. Recently, clustered federated learning (CFL) has emerged to address this challenge by partitioning users into clusters according to their similarity. However, CFL faces difficulties in training when users are unwilling to share their cluster id…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by AAAI 25

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 39(17), 18593-18601, 2025

  23. arXiv:2506.13606  [pdf, ps, other]

    math.CO cs.CG

    Largest dyadic dual VC-dimension of non-piercing families

    Authors: Xinqi Huang, Yuzhen Qi, Mingyuan Rong, Zixiang Xu

    Abstract: The dyadic dual VC-dimension of a set system \( \mathcal{F} \) is the largest integer \( \ell \) such that there exist \( \ell \) sets \( F_1, F_{2}, \dots, F_\ell \in \mathcal{F} \), where every pair \( \{i, j\} \in \binom{[\ell]}{2} \) is witnessed by an element \( a_{i,j} \in F_i \cap F_j \) that does not belong to any other set \( F_k \) with \( k \in [\ell] \setminus \{i, j\} \). In this pape…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures

    MSC Class: 52A35; 52C45

  24. arXiv:2506.13590  [pdf, ps, other]

    cs.AI cs.CR cs.MA

    Agent Capability Negotiation and Binding Protocol (ACNBP)

    Authors: Ken Huang, Akram Sheriff, Vineeth Sai Narajala, Idan Habler

    Abstract: As multi-agent systems evolve to encompass increasingly diverse and specialized agents, the challenge of enabling effective collaboration between heterogeneous agents has become paramount, with traditional agent communication protocols often assuming homogeneous environments or predefined interaction patterns that limit their applicability in dynamic, open-world scenarios. This paper presents the…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures

  25. arXiv:2506.13585  [pdf, ps, other]

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  26. arXiv:2506.13516  [pdf, ps, other]

    cs.CV

    Micro-macro Gaussian Splatting with Enhanced Scalability for Unconstrained Scene Reconstruction

    Authors: Yihui Li, Chengxin Lv, Hongyu Yang, Di Huang

    Abstract: Reconstructing 3D scenes from unconstrained image collections poses significant challenges due to variations in appearance. In this paper, we propose Scalable Micro-macro Wavelet-based Gaussian Splatting (SMW-GS), a novel method that enhances 3D reconstruction across diverse scales by decomposing scene representations into global, refined, and intrinsic components. SMW-GS incorporates the followin…

    Submitted 16 June, 2025; originally announced June 2025.

  27. arXiv:2506.13504  [pdf, ps, other]

    gr-qc hep-th

    Lorentz violation signatures in the low-energy sector of Hořava gravity from black hole shadow observations

    Authors: Wentao Liu, Hongxia Huang, Di Wu, Jieci Wang

    Abstract: In this paper, we use the Hořava gravity model and EHT observations of supermassive black holes (BHs) to investigate signatures of Lorentz violation in real astrophysical environments. The Lorentz violation in the rotating Hořava BH spacetime are confined to the strong gravitational field region, being induced by the BH's rotation. Due to the non-separability of the photon motion equations in this…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages; 10 figures

  28. arXiv:2506.13503  [pdf, ps, other]

    astro-ph.HE

    Fast Transitions of X-ray Variability in the Neutron Star Low Mass X-ray Binary Cygnus X-2

    Authors: Liang Zhang, Mariano Méndez, Hua Feng, Diego Altamirano, Zi-xu Yang, Qing-chang Zhao, Shuang-nan Zhang, Lian Tao, Yue Huang, Xiang Ma, Shu-mei Jia, Ming-yu Ge, Li-ming Song, Jin-lu Qu, Shu Zhang

    Abstract: We present a spectral-timing analysis of two NICER observations of the weakly magnetized neutron star low-mass X-ray binary Cygnus X-2. During these observations, we detect a rapid transition from a narrow 50-Hz horizontal-branch oscillation to a broad 5-Hz normal-branch oscillation, accompanied by an increase in source flux and a decrease in spectral hardness. Thanks to the large effective area o…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages, 7 figures, accepted for publication in ApJ

  29. arXiv:2506.13497  [pdf, ps, other]

    cs.DC

    DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving

    Authors: Heyang Huang, Cunchen Hu, Jiaqi Zhu, Ziyuan Gao, Liangliang Xu, Yizhou Shan, Yungang Bao, Sun Ninghui, Tianwei Zhang, Sa Wang

    Abstract: The Text-to-Video (T2V) model aims to generate dynamic and expressive videos from textual prompts. The generation pipeline typically involves multiple modules, such as language encoder, Diffusion Transformer (DiT), and Variational Autoencoders (VAE). Existing serving systems often rely on monolithic model deployment, while overlooking the distinct characteristics of each module, leading to ineffic…

    Submitted 16 June, 2025; originally announced June 2025.

  30. arXiv:2506.13492  [pdf, ps, other]

    cs.CV

    GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field

    Authors: Chengrui Zhang, Maizhen Ning, Zihao Zhou, Jie Sun, Kaizhu Huang, Qiufeng Wang

    Abstract: Plane Geometry Diagram Synthesis has been a crucial task in computer graphics, with applications ranging from educational tools to AI-driven mathematical reasoning. Traditionally, we rely on computer tools (e.g., Matplotlib and GeoGebra) to manually generate precise diagrams, but it usually requires huge, complicated calculations cost. Recently, researchers start to work on learning-based methods…

    Submitted 16 June, 2025; originally announced June 2025.

  31. arXiv:2506.13444  [pdf, ps, other]

    cs.CV

    Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images

    Authors: Laiyan Ding, Hualie Jiang, Jiwei Chen, Rui Huang

    Abstract: Depth map enhancement using paired high-resolution RGB images offers a cost-effective solution for improving low-resolution depth data from lightweight ToF sensors. Nevertheless, naively adopting a depth estimation pipeline to fuse the two modalities requires groundtruth depth maps for supervision. To address this, we propose a self-supervised learning framework, SelfToF, which generates detailed…

    Submitted 17 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: accepted by IROS 2025

  32. arXiv:2506.13443  [pdf]

    eess.IV cs.CV

    PRO: Projection Domain Synthesis for CT Imaging

    Authors: Kang Chen, Bin Huang, Xuebin Yang, Junyan Zhang, Qiegen Liu

    Abstract: Synthesizing high quality CT projection data remains a significant challenge due to the limited availability of annotated data and the complex nature of CT imaging. In this work, we present PRO, a projection domain synthesis foundation model for CT imaging. To the best of our knowledge, this is the first study that performs CT synthesis in the projection domain. Unlike previous approaches that ope…

    Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  33. arXiv:2506.13428  [pdf, ps, other]

    cs.RO

    VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation

    Authors: Jiaming Chen, Yiyu Jiang, Aoshen Huang, Yang Li, Wei Pan

    Abstract: Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two obje…

    Submitted 16 June, 2025; originally announced June 2025.

  34. arXiv:2506.13387  [pdf, ps, other]

    cs.CV

    TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast

    Authors: Beilei Cui, Yiming Huang, Long Bai, Hongliang Ren

    Abstract: This work presents a generalizable framework to transfer relative depth to metric depth. Current monocular depth estimation methods are mainly divided into metric depth estimation (MMDE) and relative depth estimation (MRDE). MMDEs estimate depth in metric scale but are often limited to a specific domain. MRDEs generalize well across different domains, but with uncertain scales which hinders downst…

    Submitted 16 June, 2025; originally announced June 2025.

  35. arXiv:2506.13301  [pdf, ps, other]

    cs.CV

    AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing

    Authors: Biao Yang, Muqi Huang, Yuhui Zhang, Yun Xiong, Kun Zhou, Xi Chen, Shiyang Zhou, Huishuai Bao, Chuan Li, Feng Shi, Hualei Liu

    Abstract: Traditional point-based image editing methods rely on iterative latent optimization or geometric transformations, which are either inefficient in their processing or fail to capture the semantic relationships within the image. These methods often overlook the powerful yet underutilized image editing capabilities inherent in pre-trained diffusion models. In this work, we propose a novel one-step po…

    Submitted 16 June, 2025; originally announced June 2025.

  36. arXiv:2506.13289  [pdf, ps, other]

    nucl-th

    $0νββ$ decay nuclear matrix elements under Left-Right symmetric model from the spherical quasi-particle random phase approximation method with realistic force

    Authors: Ri-Guang Huang, You-Cai Chen, Dong-Liang Fang

    Abstract: We perform the calculation of nuclear matrix elements for the neutrinoless double beta decays under a Left-Right symmetric model mediated by light neutrino, and we adopt the spherical quasi-particle random-phase approximation (QRPA) approach with realistic force. For eight nuclei: $^{76}$Ge, $^{82}$Se, $^{96}$Zr, $^{100}$Mo, $^{116}$Cd, $^{128}$Te, $^{130}$Te and $^{136}$Xe, related nuclear matrix…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 10 pages, 1 figure

  37. arXiv:2506.13222  [pdf, ps, other]

    cs.AI cs.LG

    NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification

    Authors: Zhenyu Xia, Xinlei Huang, Suvash C. Saha

    Abstract: Electroencephalography (EEG) is extensively employed in medical diagnostics and brain-computer interface (BCI) applications due to its non-invasive nature and high temporal resolution. However, EEG analysis faces significant challenges, including noise, nonstationarity, and inter-subject variability, which hinder its clinical utility. Traditional neural networks often lack integration with biophys…

    Submitted 16 June, 2025; originally announced June 2025.

  38. arXiv:2506.13185  [pdf, ps, other]

    quant-ph

    Quantum Recurrent Embedding Neural Network

    Authors: Mingrui Jing, Erdong Huang, Xiao Shi, Shengyu Zhang, Xin Wang

    Abstract: Quantum neural networks have emerged as promising quantum machine learning models, leveraging the properties of quantum systems and classical optimization to solve complex problems in physics and beyond. However, previous studies have demonstrated inevitable trainability issues that severely limit their capabilities in the large-scale regime. In this work, we propose a quantum recurrent embedding…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 39 pages including appendix

  39. arXiv:2506.13138  [pdf, ps, other]

    cs.CV

    STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation

    Authors: Jiamin Wang, Yichen Yao, Xiang Feng, Hang Wu, Yaming Wang, Qingqiu Huang, Yuexin Ma, Xinge Zhu

    Abstract: The generation of temporally consistent, high-fidelity driving videos over extended horizons presents a fundamental challenge in autonomous driving world modeling. Existing approaches often suffer from error accumulation and feature misalignment due to inadequate decoupling of spatio-temporal dynamics and limited cross-frame feature propagation mechanisms. To address these limitations, we present…

    Submitted 16 June, 2025; originally announced June 2025.

  40. arXiv:2506.13131  [pdf, ps, other]

    cs.AI cs.LG cs.NE

    AlphaEvolve: A coding agent for scientific and algorithmic discovery

    Authors: Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, Matej Balog

    Abstract: In this white paper, we present AlphaEvolve, an evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs on highly challenging tasks such as tackling open scientific problems or optimizing critical pieces of computational infrastructure. AlphaEvolve orchestrates an autonomous pipeline of LLMs, whose task is to improve an algorithm by making direct changes to the…

    Submitted 16 June, 2025; originally announced June 2025.

  41. arXiv:2506.13120  [pdf, ps, other]

    cs.LG

    Accelerating PDE-Constrained Optimization by the Derivative of Neural Operators

    Authors: Ze Cheng, Zhuoyu Li, Xiaoqiang Wang, Jianing Huang, Zhizhou Zhang, Zhongkai Hao, Hang Su

    Abstract: PDE-Constrained Optimization (PDECO) problems can be accelerated significantly by employing gradient-based methods with surrogate models like neural operators compared to traditional numerical solvers. However, this approach faces two key challenges: (1) **Data inefficiency**: Lack of efficient data sampling and effective training for neural operators, particularly for optimization purpose. (2) **…

    Submitted 16 June, 2025; originally announced June 2025.

  42. arXiv:2506.13112  [pdf, ps, other]

    cond-mat.stat-mech

    First-passage and extreme value statistics for overdamped Brownian motion in a linear potential

    Authors: Feng Huang, Hanshuang Chen

    Abstract: We investigate the first-passage properties and extreme-value statistics of an overdamped Brownian particle confined by an external linear potential $V(x)=μ|x-x_0|$, where $μ>0$ is the strength of the potential and $x_0>0$ is the position of the lowest point of the potential, which coincides with the starting position of the particle. The Brownian motion terminates whenever the particle passes thr…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 8 pages, 4 figures

    Journal ref: Physica A 672 (2025) 130673

  43. arXiv:2506.13092  [pdf, ps, other]

    cs.AI cs.LG

    A Memetic Walrus Algorithm with Expert-guided Strategy for Adaptive Curriculum Sequencing

    Authors: Qionghao Huang, Lingnuo Lu, Xuemei Wu, Fan Jiang, Xizhe Wang, Xun Wang

    Abstract: Adaptive Curriculum Sequencing (ACS) is essential for personalized online learning, yet current approaches struggle to balance complex educational constraints and maintain optimization stability. This paper proposes a Memetic Walrus Optimizer (MWO) that enhances optimization performance through three key innovations: (1) an expert-guided strategy with aging mechanism that improves escape from loca…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: The article has been accepted and published by Human-centric Computing and Information Sciences

  44. arXiv:2506.13079  [pdf, ps, other]

    cs.RO cs.HC

    CHARM: Considering Human Attributes for Reinforcement Modeling

    Authors: Qidi Fang, Hang Yu, Shijie Fang, Jindan Huang, Qiuyu Chen, Reuben M. Aronson, Elaine S. Short

    Abstract: Reinforcement Learning from Human Feedback has recently achieved significant success in various fields, and its performance is highly related to feedback quality. While much prior work acknowledged that human teachers' characteristics would affect human feedback patterns, there is little work that has closely investigated the actual effects. In this work, we designed an exploratory study investiga…

    Submitted 15 June, 2025; originally announced June 2025.

    Journal ref: ROMAN 2025

  45. arXiv:2506.13061  [pdf, ps, other]

    cs.LG math.CA math.NA

    Fast Convergence for High-Order ODE Solvers in Diffusion Probabilistic Models

    Authors: Daniel Zhengyu Huang, Jiaoyang Huang, Zhengjiang Lin

    Abstract: Diffusion probabilistic models generate samples by learning to reverse a noise-injection process that transforms data into noise. Reformulating this reverse process as a deterministic probability flow ordinary differential equation (ODE) enables efficient sampling using high-order solvers, often requiring only $\mathcal{O}(10)$ steps. Since the score function is typically approximated by a neural…

    Submitted 18 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

    Comments: 64 pages, 7 figures

  46. arXiv:2506.13039  [pdf, ps, other]

    cs.CV

    Evolution of ReID: From Early Methods to LLM Integration

    Authors: Amran Bhuiyan, Mizanur Rahman, Md Tahmid Rahman Laskar, Aijun An, Jimmy Xiangji Huang

    Abstract: Person re-identification (ReID) has evolved from handcrafted feature-based methods to deep learning approaches and, more recently, to models incorporating large language models (LLMs). Early methods struggled with variations in lighting, pose, and viewpoint, but deep learning addressed these issues by learning robust visual features. Building on this, LLMs now enable ReID systems to integrate sema…

    Submitted 15 June, 2025; originally announced June 2025.

  47. arXiv:2506.13038  [pdf, ps, other]

    cs.CV cs.MM

    HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs

    Authors: Zijian Zhang, Xuecheng Wu, Danlei Huang, Siyu Yan, Chong Peng, Xuezhi Cao

    Abstract: Driven by the rapid progress in vision-language models (VLMs), the responsible behavior of large-scale multimodal models has become a prominent research area, particularly focusing on hallucination detection and factuality checking. In this paper, we present the solution for the two tracks of Responsible AI challenge. Inspirations from the general domain demonstrate that a smaller distilled VLM ca…

    Submitted 17 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

  48. arXiv:2506.13035  [pdf, ps, other]

    gr-qc

    Probing Dark Matter's Gravitational Effects Locally with TianQin

    Authors: Zheng-Cheng Liang, Fa-Peng Huang, Xuefeng Zhang, Yi-Ming Hu

    Abstract: In this study, we explore the potential of using TianQin missions to probe the local gravitational effects of dark matter. The TianQin project plans to launch satellites at both low and high orbits. High-precision orbit determination is expected to assist in the Earth's gravity or gravitational waves detection. By comparing the derived masses in low and high orbits, it is possible to constrain the…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 5 pages, 1 figure

  49. arXiv:2506.12992  [pdf, ps, other]

    cs.CV

    SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models

    Authors: Xinyi Zhao, Congjing Zhang, Pei Guo, Wei Li, Lin Chen, Chaoyue Zhao, Shuai Huang

    Abstract: Video anomaly detection (VAD) is essential for enhancing safety and security by identifying unusual events across different environments. Existing VAD benchmarks, however, are primarily designed for general-purpose scenarios, neglecting the specific characteristics of smart home applications. To bridge this gap, we introduce SmartHome-Bench, the first comprehensive benchmark specially designed for…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 Workshop: VAND 3.0 - Visual Anomaly and Novelty Detection

  50. arXiv:2506.12986  [pdf, ps, other]

    q-bio.GN

    Improving spliced alignment by modeling splice sites with deep learning

    Authors: Siying Yang, Neng Huang, Heng Li

    Abstract: Motivation: Spliced alignment refers to the alignment of messenger RNA (mRNA) or protein sequences to eukaryotic genomes. It plays a critical role in gene annotation and the study of gene functions. Accurate spliced alignment demands sophisticated modeling of splice sites, but current aligners use simple models, which may affect their accuracy given dissimilar sequences. Results: We implemented…

    Submitted 15 June, 2025; originally announced June 2025.