Showing 51–100 of 2,660 results for author: Huang, W

  1. arXiv:2506.06250  [pdf, ps, other]

    nucl-ex hep-ex

    Coherent photoproduction of $ρ^0, ω$ and excited vector mesons in ultraperipheral PbPb collisions

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An, et al. (1127 additional authors not shown)

    Abstract: The invariant-mass distribution for the coherent photoproduction of dipions in ultraperipheral PbPb collisions is measured using data corresponding to an integrated luminosity of $224.6 \pm 9.6\ μ\mathrm{b}^{-1}$, collected by the LHCb experiment in 2018 at a nucleon-nucleon centre-of-mass energy $\sqrt{s_{\rm NN}}=5.02$ TeV. The dominant contribution is due to the $ρ^0$ meson but a consistent descript…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 22 pages, 7 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-042.html (LHCb public pages)

    Report number: CERN-EP-2025-106, LHCb-PAPER-2024-042

  2. arXiv:2506.05918  [pdf, ps, other]

    cs.LG

    Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs

    Authors: Wenxuan Huo, Qiang He, Gang Zhu, Weifeng Huang

    Abstract: Partial differential equations (PDEs) serve as the cornerstone of mathematical physics. In recent years, Physics-Informed Neural Networks (PINNs) have significantly reduced the dependence on large datasets by embedding physical laws directly into the training of neural networks. However, when dealing with complex problems, the accuracy of PINNs still has room for improvement. To address this issue…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2506.05685  [pdf, ps, other]

    cs.IR

    NGA: Non-autoregressive Generative Auction with Global Externalities for Advertising Systems

    Authors: Zuowu Zheng, Ze Wang, Fan Yang, Wenqing Ye, Weihua Huang, Wenqiang He, Teng Zhang, Xingxing Wang

    Abstract: Online advertising auctions are fundamental to internet commerce, demanding solutions that not only maximize revenue but also ensure incentive compatibility, high-quality user experience, and real-time efficiency. While recent learning-based auction frameworks have improved context modeling by capturing intra-list dependencies among ads, they remain limited in addressing global externalities and o…

    Submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2506.05137  [pdf, other]

    q-fin.GN

    Neural Jumps for Option Pricing

    Authors: Duosi Zheng, Hanzhong Guo, Yanchu Liu, Wei Huang

    Abstract: Recognizing the importance of jump risk in option pricing, we propose a neural jump stochastic differential equation model in this paper, which integrates neural networks as parameter estimators in the conventional jump diffusion model. To overcome the problem that the backpropagation algorithm is not compatible with the jump process, we use the Gumbel-Softmax method to make the jump parameter gra…

    Submitted 5 June, 2025; originally announced June 2025.

  5. arXiv:2506.05083  [pdf, ps, other]

    cs.CV

    SeedEdit 3.0: Fast and High-Quality Generative Image Editing

    Authors: Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang

    Abstract: We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both edit instruction following and image content (e.g., ID/IP) preservation on real image inputs. In addition to model upgrading with T2I, this report presents several key improvements. First, we develop an enhanced data curation pipeline wit…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: Website: https://seed.bytedance.com/tech/seededit

  6. arXiv:2506.03157  [pdf, ps, other]

    q-bio.BM cs.LG

    UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules

    Authors: Ziyang Yu, Wenbing Huang, Yang Liu

    Abstract: Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar…

    Submitted 5 June, 2025; v1 submitted 20 May, 2025; originally announced June 2025.

    Comments: ICML 2025 poster

  7. arXiv:2506.03107  [pdf, ps, other]

    cs.CV

    ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions

    Authors: Di Chang, Mingdeng Cao, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang

    Abstract: Editing images with instructions to reflect non-rigid motions, camera viewpoint shifts, object deformations, human articulations, and complex interactions poses a challenging yet underexplored problem in computer vision. Existing approaches and datasets predominantly focus on static scenes or rigid transformations, limiting their capacity to handle expressive edits involving dynamic motion. To ad…

    Submitted 11 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: Website: https://boese0601.github.io/bytemorph Dataset: https://huggingface.co/datasets/ByteDance-Seed/BM-6M Benchmark: https://huggingface.co/datasets/ByteDance-Seed/BM-Bench Code: https://github.com/ByteDance-Seed/BM-code Demo: https://huggingface.co/spaces/Boese0601/ByteMorph-Demo

  8. arXiv:2506.02334  [pdf, ps, other]

    cs.CV

    Generalized Category Discovery via Reciprocal Learning and Class-Wise Distribution Regularization

    Authors: Duo Liu, Zhiquan Tan, Linglan Zhao, Zhongqiang Zhang, Xiangzhong Fang, Weiran Huang

    Abstract: Generalized Category Discovery (GCD) aims to identify unlabeled samples by leveraging the base knowledge from labeled ones, where the unlabeled set consists of both base and novel classes. Since clustering methods are time-consuming at inference, parametric-based approaches have become more popular. However, recent parametric-based methods suffer from inferior base discrimination due to unreliable…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: ICML 2025 poster

  9. arXiv:2506.01977  [pdf, ps, other]

    cs.LG cs.AI

    Towards Unsupervised Training of Matching-based Graph Edit Distance Solver via Preference-aware GAN

    Authors: Wei Huang, Hanchen Wang, Dong Wen, Shaozhen Ma, Wenjie Zhang, Xuemin Lin

    Abstract: Graph Edit Distance (GED) is a fundamental graph similarity metric widely used in various applications. However, computing GED is an NP-hard problem. A recent state-of-the-art hybrid GED solver has shown promising performance by formulating GED as a bipartite graph matching problem, then leveraging a generative diffusion model to predict node matching between two graphs, from which both the GED and…

    Submitted 15 May, 2025; originally announced June 2025.

  10. arXiv:2506.01701  [pdf, ps, other]

    cs.CV cs.AI

    Data Pruning by Information Maximization

    Authors: Haoru Tan, Sitong Wu, Wei Huang, Shizhen Zhao, Xiaojuan Qi

    Abstract: In this paper, we present InfoMax, a novel data pruning method, also known as coreset selection, designed to maximize the information content of selected samples while minimizing redundancy. By doing so, InfoMax enhances the overall informativeness of the coreset. The information of individual samples is measured by importance scores, which capture their influence or difficulty in model learning.…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: ICLR 2025

  11. arXiv:2506.01657  [pdf, ps, other]

    quant-ph

    State Similarity in Modular Superconducting Quantum Processors with Classical Communications

    Authors: Bujiao Wu, Changrong Xie, Peng Mi, Zhiyi Wu, Zechen Guo, Peisheng Huang, Wenhui Huang, Xuandong Sun, Jiawei Zhang, Libo Zhang, Jiawei Qiu, Xiayu Linpeng, Ziyu Tao, Ji Chu, Ji Jiang, Song Liu, Jingjing Niu, Yuxuan Zhou, Yuxuan Du, Wenhui Ren, Youpeng Zhong, Tongliang Liu, Dapeng Yu

    Abstract: As quantum devices continue to scale, distributed quantum computing emerges as a promising strategy for executing large-scale tasks across modular quantum processors. A central challenge in this paradigm is verifying the correctness of computational outcomes when subcircuits are executed independently following circuit cutting. Here we propose a cross-platform fidelity estimation algorithm tailore…

    Submitted 11 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: 10 pages, 3 figures, 27-page appendix, reference citation typos corrected

  12. arXiv:2506.01011  [pdf, other]

    cs.CR

    Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack

    Authors: Siqi Hui, Yiren Song, Sanping Zhou, Ye Deng, Wenli Huang, Jinjun Wang

    Abstract: Autoregressive (AR) image generation models have gained increasing attention for their breakthroughs in synthesis quality, highlighting the need for robust watermarking to prevent misuse. However, existing in-generation watermarking techniques are primarily designed for diffusion models, where watermarks are embedded within diffusion latent states. This design poses significant challenges for dire…

    Submitted 1 June, 2025; originally announced June 2025.

  13. arXiv:2506.00866  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Projection Pursuit Density Ratio Estimation

    Authors: Meilin Wang, Wei Huang, Mingming Gong, Zheng Zhang

    Abstract: Density ratio estimation (DRE) is a paramount task in machine learning owing to its broad applications across multiple domains, such as covariate shift adaptation, causal inference, and independence tests. Parametric methods for estimating the density ratio can lead to biased results if models are misspecified, while conventional non-parametric methods suffer from the curse of dimensionali…

    Submitted 1 June, 2025; originally announced June 2025.

  14. arXiv:2505.24677  [pdf, other]

    eess.SY

    Robust Distribution Network Reconfiguration Using Mapping-based Column-and-Constraint Generation

    Authors: Runjie Zhang, Kaiping Qu, Changhong Zhao, Wanjun Huang

    Abstract: The integration of intermittent renewable energy sources into distribution networks introduces significant uncertainties and fluctuations, challenging their operational security, stability, and efficiency. This paper considers robust distribution network reconfiguration (RDNR) with renewable generator resizing, modeled as a two-stage robust optimization (RO) problem with decision-dependent uncerta…

    Submitted 30 May, 2025; originally announced May 2025.

  15. arXiv:2505.24586  [pdf, ps, other]

    astro-ph.HE

    All-sky search for individual Primordial Black Hole bursts with LHAASO

    Authors: Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, S. H. Chen, et al. (293 additional authors not shown)

    Abstract: Primordial Black Holes (PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$ g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for…

    Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 8 pages, 2 figures

  16. arXiv:2505.23922  [pdf, ps, other]

    cs.CV cs.CL

    ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

    Authors: David Ma, Huaqing Yuan, Xingjian Wang, Qianbo Zang, Tianci Liu, Xinyang He, Yanbin Wei, Jiawei Guo, Ni Jiahui, Zhenzhu Yang, Meng Cao, Shanghaoran Quan, Yizhi Li, Wangchunshu Zhou, Jiaheng Liu, Wenhao Huang, Ge Zhang, Shiwen Ni, Xiaojie Jin

    Abstract: Although long-video understanding demands that models capture hierarchical temporal information -- from clip (seconds) and shot (tens of seconds) to event (minutes) and story (hours) -- existing benchmarks either neglect this multi-scale design or scatter scale-specific questions across different videos, preventing direct comparison of model performance across timescales on the same content. To ad…

    Submitted 29 May, 2025; originally announced May 2025.

  17. arXiv:2505.23810  [pdf, ps, other]

    cs.CL cs.AI

    MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

    Authors: Chenghao Yang, Yinbo Luo, Zhoufutu Wen, Qi Chu, Tao Gong, Longxiang Liu, Kaiyuan Zhang, Jianpeng Jiao, Ge Zhang, Wenhao Huang, Nenghai Yu

    Abstract: Large Language Models (LLMs), e.g. ChatGPT, have been widely adopted in real-world dialogue applications. However, LLMs' robustness, especially in handling long complex dialogue sessions with frequent motivation transfer and sophisticated cross-turn dependency, has long been criticized. Nevertheless, no existing benchmarks can fully reflect these weaknesses. We present MARS-Benc…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 29 pages, 13 figures

  18. arXiv:2505.23053  [pdf, ps, other]

    cs.IR cs.AI

    Augment or Not? A Comparative Study of Pure and Augmented Large Language Model Recommenders

    Authors: Wei-Hsiang Huang, Chen-Wei Ke, Wei-Ning Chiu, Yu-Xuan Su, Chun-Chun Yang, Chieh-Yuan Cheng, Yun-Nung Chen, Pu-Jen Cheng

    Abstract: Large language models (LLMs) have introduced new paradigms for recommender systems by enabling richer semantic understanding and incorporating implicit world knowledge. In this study, we propose a systematic taxonomy that classifies existing approaches into two categories: (1) Pure LLM Recommenders, which rely solely on LLMs, and (2) Augmented LLM Recommenders, which integrate additional non-LLM t…

    Submitted 28 May, 2025; originally announced May 2025.

  19. arXiv:2505.23024  [pdf, ps, other]

    cs.LG

    An Empirical Study of Federated Prompt Learning for Vision Language Model

    Authors: Zhihao Wang, Wenke Huang, Tian Chen, Zekun Shi, Guancheng Wan, Yu Qiao, Bin Yang, Jian Wang, Bing Li, Mang Ye

    Abstract: The Vision Language Model (VLM) excels in aligning vision and language representations, and prompt learning has emerged as a key technique for adapting such models to downstream tasks. However, the application of prompt learning with VLM in federated learning (FL) scenarios remains underexplored. This paper systematically investigates the behavioral differences between language prompt learning…

    Submitted 28 May, 2025; originally announced May 2025.

  20. arXiv:2505.22453  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

    Authors: Lai Wei, Yuting Li, Chen Wang, Yue Wang, Linghe Kong, Weiran Huang, Lichao Sun

    Abstract: Improving Multi-modal Large Language Models (MLLMs) in the post-training stage typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL). However, these supervised methods require expensive and manually annotated multi-modal data, an ultimately unsustainable resource. While recent efforts have explored unsupervised post-training, their methods are complex and difficult to ite…

    Submitted 28 May, 2025; originally announced May 2025.

  21. Parental Collaboration and Closeness: Envisioning with New Couple Parents

    Authors: Ya-Fang Lin, Xiaotian Li, Wan-Hsuan Huang, Charan Pushpanathan Prabavathi, Jie Cai, John M. Carroll

    Abstract: Couples often experience a decrease in closeness as they cope with the demands of parenthood. Existing technologies have supported parenting and parental collaboration. However, these technologies do not adequately support closeness in co-parenting. We use scenarios and design probes to brainstorm with 10 new parent couples to explore and envision possibilities for technologies to support closenes…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: DIS 2025

  22. arXiv:2505.22334  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

    Authors: Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang

    Abstract: Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While "aha moment" patterns (where models exhibit self-correction through reflection) are often attributed to emergent properties from RL, we first demonstrate that these patterns exist in multimodal LLMs…

    Submitted 28 May, 2025; originally announced May 2025.

  23. arXiv:2505.22195  [pdf, other]

    cs.CV

    S2AFormer: Strip Self-Attention for Efficient Vision Transformer

    Authors: Guoan Xu, Wenfeng Huang, Wenjing Jia, Jiamao Li, Guangwei Gao, Guo-Jun Qi

    Abstract: Vision Transformer (ViT) has made significant advancements in computer vision, thanks to its token mixer's sophisticated ability to capture global dependencies between all tokens. However, the quadratic growth in computational demands as the number of tokens increases limits its practical efficiency. Although recent methods have combined the strengths of convolutions and self-attention to achieve…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 12 pages, 6 figures, 8 tables

  24. arXiv:2505.22098  [pdf, ps, other]

    cs.CV

    UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images

    Authors: Junhuan Liu, San Jiang, Wei Ge, Wei Huang, Bingxuan Guo, Qingquan Li

    Abstract: The primary contribution of this paper is a challenging benchmark dataset, UAVPairs, and a training pipeline designed for match pair retrieval of large-scale UAV images. First, the UAVPairs dataset, comprising 21,622 high-resolution images across 30 diverse scenes, is constructed; the 3D points and tracks generated by SfM-based 3D reconstruction are employed to define the geometric similarity of i…

    Submitted 28 May, 2025; originally announced May 2025.

  25. arXiv:2505.21868  [pdf, other]

    cs.CV

    Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection

    Authors: Guiping Cao, Wenjian Huang, Xiangyuan Lan, Jianguo Zhang, Dongmei Jiang, Yaowei Wang

    Abstract: Small Object Detection (SOD) poses significant challenges due to limited information and the model's low class prediction score. While Transformer-based detectors have shown promising performance, their potential for SOD remains largely unexplored. In typical DETR-like frameworks, the CNN backbone network, specialized in aggregating local information, struggles to capture the necessary contextual…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: IEEE Transactions on Multimedia

  26. arXiv:2505.20639  [pdf, other]

    cs.CV

    Open-Det: An Efficient Learning Framework for Open-Ended Detection

    Authors: Guiping Cao, Tao Wang, Wenjian Huang, Xiangyuan Lan, Jianguo Zhang, Dongmei Jiang

    Abstract: Open-Ended object Detection (OED) is a novel and challenging task that detects objects and generates their category names in a free-form manner, without requiring additional vocabularies during inference. However, the existing OED models, such as GenerateU, require large-scale datasets for training, suffer from slow convergence, and exhibit limited performance. To address these issues, we present…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  27. arXiv:2505.19714  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning

    Authors: Zhaopeng Feng, Yupu Liang, Shaosheng Cao, Jiayuan Su, Jiahan Ren, Zhe Xu, Yao Hu, Wenxuan Huang, Jian Wu, Zuozhu Liu

    Abstract: Text Image Machine Translation (TIMT), the task of translating textual content embedded in images, is critical for applications in accessibility, cross-lingual information access, and real-world document understanding. However, TIMT remains a complex challenge due to the need for accurate optical character recognition (OCR), robust visual-text reasoning, and high-quality translation, often requiring…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Work in progress

  28. arXiv:2505.18909  [pdf, ps, other]

    stat.ML cs.LG

    On the Role of Label Noise in the Feature Learning Process

    Authors: Andi Han, Wei Huang, Zhanpeng Zhou, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki

    Abstract: Deep learning with noisy labels presents significant challenges. In this work, we theoretically characterize the role of label noise from a feature learning perspective. Specifically, we consider a signal-noise data distribution, where each sample comprises a label-dependent signal and label-independent noise, and rigorously analyze the training dynamics of a two-layer convolutional neural network…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  29. arXiv:2505.18640  [pdf, other]

    cs.LG cs.AI

    ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation

    Authors: Jian Liang, Wenke Huang, Xianda Guo, Guancheng Wan, Bo Du, Mang Ye

    Abstract: Low-Rank Adaptation (LoRA) is widely adopted for downstream fine-tuning of foundation models due to its efficiency and zero additional inference cost. Many real-world applications require foundation models to specialize in multiple tasks simultaneously, motivating the need for efficient multi-task adaptation. While recent approaches integrate LoRA with mixture-of-experts (MoE) to address this, the…

    Submitted 24 May, 2025; originally announced May 2025.

  30. arXiv:2505.17746  [pdf, ps, other]

    cs.CL

    Fast Quiet-STaR: Thinking Without Thought Tokens

    Authors: Wei Huang, Yizhe Xiong, Xin Ye, Zhijie Deng, Hui Chen, Zijia Lin, Guiguang Ding

    Abstract: Large Language Models (LLMs) have achieved impressive performance across a range of natural language processing tasks. However, recent advances demonstrate that further gains, particularly in complex reasoning tasks, require more than merely scaling up model sizes or training data. One promising direction is to enable models to think during the reasoning process. Recently, Quiet-STaR significantly i…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 10 pages, 6 figures

    MSC Class: 68T50; ACM Class: I.2.7

  31. arXiv:2505.17104  [pdf, ps, other]

    cs.CL cs.MM

    P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

    Authors: Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Tongliang Li, Wenhao Huang, Ge Zhang, Jian Yang, Zhoujun Li

    Abstract: Academic posters are vital for scholarly communication, yet their manual creation is time-consuming. However, automated academic poster generation faces significant challenges in preserving intricate scientific details and achieving effective visual-textual integration. Existing approaches often struggle with semantic richness and structural nuances, and lack standardized benchmarks for evaluating…

    Submitted 21 May, 2025; originally announced May 2025.

  32. arXiv:2505.16916  [pdf, ps, other]

    cs.CR cs.CV

    Backdoor Cleaning without External Guidance in MLLM Fine-tuning

    Authors: Xuankun Rong, Wenke Huang, Jian Liang, Jinhe Bi, Xun Xiao, Yiming Li, Bo Du, Mang Ye

    Abstract: Multimodal Large Language Models (MLLMs) are increasingly deployed in fine-tuning-as-a-service (FTaaS) settings, where user-submitted datasets adapt general-purpose models to downstream tasks. This flexibility, however, introduces serious security risks, as malicious fine-tuning can implant backdoors into MLLMs with minimal effort. In this paper, we observe that backdoor triggers systematically di…

    Submitted 22 May, 2025; originally announced May 2025.

  33. A Shape-Aware Total Body Photography System for In-focus Surface Coverage Optimization

    Authors: Wei-Lun Huang, Joshua Liu, Davood Tashayyod, Jun Kang, Amir Gandjbakhche, Misha Kazhdan, Mehran Armand

    Abstract: Total Body Photography (TBP) is becoming a useful screening tool for patients at high risk for skin cancer. While much progress has been made, existing TBP systems can be further improved for automatic detection and analysis of suspicious skin lesions, which is in part related to the resolution and sharpness of acquired images. This paper proposes a novel shape-aware TBP system automatically captu…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted to JBHI

  34. A pulsar-helium star compact binary system formed by common envelope evolution

    Authors: Z. L. Yang, J. L. Han, D. J. Zhou, W. C. Jing, W. C. Chen, T. Wang, X. D. Li, S. Wang, B. Wang, H. W. Ge, Y. L. Guo, L. H. Li, Y. Shao, J. F. Liu, W. Q. Su, L. G. Hou, W. J. Huang, J. C. Jiang, P. Jiang, J. H. Sun, B. J. Wang, C. Wang, H. G. Wang, J. B. Wang, N. Wang, et al. (11 additional authors not shown)

    Abstract: A stellar common envelope occurs in a binary system when the atmosphere of an evolving star expands to encompass an orbiting companion object. Such systems are predicted to evolve rapidly, ejecting the stellar envelope and leaving the companion in a tighter orbit around a stripped star. We used radio timing to identify a pulsar, PSR J1928+1815, with a spin period of 10.55 ms in a compact binary sy…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 26+25 pages, 4+8 figures, 1+3 tables. Published in the 14 May issue of Science. Authors' version

    Journal ref: Science, 388, 859-863 (2025)

  35. arXiv:2505.15804  [pdf, ps, other]

    cs.CV

    STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

    Authors: Zongzhao Li, Zongyang Ma, Mingze Li, Songyou Li, Yu Rong, Tingyang Xu, Ziqi Zhang, Deli Zhao, Wenbing Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across diverse tasks, yet they lag significantly behind humans in spatial reasoning. We investigate this gap through Transformation-Driven Visual Reasoning (TVR), a challenging task requiring identification of object transformations across images under varying viewpoints. While traditional Supervised Fine-Tuning (SF…

    Submitted 26 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  36. arXiv:2505.15270  [pdf, other]

    cs.LG cs.AI cs.CV

    Scaling Diffusion Transformers Efficiently via $μ$P

    Authors: Chenyu Zheng, Xinyu Zhang, Rongzhen Wang, Wei Huang, Zhi Tian, Weilin Huang, Jun Zhu, Chongxuan Li

    Abstract: Diffusion Transformers have emerged as the foundation for vision generative models, but their scalability is limited by the high cost of hyperparameter (HP) tuning at large scales. Recently, Maximal Update Parametrization ($μ$P) was proposed for vanilla Transformers, which enables stable HP transfer from small to large language models, and dramatically reduces tuning costs. However, it remains unc…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 35 pages, 10 figures, 15 tables

  37. arXiv:2505.15061  [pdf, ps, other]

    cs.SD eess.AS

    SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit

    Authors: Wen-Chin Huang, Erica Cooper, Tomoki Toda

    Abstract: We introduce SHEET, a multi-purpose open-source toolkit designed to accelerate subjective speech quality assessment (SSQA) research. SHEET stands for the Speech Human Evaluation Estimation Toolkit, which focuses on data-driven deep neural network-based models trained to predict human-labeled quality scores of speech samples. SHEET provides comprehensive training and evaluation scripts, multi-datas…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: INTERSPEECH 2025. Codebase: https://github.com/unilight/sheet

  38. arXiv:2505.14552  [pdf, other]

    cs.CL cs.AI cs.LG

    KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

    Authors: Jiajun Shi, Jian Yang, Jiaheng Liu, Xingyuan Bu, Jiangjie Chen, Junting Zhou, Kaijing Ma, Zhoufutu Wen, Bingli Wang, Yancheng He, Liang Song, Hualei Zhu, Shilong Li, Xingjian Wang, Wei Zhang, Ruibin Yuan, Yifan Yao, Wenjun Yang, Yunli Wang, Siyuan Fang, Siyu Yuan, Qianyu He, Xiangru Tang, Yingshui Tan, Wangchunshu Zhou, et al. (4 additional authors not shown)

    Abstract: Recent advancements in large language models (LLMs) underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM's general reasoning potential. To address this limitation, we introduce the Knowledge Orthogonal Reasoning Gymnasium (KORGym), a dynamic evaluation plat…

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 22 pages

  39. arXiv:2505.14494  [pdf, other]

    hep-ex

    Measurements of charmed meson and antimeson production asymmetries at $\sqrt{s} =13.6$ TeV

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An, et al. (1126 additional authors not shown)

    Abstract: This article presents doubly differential measurements of the asymmetries in production rates between mesons containing a charm quark and those containing an anticharm quark in proton-proton collisions at a centre-of-mass energy of $\sqrt{s}=13.6$ TeV using data recorded by the LHCb experiment. The asymmetries of $D^0$, $D^+$ and $D_s^+$ mesons are measured for two-dimensional intervals in transve…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3757/ (LHCb public pages)

    Report number: LHCb-PAPER-2024-052, CERN-EP-2025-087

  40. arXiv:2505.14447  [pdf, ps, other]

    astro-ph.HE hep-ex

    First Identification and Precise Spectral Measurement of the Proton Component in the Cosmic-Ray 'Knee'

    Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (292 additional authors not shown)

    Abstract: We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and syst… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  41. arXiv:2505.13921  [pdf, ps, other]

    cs.RO cs.AI

    APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight

    Authors: Wanjing Huang, Weixiang Yan, Zhen Zhang, Ambuj Singh

    Abstract: Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or adaptive decision-making through Reinforcement Learning (RL), but they fail to capture dynamic object interactions or require task-specific training, limiting their r… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  42. arXiv:2505.13786  [pdf]

    cond-mat.mtrl-sci physics.chem-ph

    Active-Spin-State-Derived Descriptor for Hydrogen Evolution Reaction Catalysis

    Authors: Yu Tan, Lei Li, Zi-Xuan Yang, Tao Huang, Qiao-Ling Wang, Tao Zhang, Jing-Chun Luo, Gui-Fang Huang, Wangyu Hu, Wei-Qing Huang

    Abstract: Spin states are pivotal in modulating the electrocatalytic activity of transition-metal (TM)-based compounds, yet quantitatively evaluating the activity-spin state correlation remains a formidable challenge. Here, we propose an 'activity index n' as a descriptor, to assess the activity of the spin states for the hydrogen evolution reaction (HER). n descriptor integrates three key electronic parame… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 17 pages, 5 figures

  43. arXiv:2505.13408  [pdf, other]

    cs.AI cs.CL

    CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process

    Authors: Jinhe Bi, Danqi Yan, Yifan Wang, Wenke Huang, Haokun Chen, Guancheng Wan, Mang Ye, Xun Xiao, Hinrich Schuetze, Volker Tresp, Yunpu Ma

    Abstract: Recent Large Reasoning Models significantly improve the reasoning ability of Large Language Models by learning to reason, exhibiting the promising performance in solving complex tasks. LRMs solve tasks that require complex reasoning by explicitly generating reasoning trajectories together with answers. Nevertheless, judging the quality of such an output answer is not easy because only considering… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

  44. arXiv:2505.13328  [pdf, other]

    cs.CL

    Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges

    Authors: Hongru Wang, Wenyu Huang, Yufei Wang, Yuanhao Xi, Jianqiao Lu, Huan Zhang, Nan Hu, Zeming Liu, Jeff Z. Pan, Kam-Fai Wong

    Abstract: Existing benchmarks that assess Language Models (LMs) as Language Agents (LAs) for tool use primarily focus on stateless, single-turn interactions or partial evaluations, such as tool selection in a single turn, overlooking the inherent stateful nature of interactions in multi-turn applications. To fill this gap, we propose \texttt{DialogTool}, a multi-turn dialogue dataset with stateful tool i… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

  45. arXiv:2505.13031  [pdf, ps, other]

    cs.AI

    MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

    Authors: Yicheng Xiao, Lin Song, Yukang Chen, Yingmin Luo, Yuxin Chen, Yukang Gan, Wei Huang, Xiu Li, Xiaojuan Qi, Ying Shan

    Abstract: Recent text-to-image systems face limitations in handling multimodal inputs and complex reasoning tasks. We introduce MindOmni, a unified multimodal large language model that addresses these challenges by incorporating reasoning generation through reinforcement learning. MindOmni leverages a three-phase training strategy: i) design of a unified vision language model with a decoder-only diffusion m… ▽ More

    Submitted 11 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/TencentARC/MindOmni

  46. arXiv:2505.12302  [pdf, other]

    cs.LG

    SenseFlow: A Physics-Informed and Self-Ensembling Iterative Framework for Power Flow Estimation

    Authors: Zhen Zhao, Wenqi Huang, Zicheng Wang, Jiaxuan Hou, Peng Li, Lei Bai

    Abstract: Power flow estimation plays a vital role in ensuring the stability and reliability of electrical power systems, particularly in the context of growing network complexities and renewable energy integration. However, existing studies often fail to adequately address the unique characteristics of power systems, such as the sparsity of network connections and the critical importance of the unique Slac… ▽ More

    Submitted 18 May, 2025; originally announced May 2025.

  47. arXiv:2505.12251  [pdf, other]

    cs.CV

    SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis

    Authors: Haozhe Xiang, Han Zhang, Yu Cheng, Xiongwen Quan, Wanwan Huang

    Abstract: Multimodal medical image fusion plays a crucial role in medical diagnosis by integrating complementary information from different modalities to enhance image readability and clinical applicability. However, existing methods mainly follow computer vision standards for feature extraction and fusion strategy formulation, overlooking the rich semantic information inherent in medical images. To address… ▽ More

    Submitted 18 May, 2025; originally announced May 2025.

  48. arXiv:2505.12229  [pdf]

    cs.AI

    Sentience Quest: Towards Embodied, Emotionally Adaptive, Self-Evolving, Ethically Aligned Artificial General Intelligence

    Authors: David Hanson, Alexandre Varcoe, Fabio Senna, Vytas Krisciunas, Wenwei Huang, Jakub Sura, Katherine Yeung, Mario Rodriguez, Jovanka Wilsdorf, Kathy Smith

    Abstract: Previous artificial intelligence systems, from large language models to autonomous robots, excel at narrow tasks but lacked key qualities of sentient beings: intrinsic motivation, affective interiority, autobiographical sense of self, deep creativity, and abilities to autonomously evolve and adapt over time. Here we introduce Sentience Quest, an open research initiative to develop more capable art… ▽ More

    Submitted 18 May, 2025; originally announced May 2025.

  49. arXiv:2505.12200  [pdf, other]

    cs.CV

    CompBench: Benchmarking Complex Instruction-guided Image Editing

    Authors: Bohan Jia, Wenxuan Huang, Yuntian Tang, Junbo Qiao, Jincheng Liao, Shaosheng Cao, Fei Zhao, Zhaopeng Feng, Zhouhong Gu, Zhenfei Yin, Lei Bai, Wanli Ouyang, Lin Chen, Fei Zhao, Zihan Wang, Yuan Xie, Shaohui Lin

    Abstract: While real-world applications increasingly demand intricate scene manipulation, existing instruction-guided image editing benchmarks often oversimplify task complexity and lack comprehensive, fine-grained instructions. To bridge this gap, we introduce CompBench, a large-scale benchmark specifically designed for complex instruction-guided image editing. CompBench features challenging editing scenarios that i… ▽ More

    Submitted 20 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

  50. arXiv:2505.11754  [pdf, ps, other]

    cs.CL

    Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation

    Authors: Wenyu Huang, Pavlos Vougiouklis, Mirella Lapata, Jeff Z. Pan

    Abstract: Multi-hop Question Answering (MHQA) adds layers of complexity to question answering, making it more challenging. When Language Models (LMs) are prompted with multiple search results, they are tasked not only with retrieving relevant information but also employing multi-hop reasoning across the information sources. Although LMs perform well on traditional question-answering tasks, the causal mask c… ▽ More

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: ACL 2025 main