Showing 1–50 of 9,834 results for author: Li, Y

Searching in archive cs.
  1. arXiv:2505.10483  [pdf, ps, other]

    cs.CV cs.AI

    UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation

    Authors: Yi Li, Haonan Wang, Qixiang Zhang, Boyu Xiao, Chenchang Hu, Hualiang Wang, Xiaomeng Li

    Abstract: The emergence of unified multimodal understanding and generation models is rapidly attracting attention because of their ability to enhance instruction-following capabilities while minimizing model redundancy. However, there is a lack of a unified evaluation framework for these models, which would enable an elegant, simplified, and overall evaluation. Current models conduct evaluations on multiple…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: UniEval is the first evaluation framework designed for unified multimodal models, including a holistic benchmark UniBench and the UniScore metric

  2. arXiv:2505.10315  [pdf, other]

    cs.CR cs.AI

    Private Transformer Inference in MLaaS: A Survey

    Authors: Yang Li, Xinyu Zhou, Yitong Wang, Liangxin Qian, Jun Zhao

    Abstract: Transformer models have revolutionized AI, powering applications like content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily due to the centralized processing of sensitive user data. Private Transformer Inference (PTI) offers a solution by utilizing cryptographic techniques such as secure multi-pa…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.10289  [pdf, ps, other]

    cs.CV

    MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

    Authors: Yue Wang, Shuai Xu, Xuelin Zhu, Yicong Li

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen state-object combinations by leveraging known combinations. Existing studies basically rely on the cross-modal alignment capabilities of CLIP but tend to overlook its limitations in capturing fine-grained local features, which arise from its architectural and training paradigm. To address this issue, we propose a Multi-Stage Cross-mo…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 9 pages, 5 figures

  4. arXiv:2505.10218  [pdf, ps, other]

    cs.CL

    RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward

    Authors: Zongsheng Wang, Kaili Sun, Bowen Wu, Qun Yu, Ying Li, Baoxun Wang

    Abstract: Role-playing conversational agents (RPCAs) face persistent challenges in maintaining role consistency. To address this, we propose RAIDEN-R1, a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). The method introduces both singular and multi-term mining strategies to generate quantifiable rewards by assessing role-specific keys. Additionally, we construc…

    Submitted 15 May, 2025; originally announced May 2025.

  5. arXiv:2505.10176  [pdf, ps, other]

    cs.NE

    Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence

    Authors: Xiang He, Dongcheng Zhao, Yang Li, Qingqun Kong, Xin Yang, Yi Zeng

    Abstract: Multimodal learning enhances the perceptual capabilities of cognitive systems by integrating information from different sensory modalities. However, existing multimodal fusion research typically assumes static integration, not fully incorporating key dynamic mechanisms found in the brain. Specifically, the brain exhibits an inverse effectiveness phenomenon, wherein weaker unimodal cues yield stron…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: The manuscript is under review and the code is available at https://github.com/Brain-Cog-Lab/IEMF

  6. arXiv:2505.10118  [pdf, ps, other]

    cs.CV cs.CL

    Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

    Authors: Yangfu Li, Hongjian Zhan, Tianyi Chen, Qi Liu, Yue Lu

    Abstract: Existing visual token pruning methods target prompt alignment and visual preservation with static strategies, overlooking the varying relative importance of these objectives across tasks, which leads to inconsistent performance. To address this, we derive the first closed-form error bound for visual token pruning based on the Hausdorff distance, uniformly characterizing the contributions of both o…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 31 pages, 9 figures, conference

  7. arXiv:2505.10105  [pdf, ps, other]

    cs.RO cs.AI

    EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation

    Authors: Zibin Dong, Fei Ni, Yifu Yuan, Yinchuan Li, Jianye Hao

    Abstract: We present EmbodiedMAE, a unified 3D multi-modal representation for robot manipulation. Current approaches suffer from significant domain gaps between training datasets and robot manipulation tasks, while also lacking model architectures that can effectively incorporate 3D information. To overcome these limitations, we enhance the DROID dataset with high-quality depth maps and point clouds, constr…

    Submitted 15 May, 2025; originally announced May 2025.

  8. arXiv:2505.09977  [pdf, other]

    cs.CE

    Physical regularized Hierarchical Generative Model for Metallic Glass Structural Generation and Energy Prediction

    Authors: Qiyuan Chen, Ajay Annamareddy, Ying-Fei Li, Dane Morgan, Bu Wang

    Abstract: Disordered materials such as glasses, unlike crystals, lack long range atomic order and have no periodic unit cells, yielding a high dimensional configuration space with widely varying properties. The complexity not only increases computational costs for atomistic simulations but also makes it difficult for generative AI models to deliver accurate property predictions and realistic structure gener…

    Submitted 15 May, 2025; originally announced May 2025.

  9. arXiv:2505.09616  [pdf, other]

    cs.SD cs.AI eess.AS

    SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

    Authors: Yuqi Li, Yuanzhong Zheng, Zhongtian Guo, Yaoxuan Wang, Jianjun Yin, Haojun Fei

    Abstract: This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec2 for feature extraction and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and empha…

    Submitted 10 January, 2025; originally announced May 2025.

    Comments: 2 pages, 3 figures, 1 chart

    MSC Class: I.2.0

  10. arXiv:2505.09558  [pdf, other]

    eess.AS cs.AI cs.LG cs.MM cs.SD

    WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

    Authors: Shengpeng Ji, Tianle Liang, Yangzhuo Li, Jialong Zuo, Minghui Fang, Jinzheng He, Yifu Chen, Zhengqing Liu, Ziyue Jiang, Xize Cheng, Siqi Zheng, Jin Xu, Junyang Lin, Zhou Zhao

    Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models' conversational performance has largely been overlooked. This is primarily because intelligent chatbots convey a wealth of non-textual information that cannot be easily measured using text-based language models like ChatGPT.…

    Submitted 14 May, 2025; originally announced May 2025.

  11. arXiv:2505.09065  [pdf]

    cs.HC cs.IR

    Display Content, Display Methods and Evaluation Methods of the HCI in Explainable Recommender Systems: A Survey

    Authors: Weiqing Li, Yue Xu, Yuefeng Li, Yinghui Huang

    Abstract: Explainable Recommender Systems (XRS) aim to provide users with understandable reasons for the recommendations generated by these systems, representing a crucial research direction in artificial intelligence (AI). Recent research has increasingly focused on the algorithms, display, and evaluation methodologies of XRS. While current research and reviews primarily emphasize the algorithmic aspects,…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 2 Tables, 29 figures

  12. arXiv:2505.08844  [pdf, other]

    q-bio.GN cs.AI

    CellTypeAgent: Trustworthy cell type annotation with Large Language Models

    Authors: Jiawen Chen, Jianghao Zhang, Huaxiu Yao, Yun Li

    Abstract: Cell type annotation is a critical yet laborious step in single-cell RNA sequencing analysis. We present a trustworthy large language model (LLM)-agent, CellTypeAgent, which integrates LLMs with verification from relevant databases. CellTypeAgent achieves higher accuracy than existing methods while mitigating hallucinations. We evaluated CellTypeAgent across nine real datasets involving 303 cell t…

    Submitted 13 May, 2025; originally announced May 2025.

    MSC Class: 68T20; ACM Class: I.2.1

  13. arXiv:2505.08765  [pdf, other]

    cs.CV cs.AI

    Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

    Authors: Yatai Ji, Zhengqiu Zhu, Yong Zhao, Beidan Liu, Chen Gao, Yihao Zhao, Sihang Qiu, Yue Hu, Quanjun Yin, Yong Li

    Abstract: Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects using visual and textual cues without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, similar object distinction, and the exploration-exploitation dilemma. To bridge this gap…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  14. arXiv:2505.08744  [pdf, other]

    cs.AI

    DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

    Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, et al. (6 additional authors not shown)

    Abstract: To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as e…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures

  15. arXiv:2505.08616  [pdf]

    eess.IV cs.CV cs.LG

    A portable diagnosis model for Keratoconus using a smartphone

    Authors: Yifan Li, Peter Ho, Jo Woon Chong

    Abstract: Keratoconus (KC) is a corneal disorder that results in blurry and distorted vision. Traditional diagnostic tools, while effective, are often bulky, costly, and require professional operation. In this paper, we present a portable and innovative methodology for diagnosing. Our proposed approach first captures the image reflected on the eye's cornea when a smartphone screen-generated Placido disc she…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  16. arXiv:2505.08536  [pdf, ps, other]

    eess.SP cs.IT

    Short Wins Long: Short Codes with Language Model Semantic Correction Outperform Long Codes

    Authors: Jiafu Hao, Chentao Yue, Hao Chang, Branka Vucetic, Yonghui Li

    Abstract: This paper presents a novel semantic-enhanced decoding scheme for transmitting natural language sentences with multiple short block codes over noisy wireless channels. After ASCII source coding, the natural language sentence message is divided into segments, where each is encoded with short block channel codes independently before transmission. At the receiver, each short block of codewords is dec…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures

  17. arXiv:2505.08527  [pdf, other]

    cs.CV

    Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting

    Authors: Zheang Huai, Hui Tang, Yi Li, Zhuangzhuang Chen, Xiaomeng Li

    Abstract: Source-free domain adaptation (SFDA) for segmentation aims at adapting a model trained in the source domain to perform well in the target domain with only the source model and unlabeled target data. Inspired by the recent success of Segment Anything Model (SAM) which exhibits the generality of segmenting images of various modalities and in different domains given human-annotated prompts like boundi…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  18. arXiv:2505.08517  [pdf]

    cs.CV cs.LG

    A Deep Learning-Driven Inhalation Injury Grading Assistant Using Bronchoscopy Images

    Authors: Yifan Li, Alan W Pang, Jo Woon Chong

    Abstract: Inhalation injuries present a challenge in clinical diagnosis and grading because conventional grading methods such as the Abbreviated Injury Score (AIS) are subjective and lack robust correlation with clinical parameters like mechanical ventilation duration and patient mortality. This study introduces a novel deep learning-based diagnosis assistant tool for grading inhalation injuries using b…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  19. arXiv:2505.08255  [pdf, ps, other]

    cs.CR cs.CV

    Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted

    Authors: Shuaiwei Yuan, Junyu Dong, Yuezun Li

    Abstract: With the advancement of AI generative techniques, Deepfake faces have become incredibly realistic and nearly indistinguishable to the human eye. To counter this, Deepfake detectors have been developed as reliable tools for assessing face authenticity. These detectors are typically developed on Deep Neural Networks (DNNs) and trained using third-party datasets. However, this protocol raises a new s…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  20. arXiv:2505.07916  [pdf, ps, other]

    eess.AS cs.SD

    MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

    Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He

    Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech. A key innovation is our learnable speaker encoder, which extracts timbre features from a reference audio without requiring its transcription. This enables MiniMax-Speech to produce highly expressive speech with timbre consistent with the reference in a zero-shot manner, w…

    Submitted 12 May, 2025; originally announced May 2025.

  21. arXiv:2505.07895  [pdf, ps, other]

    cs.LG cs.AI

    Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

    Authors: Jiafan Li, Jiaqi Zhu, Liang Chang, Yilin Li, Miaomiao Li, Yang Wang, Hongan Wang

    Abstract: Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban's movie networks and Amazon's product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either ea…

    Submitted 11 May, 2025; originally announced May 2025.

  22. arXiv:2505.07834  [pdf, other]

    cs.NI cs.AI cs.CR cs.PL

    ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet

    Authors: Yuekang Li, Wei Song, Bangshuo Zhu, Dong Gong, Yi Liu, Gelei Deng, Chunyang Chen, Lei Ma, Jun Sun, Toby Walsh, Jingling Xue

    Abstract: We introduce ai.txt, a novel domain-specific language (DSL) designed to explicitly regulate interactions between AI models, agents, and web content, addressing critical limitations of the widely adopted robots.txt standard. As AI increasingly engages with online materials for tasks such as training, summarization, and content modification, existing regulatory methods lack the necessary granularity…

    Submitted 1 May, 2025; originally announced May 2025.

  23. arXiv:2505.07782  [pdf, ps, other]

    cs.LG

    MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

    Authors: Rushi Qiang, Yuchen Zhuang, Yinghao Li, Dingu Sagar V K, Rongzhi Zhang, Changhao Li, Ian Shu-Hei Wong, Sherry Yang, Percy Liang, Chao Zhang, Bo Dai

    Abstract: We introduce MLE-Dojo, a Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experimen…

    Submitted 12 May, 2025; originally announced May 2025.

  24. arXiv:2505.07680  [pdf, other]

    cs.LG cs.DC

    SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models

    Authors: Hang Wu, Jianian Zhu, Yinghui Li, Haojie Wang, Biao Hou, Jidong Zhai

    Abstract: Large Language Models (LLMs) present a critical trade-off between inference quality and computational cost: larger models offer superior capabilities but incur significant latency, while smaller models are faster but less powerful. Existing serving strategies often employ fixed model scales or static two-stage speculative decoding, failing to dynamically adapt to the varying complexities of user r…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 10 pages

  25. arXiv:2505.07634  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

    Authors: Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang

    Abstract: The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the ris…

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 51 pages, 17 figures, 9 tables

  26. arXiv:2505.07387  [pdf, ps, other]

    cs.CV

    Feature Visualization in 3D Convolutional Neural Networks

    Authors: Chunpeng Li, Ya-tang Li

    Abstract: Understanding the computations of convolutional neural networks requires effective visualization of their kernels. While maximal activation methods have proven successful in highlighting the preferred features of 2D convolutional kernels, directly applying these techniques to 3D convolutions often leads to uninterpretable results due to the higher dimensionality and complexity of 3D features. To a…

    Submitted 12 May, 2025; originally announced May 2025.

  27. arXiv:2505.07378  [pdf, ps, other]

    math.CO cs.CC math.LO

    Undecidability of Polynomial Inequalities in Subset Densities and Additive Energies

    Authors: Yaqiao Li

    Abstract: Many results in extremal graph theory can be formulated as certain polynomial inequalities in graph homomorphism densities. Answering fundamental questions raised by Lovász, Szegedy and Razborov, Hatami and Norine proved that determining the validity of an arbitrary such polynomial inequality in graph homomorphism densities is undecidable. We observe that many results in additive combinatorics c…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: accepted in COCOON2025

  28. arXiv:2505.07320  [pdf, other]

    cs.LG cs.AI

    Dynamical Label Augmentation and Calibration for Noisy Electronic Health Records

    Authors: Yuhao Li, Ling Luo, Uwe Aickelin

    Abstract: Medical research, particularly in predicting patient outcomes, heavily relies on medical time series data extracted from Electronic Health Records (EHR), which provide extensive information on patient histories. Despite rigorous examination, labeling errors are inevitable and can significantly impede accurate predictions of patient outcome. To address this challenge, we propose an Attenti…

    Submitted 12 May, 2025; originally announced May 2025.

  29. arXiv:2505.07089  [pdf, ps, other]

    cs.AI

    RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

    Authors: Hanzheng Dai, Yuanliang Li, Zhibo Zhang, Jun Yan

    Abstract: Automated penetration testing (AutoPT) powered by large language models (LLMs) has gained attention for its ability to automate ethical hacking processes and identify vulnerabilities in target systems by leveraging the intrinsic knowledge of LLMs. However, existing LLM-based AutoPT frameworks often underperform compared to human experts in challenging tasks for several reasons: the imbalanced know…

    Submitted 13 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  30. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  31. arXiv:2505.06937  [pdf, other]

    cs.CV

    Transformer-Based Dual-Optical Attention Fusion Crowd Head Point Counting and Localization Network

    Authors: Fei Zhou, Yi Li, Mingqing Zhu

    Abstract: In this paper, the dual-optical attention fusion crowd head point counting model (TAPNet) is proposed to address the problem of the difficulty of accurate counting in complex scenes such as crowd dense occlusion and low light in crowd counting tasks under UAV view. The model designs a dual-optical attention fusion module (DAFP) by introducing complementary information from infrared images to impro…

    Submitted 11 May, 2025; originally announced May 2025.

  32. arXiv:2505.06918  [pdf, other]

    eess.IV cs.CV cs.LG

    Uni-AIMS: AI-Powered Microscopy Image Analysis

    Authors: Yanhui Hong, Nan Wang, Zhiyi Xia, Haoyi Tao, Xi Fang, Yiming Li, Jiankun Wang, Peng Jin, Xiaochen Cai, Shengyu Li, Ziqi Chen, Zezhong Zhang, Guolin Ke, Linfeng Zhang

    Abstract: This paper presents a systematic solution for the intelligent recognition and automatic analysis of microscopy images. We developed a data engine that generates high-quality annotated datasets through a combination of the collection of diverse microscopy images from experiments, synthetic data generation and a human-in-the-loop annotation process. To address the unique challenges of microscopy ima…

    Submitted 11 May, 2025; originally announced May 2025.

  33. arXiv:2505.06858  [pdf, other]

    cs.LG

    FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers

    Authors: Tianyu Chen, Haoyi Zhou, Ying Li, Hao Wang, Zhenzhe Zhang, Tianchen Zhu, Shanghang Zhang, Jianxin Li

    Abstract: Fourier Neural Operators (FNO) have emerged as promising solutions for efficiently solving partial differential equations (PDEs) by learning infinite-dimensional function mappings through frequency domain transformations. However, the sparsity of high-frequency signals limits computational efficiency for high-dimensional inputs, and fixed-pattern truncation often causes high-frequency signal loss,…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  34. arXiv:2505.06556  [pdf, other]

    cs.DB cs.DC

    TierBase: A Workload-Driven Cost-Optimized Key-Value Store

    Authors: Zhitao Shen, Shiyu Yang, Weibo Chen, Kunming Wang, Yue Li, Jiabao Jin, Wei Jia, Junwei Chen, Yuan Su, Xiaoxia Duan, Wei Chen, Lei Wang, Jie Song, Ruoyi Ruan, Xuemin Lin

    Abstract: In the current era of data-intensive applications, the demand for high-performance, cost-effective storage solutions is paramount. This paper introduces a Space-Performance Cost Model for key-value store, designed to guide cost-effective storage configuration decisions. The model quantifies the trade-offs between performance and storage costs, providing a framework for optimizing resource allocati…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Accepted by ICDE 2025

  35. arXiv:2505.06538  [pdf, other]

    cs.CL

    Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model

    Authors: Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang

    Abstract: The rapid development of multimodal large reasoning models (MLRMs) has demonstrated broad application potential, yet their safety and reliability remain critical concerns that require systematic exploration. To address this gap, we conduct a comprehensive and systematic safety evaluation of 11 MLRMs across 5 benchmarks and unveil prevalent safety degradation phenomena in most advanced models. More…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  36. arXiv:2505.06347  [pdf, ps, other]

    quant-ph cs.AI hep-lat hep-ph

    Quantum State Preparation via Large-Language-Model-Driven Evolution

    Authors: Qing-Hong Cao, Zong-Yue Hou, Ying-Ying Li, Xiaohui Liu, Zhuo-Yang Song, Liang-Qi Zhang, Shutao Zhang, Ke Zhao

    Abstract: We propose an automated framework for quantum circuit design by integrating large-language models (LLMs) with evolutionary optimization to overcome the rigidity, scalability limitations, and expert dependence of traditional ones in variational quantum algorithms. Our approach (FunSearch) autonomously discovers hardware-efficient ansätze with new features of scalability and system-size-independent…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 6 + 4 pages, 14 figures

    Report number: CPTNP-25-0001

  37. arXiv:2505.06290  [pdf, other]

    cs.LG cs.DM

    UniCO: Towards a Unified Model for Combinatorial Optimization Problems

    Authors: Zefang Zong, Xiaochen Wei, Guozhen Zhang, Chen Gao, Huandong Wang, Yong Li

    Abstract: Combinatorial Optimization (CO) encompasses a wide range of problems that arise in many real-world scenarios. While significant progress has been made in developing learning-based methods for specialized CO problems, a unified model with a single architecture and parameter set for diverse CO problems remains elusive. Such a model would offer substantial advantages in terms of efficiency and conven…

    Submitted 7 May, 2025; originally announced May 2025.

  38. arXiv:2505.06240  [pdf, ps, other]

    eess.SP cs.IT

    Pinching-Antenna Assisted Simultaneous Wireless Information and Power Transfer

    Authors: Yixuan Li, Ji Wang, Yuanwei Liu, Zhiguo Ding

    Abstract: This letter introduces a novel pinching-antenna-system (PASS) assisted simultaneous wireless information and power transfer (SWIPT), where multiple pinching antennas (PAs) are strategically activated on a waveguide to facilitate information transmission to multiple information receivers (IRs) and power transfer to multiple energy receivers (ERs) simultaneously. Leveraging the single-waveguide arc…

    Submitted 26 April, 2025; originally announced May 2025.

  39. arXiv:2505.06117  [pdf, other]

    cs.CV

    Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation

    Authors: Dongying Li, Binyi Su, Hua Zhang, Yong Li, Haiyong Chen

    Abstract: Accurate defect detection of photovoltaic (PV) cells is critical for ensuring quality and efficiency in intelligent PV manufacturing systems. However, the scarcity of rich defect data poses substantial challenges for effective model training. While existing methods have explored generative models to augment datasets, they often suffer from instability, limited diversity, and domain shifts. To addr…

    Submitted 9 May, 2025; originally announced May 2025.

  40. arXiv:2505.05806  [pdf, ps, other]

    cs.CV

    Image Segmentation via Variational Model Based Tailored UNet: A Deep Variational Framework

    Authors: Kaili Qi, Wenli Yang, Ye Li, Zhongyi Huang

    Abstract: Traditional image segmentation methods, such as variational models based on partial differential equations (PDEs), offer strong mathematical interpretability and precise boundary modeling, but often suffer from sensitivity to parameter settings and high computational costs. In contrast, deep learning models such as UNet, which are relatively lightweight in parameters, excel in automatic feature ex…

    Submitted 9 May, 2025; originally announced May 2025.

  41. arXiv:2505.05803  [pdf]

    cs.LG

    A novel Neural-ODE model for the state of health estimation of lithium-ion battery using charging curve

    Authors: Yiming Li, Man He, Jiapeng Liu

    Abstract: The state of health (SOH) of lithium-ion batteries (LIBs) is crucial for ensuring the safe and reliable operation of electric vehicles. Nevertheless, the prevailing SOH estimation methods often have limited generalizability. This paper introduces a data-driven approach for estimating the SOH of LIBs, which is designed to improve generalization. We construct a hybrid model named ACLA, which integra…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 28 pages, 6 figures

  42. arXiv:2505.05784  [pdf, ps, other]

    q-fin.TR cs.AI cs.CE q-fin.CP

    FlowHFT: Flow Policy Induced Optimal High-Frequency Trading under Diverse Market Conditions

    Authors: Yang Li, Zhi Chen, Steve Yang

    Abstract: High-frequency trading (HFT) is an investing strategy that continuously monitors market states and places bid and ask orders at millisecond speeds. Traditional HFT approaches fit models with historical data and assume that future market states follow similar patterns. This limits the effectiveness of any single model to the specific conditions it was trained for. Additionally, these models achieve…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 14 pages, 1 figure, 6 tables, 2 algorithms

  43. arXiv:2505.05768  [pdf, other]

    eess.IV cs.AI cs.CV

    Predicting Diabetic Macular Edema Treatment Responses Using OCT: Dataset and Methods of APTOS Competition

    Authors: Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li, Pusheng Xu, Ruijie Yao, Lianhao Zhou, Yuxuan Zhou, Hui Feng, Qiping Zhou, Xinyue Wang, Shoujin Huang, Zihao Jin, Florence H. T. Chung, Shujun Wang, Yalin Zheng, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

    Abstract: Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 42 pages, 5 tables, 12 figures, challenge report

  44. arXiv:2505.05622  [pdf, ps, other]

    cs.RO cs.AI

    CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory

    Authors: Weichen Zhang, Chen Gao, Shiquan Yu, Ruiying Peng, Baining Zhao, Qian Zhang, Jinqiang Cui, Xinlei Chen, Yong Li

    Abstract: Aerial vision-and-language navigation (VLN), requiring drones to interpret natural language instructions and navigate complex urban environments, emerges as a critical embodied AI challenge that bridges human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents achieved notable results in indoor and outdoor settings, they struggle in aerial VLN du…

    Submitted 8 May, 2025; originally announced May 2025.

  45. arXiv:2505.05528  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP

    Authors: Hanxun Huang, Sarah Erfani, Yige Li, Xingjun Ma, James Bailey

    Abstract: As Contrastive Language-Image Pre-training (CLIP) models are increasingly adopted for diverse downstream tasks and integrated into large vision-language models (VLMs), their susceptibility to adversarial perturbations has emerged as a critical concern. In this work, we introduce X-Transfer, a novel attack method that exposes a universal adversarial vulnerability in CLIP. X-Transfer genera…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  46. arXiv:2505.05470  [pdf, other]

    cs.CV cs.AI

    Flow-GRPO: Training Flow Matching Models via Online RL

    Authors: Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang

    Abstract: We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistica…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/yifan123/flow_grpo

  47. arXiv:2505.05240  [pdf, other]

    cs.CV

    PADriver: Towards Personalized Autonomous Driving

    Authors: Genghua Kou, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Ziheng Zhang, Osamu Yoshie, Tiancai Wang, Ying Li, Xiangyu Zhang

    Abstract: In this paper, we propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD). Built upon a Multi-modal Large Language Model (MLLM), PADriver takes streaming frames and personalized textual prompts as inputs. It autoregressively performs scene understanding, danger level estimation and action decision. The predicted danger level reflects the risk of the potential action…

    Submitted 8 May, 2025; originally announced May 2025.

  48. arXiv:2505.05190  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

    Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal

    Abstract: Text watermarking aims to subtly embed statistical signals into text by controlling the Large Language Model (LLM)'s sampling process, enabling watermark detectors to verify that the output was generated by the specified model. The robustness of these watermarking algorithms has become a key factor in evaluating their effectiveness. Current text watermarking algorithms embed watermarks in high-ent…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: ICML 2025 Accepted

  49. arXiv:2505.05084  [pdf, ps, other]

    cs.CL

    Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction

    Authors: Xiaowei Zhu, Yubing Ren, Yanan Cao, Xixun Lin, Fang Fang, Yangxi Li

    Abstract: The rapid advancement of large language models has raised significant concerns regarding their potential misuse by malicious actors. As a result, developing effective detectors to mitigate these risks has become a critical priority. However, most existing detection methods focus excessively on detection accuracy, often neglecting the societal risks posed by high false positive rates (FPRs). This p…

    Submitted 14 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  50. arXiv:2505.04931  [pdf, other]

    cs.LG cs.AI

    Fair Uncertainty Quantification for Depression Prediction

    Authors: Yonghong Li, Xiuzhuang Zhou

    Abstract: Trustworthy depression prediction based on deep learning, incorporating both predictive reliability and algorithmic fairness across diverse demographic groups, is crucial for clinical application. Recently, achieving reliable depression predictions through uncertainty quantification has attracted increasing attention. However, few studies have focused on the fairness of uncertainty quantification…

    Submitted 8 May, 2025; originally announced May 2025.