Showing 1–50 of 1,507 results for author: Li, K

  1. arXiv:2505.10415  [pdf, ps, other]

    cs.RO cs.HC

    Internal State Estimation in Groups via Active Information Gathering

    Authors: Xuebo Ji, Zherong Pan, Xifeng Gao, Lei Yang, Xinxin Du, Kaiyun Li, Yongjin Liu, Wenping Wang, Changhe Tu, Jia Pan

    Abstract: Accurately estimating human internal states, such as personality traits or behavioral patterns, is critical for enhancing the effectiveness of human-robot interaction, particularly in group settings. These insights are key in applications ranging from social navigation to autism diagnosis. However, prior methods are limited by scalability and passive observation, making real-time estimation in com…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10010  [pdf, ps, other]

    cs.LG

    ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts

    Authors: Jing-Cheng Pang, Kaiyuan Li, Yidi Wang, Si-Hang Yang, Shengyi Jiang, Yang Yu

    Abstract: A central challenge in reinforcement learning (RL) is its dependence on extensive real-world interaction data to learn task-specific policies. While recent work demonstrates that large language models (LLMs) can mitigate this limitation by generating synthetic experience (noted as imaginary rollouts) for mastering novel tasks, progress in this emerging field is hindered due to the lack of a standa…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09943  [pdf, ps, other]

    cs.CV

    CSPENet: Contour-Aware and Saliency Priors Embedding Network for Infrared Small Target Detection

    Authors: Jiakun Deng, Kexuan Li, Xingye Cui, Jiaxuan Li, Chang Long, Tian Pu, Zhenming Peng

    Abstract: Infrared small target detection (ISTD) plays a critical role in a wide range of civilian and military applications. Existing methods suffer from deficiencies in the localization of dim targets and the perception of contour information under dense clutter environments, severely limiting their detection performance. To tackle these issues, we propose a contour-aware and saliency priors embedding net…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09262  [pdf, ps, other]

    physics.chem-ph cs.AI cs.CV cs.LG

    EDBench: Large-Scale Electron Density Data for Molecular Modeling

    Authors: Hongxin Xiang, Ke Li, Mingquan Liu, Zhixiang Cheng, Bin Yao, Wenjie Du, Jun Xia, Li Zeng, Xin Jin, Xiangxiang Zeng

    Abstract: Existing molecular machine learning force fields (MLFFs) generally focus on the learning of atoms, molecules, and simple quantum chemical properties (such as energy and force), but ignore the importance of electron density (ED) $ρ(r)$ in accurately understanding molecular force fields (MFFs). ED describes the probability of finding electrons at specific locations around atoms or molecules, which u…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.09107  [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.SR cs.DC

    Architecture of Tianyu Software: Relative Photometry as a Case Study

    Authors: Yicheng Rui, Yifan Xuan, Shuyue Zheng, Kexin Li, Kaiming Cui, Kai Xiao, Jie Zheng, Jun Kai Ng, Hongxuan Jiang, Fabo Feng, Qinghui Sun

    Abstract: Tianyu telescope, an one-meter robotic optical survey instrument to be constructed in Lenghu, Qinghai, China, is designed for detecting transiting exoplanets, variable stars and transients. It requires a highly automated, optimally distributed, easily extendable, and highly flexible software to enable the data processing for the raw data at rates exceeding 500MB/s. In this work, we introduce the a…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: 18 pages, 10 figures, 6 tables, accepted for publication in PASP

  6. arXiv:2505.08182  [pdf]

    cs.IT

    Semantic De-boosting in e-commerce Query Autocomplete

    Authors: Adithya Rajan, Weiqi Tong, Greg Sharp, Prateek Verma, Kevin Li

    Abstract: In ecommerce search, query autocomplete plays a critical role to help users in their shopping journey. Often times, query autocomplete presents users with semantically similar queries, which can impede the user's ability to find diverse and relevant results. This paper proposes a novel strategy to enhance this service by refining the presentation of typeahead suggestions based on their semantic si…

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.07622  [pdf, ps, other]

    cs.CV

    A Unified Hierarchical Framework for Fine-grained Cross-view Geo-localization over Large-scale Scenarios

    Authors: Zhuo Song, Ye Zhang, Kunhong Li, Longguang Wang, Yulan Guo

    Abstract: Cross-view geo-localization is a promising solution for large-scale localization problems, requiring the sequential execution of retrieval and metric localization tasks to achieve fine-grained predictions. However, existing methods typically focus on designing standalone models for these two tasks, resulting in inefficient collaboration and increased training overhead. In this paper, we propose Un…

    Submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  9. arXiv:2505.05829  [pdf, other]

    cs.CV cs.LG eess.IV

    Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition

    Authors: Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma

    Abstract: Diffusion transformer (DiT) models have achieved remarkable success in image generation, thanks for their exceptional generative capabilities and scalability. Nonetheless, the iterative nature of diffusion models (DMs) results in high computation complexity, posing challenges for deployment. Although existing cache-based acceleration methods try to utilize the inherent temporal similarity to skip…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: accepted by CVPR2025

  10. arXiv:2505.05605  [pdf, ps, other]

    cs.LG cs.CE cs.IR stat.AP

    The Evolution of Embedding Table Optimization and Multi-Epoch Training in Pinterest Ads Conversion

    Authors: Andrew Qiu, Shubham Barhate, Hin Wai Lui, Runze Su, Rafael Rios Müller, Kungang Li, Ling Leng, Han Sun, Shayan Ehsani, Zhifang Liu

    Abstract: Deep learning for conversion prediction has found widespread applications in online advertising. These models have become more complex as they are trained to jointly predict multiple objectives such as click, add-to-cart, checkout and other conversion types. Additionally, the capacity and performance of these models can often be increased with the use of embedding tables that encode high cardinali…

    Submitted 8 May, 2025; originally announced May 2025.

    ACM Class: F.2.2, I.2.7

  11. arXiv:2505.04741  [pdf, other]

    cs.LG cs.AI cs.CL

    When Bad Data Leads to Good Models

    Authors: Kenneth Li, Yida Chen, Fernanda Viégas, Martin Wattenberg

    Abstract: In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy e…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  12. arXiv:2505.04638  [pdf, ps, other]

    cs.AI cs.CL cs.IR

    Towards Artificial Intelligence Research Assistant for Expert-Involved Learning

    Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao

    Abstract: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research, yet their reliability and specific contributions to biomedical applications remain insufficiently characterized. In this study, we present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL), a multimodal datas…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 36 pages, 7 figures

  13. arXiv:2505.03739  [pdf, other]

    cs.CL cs.AI

    VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

    Authors: Zuwei Long, Yunhang Shen, Chaoyou Fu, Heting Gao, Lijiang Li, Peixian Chen, Mengdan Zhang, Hang Shao, Jian Li, Jinlong Peng, Haoyu Cao, Ke Li, Rongrong Ji, Xing Sun

    Abstract: With the growing requirement for natural human-computer interaction, speech-based systems receive increasing attention as speech is one of the most common forms of daily communication. However, the existing speech models still experience high latency when generating the first audio token during streaming, which poses a significant bottleneck for deployment. To address this issue, we propose VITA-A…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Training and Inference Codes: https://github.com/VITA-MLLM/VITA-Audio

  14. arXiv:2505.03654  [pdf, other]

    cs.CV cs.AI

    ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant

    Authors: Yifan Xiang, Zhenxi Zhang, Bin Li, Yixuan Weng, Shoujun Zhou, Yangfan He, Keqin Li

    Abstract: Recent advances in personalized MLLMs enable effective capture of user-specific concepts, supporting both recognition of personalized concepts and contextual captioning. However, humans typically explore and reason over relations among objects and individuals, transcending surface-level information to achieve more personalized and contextual understanding. To this end, existing methods may face th…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Work in progress

  15. arXiv:2505.02016  [pdf, ps, other]

    cs.AR

    ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA

    Authors: Zhengyuan Shi, Zeju Li, Chengyu Ma, Yunhao Zhou, Ziyang Zheng, Jiawei Liu, Hongyang Pan, Lingfeng Zhou, Kezhi Li, Jiaying Zhu, Lingwei Yan, Zhiqiang He, Chenhao Xue, Wentao Jiang, Fan Yang, Guangyu Sun, Xiaoyan Yang, Gang Chen, Chuan Shi, Zhufei Chu, Jun Yang, Qiang Xu

    Abstract: We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on…

    Submitted 4 May, 2025; originally announced May 2025.

  16. arXiv:2505.01113  [pdf, other]

    cs.RO cs.CV cs.NE

    NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

    Authors: Xun Li, Jian Yang, Fenli Jia, Muyu Wang, Qi Wu, Jun Wu, Jinpeng Mi, Jilin Hu, Peidong Liang, Xuan Tang, Ke Li, Xiong You, Xian Wei

    Abstract: Recently, camera localization has been widely adopted in autonomous robotic navigation due to its efficiency and convenience. However, autonomous navigation in unknown environments often suffers from scene ambiguity, environmental disturbances, and dynamic object transformation in camera localization. To address this problem, inspired by the biological brain navigation mechanism (such as grid cell…

    Submitted 2 May, 2025; originally announced May 2025.

  17. arXiv:2505.00976  [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.LG

    Attack and defense techniques in large language models: A survey and new perspectives

    Authors: Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu

    Abstract: Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present significant security and ethical challenges. This systematic survey explores the evolving landscape of attack and defense techniques in LLMs. We classify attacks into adversarial prompt attack, optimized attacks, model theft, as well as attacks on application of LLMs, d…

    Submitted 1 May, 2025; originally announced May 2025.

  18. arXiv:2505.00032  [pdf]

    cs.CL cs.AI

    MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis

    Authors: Yuyang Sha, Hongxin Pan, Wei Xu, Weiyu Meng, Gang Luo, Xinyu Du, Xiaobing Zhai, Henry H. Y. Tong, Caijuan Shi, Kefeng Li

    Abstract: Major depressive disorder (MDD) impacts more than 300 million people worldwide, highlighting a significant public health issue. However, the uneven distribution of medical resources and the complexity of diagnostic methods have resulted in inadequate attention to this disorder in numerous countries and regions. This paper introduces a high-performance MDD diagnosis tool named MDD-LLM, an AI-driven…

    Submitted 28 April, 2025; originally announced May 2025.

  19. arXiv:2504.21596  [pdf, other]

    cs.RO cs.AI

    Leveraging Pre-trained Large Language Models with Refined Prompting for Online Task and Motion Planning

    Authors: Huihui Guo, Huilong Pi, Yunchuan Qin, Zhuo Tang, Kenli Li

    Abstract: With the rapid advancement of artificial intelligence, there is an increasing demand for intelligent robots capable of assisting humans in daily tasks and performing complex operations. Such robots not only require task planning capabilities but must also execute tasks with stability and robustness. In this paper, we present a closed-loop task planning and acting system, LLM-PAS, which is assisted…

    Submitted 30 April, 2025; originally announced April 2025.

  20. arXiv:2504.20536  [pdf, other]

    cs.CR

    Starfish: Rebalancing Multi-Party Off-Chain Payment Channels

    Authors: Minghui Xu, Wenxuan Yu, Guangyong Shang, Guangpeng Qi, Dongliang Duan, Shan Wang, Kun Li, Yue Zhang, Xiuzhen Cheng

    Abstract: Blockchain technology has revolutionized the way transactions are executed, but scalability remains a major challenge. Payment Channel Network (PCN), as a Layer-2 scaling solution, has been proposed to address this issue. However, skewed payments can deplete the balance of one party within a channel, restricting the ability of PCNs to transact through a path and subsequently reducing the transacti…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 17 pages, 10 figures

  21. arXiv:2504.20097  [pdf, other]

    cs.CV quant-ph

    Long-Distance Field Demonstration of Imaging-Free Drone Identification in Intracity Environments

    Authors: Junran Guo, Tonglin Mu, Keyuan Li, Jianing Li, Ziyang Luo, Ye Chen, Xiaodong Fan, Jinquan Huang, Minjie Liu, Jinbei Zhang, Ruoyang Qi, Naiting Gu, Shihai Sun

    Abstract: Detecting small objects, such as drones, over long distances presents a significant challenge with broad implications for security, surveillance, environmental monitoring, and autonomous systems. Traditional imaging-based methods rely on high-resolution image acquisition, but are often constrained by range, power consumption, and cost. In contrast, data-driven single-photon-single-pixel light dete…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: 15 pages, 9 figures

  22. arXiv:2504.19759  [pdf, other]

    cs.CL

    Moral Reasoning Across Languages: The Critical Role of Low-Resource Languages in LLMs

    Authors: Huichi Zhou, Zehao Xu, Munan Zhao, Kaihong Li, Yiqiang Li, Hongtao Wang

    Abstract: In this paper, we introduce the Multilingual Moral Reasoning Benchmark (MMRB) to evaluate the moral reasoning abilities of large language models (LLMs) across five typologically diverse languages and three levels of contextual complexity: sentence, paragraph, and document. Our results show moral reasoning performance degrades with increasing context complexity, particularly for low-resource langua…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 5 pages, 2 figures

  23. arXiv:2504.18323  [pdf, other]

    math.NA cs.CV cs.LG

    Outlier-aware Tensor Robust Principal Component Analysis with Self-guided Data Augmentation

    Authors: Yangyang Xu, Kexin Li, Li Yang, You-Wei Wen

    Abstract: Tensor Robust Principal Component Analysis (TRPCA) is a fundamental technique for decomposing multi-dimensional data into a low-rank tensor and an outlier tensor, yet existing methods relying on sparse outlier assumptions often fail under structured corruptions. In this paper, we propose a self-guided data augmentation approach that employs adaptive weighting to suppress outlier influence, reformu…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 12 pages, 6 figures, 3 tables

    MSC Class: 65K10; 15A69 ACM Class: I.4.5; G.1.6

  24. arXiv:2504.18204  [pdf, ps, other]

    cs.CV

    Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding

    Authors: Kun Li, Jianhui Wang, Yangfan He, Xinyuan Song, Ruoyu Wang, Hongyang He, Wenxin Zhang, Jiaqi Chen, Keqin Li, Sida Li, Miao Zhang, Tianyu Shi, Xueqian Wang

    Abstract: Generative AI has significantly changed industries by enabling text-driven image generation, yet challenges remain in achieving high-resolution outputs that align with fine-grained user preferences. Consequently, multi-round interactions are necessary to ensure the generated images meet expectations. Previous methods enhanced prompts via reward feedback but did not optimize over a multi-round dial…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2503.17660

  25. arXiv:2504.17789  [pdf, other]

    cs.CV

    Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

    Authors: Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu

    Abstract: Autoregressive (AR) models, long dominant in language generation, are increasingly applied to image synthesis but are often considered less competitive than Diffusion-based models. A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution. To address this, we present Token-Shuffle, a n…

    Submitted 27 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: Project Page: https://ma-xu.github.io/token-shuffle/ Add related works

  26. arXiv:2504.16261  [pdf, other]

    cs.CE

    Accurate and generalizable protein-ligand binding affinity prediction with geometric deep learning

    Authors: Krinos Li, Xianglu Xiao, Zijun Zhong, Guang Yang

    Abstract: Protein-ligand binding complexes are ubiquitous and essential to life. Protein-ligand binding affinity prediction (PLA) quantifies the binding strength between ligands and proteins, providing crucial insights for discovering and designing potential candidate ligands. While recent advances have been made in predicting protein-ligand complex structures, existing algorithms for interaction and affini…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 6 pages, 5 figures

  27. arXiv:2504.16016  [pdf, ps, other]

    cs.CV

    Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework

    Authors: Xinyuan Song, Yangfan He, Sida Li, Jianhui Wang, Hongyang He, Xinhang Yuan, Ruoyu Wang, Jiaqi Chen, Keqin Li, Kuan Lu, Menghao Huo, Binxu Li, Pei Liu

    Abstract: Adapter-based methods are commonly used to enhance model performance with minimal additional complexity, especially in video editing tasks that require frame-to-frame consistency. By inserting small, learnable modules into pretrained diffusion models, these adapters can maintain temporal coherence without extensive retraining. Approaches that incorporate prompt learning with both shared and frame-…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2501.04606

  28. arXiv:2504.14868  [pdf, ps, other]

    cs.CV

    Twin Co-Adaptive Dialogue for Progressive Image Generation

    Authors: Jianhui Wang, Yangfan He, Yan Zhong, Xinyuan Song, Jiayi Su, Yuheng Feng, Hongyang He, Wenyu Zhu, Xinhang Yuan, Kuan Lu, Menghao Huo, Miao Zhang, Keqin Li, Jiaqi Chen, Tianyu Shi, Xueqian Wang

    Abstract: Modern text-to-image generation systems have enabled the creation of remarkably realistic and high-quality visuals, yet they often falter when handling the inherent ambiguities in user prompts. In this work, we present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Instead of a static generation process, Twin-Co employs a dynamic, i…

    Submitted 21 April, 2025; originally announced April 2025.

  29. arXiv:2504.13936  [pdf, other]

    cs.HC cs.LG eess.SY

    ViMo: A Generative Visual GUI World Model for App Agent

    Authors: Dezhao Luo, Bohan Tang, Kang Li, Georgios Papoudakis, Jifei Song, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao

    Abstract: App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effectiv…

    Submitted 15 April, 2025; originally announced April 2025.

  30. arXiv:2504.12276  [pdf, other]

    cs.CV

    The Tenth NTIRE 2025 Image Denoising Challenge Report

    Authors: Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Jingyu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi, et al. (69 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent ad…

    Submitted 16 April, 2025; originally announced April 2025.

  31. arXiv:2504.11580  [pdf, other]

    cs.RO

    RESPLE: Recursive Spline Estimation for LiDAR-Based Odometry

    Authors: Ziyu Cao, William Talbot, Kailai Li

    Abstract: We present a novel recursive Bayesian estimation framework for continuous-time six-DoF dynamic motion estimation using B-splines. The state vector consists of a recurrent set of position control points and orientation control point increments, enabling a straightforward modification of the iterated extended Kalman filter without involving the error-state formulation. The resulting recursive spline…

    Submitted 15 April, 2025; originally announced April 2025.

  32. arXiv:2504.11264  [pdf, other]

    cs.LG cs.AI

    DeepSelective: Feature Gating and Representation Matching for Interpretable Clinical Prediction

    Authors: Ruochi Zhang, Qian Yang, Xiaoyang Wang, Haoran Wu, Qiong Zhou, Yu Wang, Kewei Li, Yueying Wang, Yusi Fan, Jiale Zhang, Lan Huang, Chang Liu, Fengfeng Zhou

    Abstract: The rapid accumulation of Electronic Health Records (EHRs) has transformed healthcare by providing valuable data that enhance clinical predictions and diagnoses. While conventional machine learning models have proven effective, they often lack robust representation learning and depend heavily on expert-crafted features. Although deep learning offers powerful solutions, it is often criticized for i…

    Submitted 15 April, 2025; originally announced April 2025.

  33. arXiv:2504.11186  [pdf]

    cs.CL cs.AI

    Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items

    Authors: Minjie Zou, Sahana Srinivasan, Thaddaeus Wai Soon Lo, Ke Zou, Gabriel Dawei Yang, Xuguang Ai, Hyunjae Kim, Maxwell Singer, Fares Antaki, Kelvin Li, Robert Chang, Marcus Tan, David Ziyou Chen, Dianbo Liu, Qingyu Chen, Yih Chung Tham

    Abstract: Recent advances in reasoning-focused large language models (LLMs) mark a shift from general LLMs toward models designed for complex decision-making, a crucial aspect in medicine. However, their performance in specialized domains like ophthalmology remains underexplored. This study comprehensively evaluated and compared the accuracy and reasoning capabilities of four newly developed reasoning-focus…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 83 pages, 6 figures, 3 tables, 9 supplementary figures, 7 supplementary tables

  34. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  35. arXiv:2504.10067  [pdf, other]

    cs.LG

    Undermining Federated Learning Accuracy in EdgeIoT via Variational Graph Auto-Encoders

    Authors: Kai Li, Shuyan Hu, Bochun Wu, Sai Zou, Wei Ni, Falko Dressler

    Abstract: EdgeIoT represents an approach that brings together mobile edge computing with Internet of Things (IoT) devices, allowing for data processing close to the data source. Sending source data to a server is bandwidth-intensive and may compromise privacy. Instead, federated learning allows each device to upload a shared machine-learning model update with locally processed data. However, this technique,…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 7 pages and 6 figures. Accepted in IEEE IWCMC 2025

  36. arXiv:2504.09644  [pdf, other]

    cs.CV

    SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model

    Authors: Kaiyu Li, Zepeng Xin, Li Pang, Chao Pang, Yupeng Deng, Jing Yao, Guisong Xia, Deyu Meng, Zhi Wang, Xiangyong Cao

    Abstract: Remote sensing has become critical for understanding environmental dynamics, urban planning, and disaster management. However, traditional remote sensing workflows often rely on explicit segmentation or detection methods, which struggle to handle complex, implicit queries that require reasoning over spatial context, domain knowledge, and implicit user intent. Motivated by this, we introduce a new…

    Submitted 13 April, 2025; originally announced April 2025.

  37. arXiv:2504.09621  [pdf, other]

    cs.CV

    Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images

    Authors: Jiuchen Chen, Xinyu Yan, Qizhi Xu, Kaiqi Li

    Abstract: Global contextual information and local detail features are essential for haze removal tasks. Deep learning models perform well on small, low-resolution images, but they encounter difficulties with large, high-resolution ones due to GPU memory limitations. As a compromise, they often resort to image slicing or downsampling. The former diminishes global information, while the latter discards high-f…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  38. arXiv:2504.08169  [pdf, other]

    cs.LG cs.AI stat.AP stat.ML

    On the Practice of Deep Hierarchical Ensemble Network for Ad Conversion Rate Prediction

    Authors: Jinfeng Zhuang, Yinrui Li, Runze Su, Ke Xu, Zhixuan Shao, Kungang Li, Ling Leng, Han Sun, Meng Qi, Yixiong Meng, Yang Tang, Zhifang Liu, Qifei Shen, Aayush Mudgal, Caleb Lu, Jie Liu, Hongda Shen

    Abstract: The predictions of click through rate (CTR) and conversion rate (CVR) play a crucial role in the success of ad-recommendation systems. A Deep Hierarchical Ensemble Network (DHEN) has been proposed to integrate multiple feature crossing modules and has achieved great success in CTR prediction. However, its performance for CVR prediction is unclear in the conversion ads setting, where an ad bids for…

    Submitted 23 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by WWW 2025

  39. arXiv:2504.07981  [pdf, other]

    cs.CV cs.HC cs.MM

    ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

    Authors: Kaixin Li, Ziyang Meng, Hongzhan Lin, Ziyang Luo, Yuchen Tian, Jing Ma, Zhiyong Huang, Tat-Seng Chua

    Abstract: Recent advancements in Multi-modal Large Language Models (MLLMs) have led to significant progress in developing GUI agents for general tasks such as web browsing and mobile phone use. However, their application in professional domains remains under-explored. These specialized workflows introduce unique challenges for GUI perception models, including high-resolution displays, smaller target sizes,…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 13 pages

    MSC Class: 68-11 68-04 ACM Class: I.2.7; I.2.10

  40. arXiv:2504.06780  [pdf, ps, other]

    cs.IR

    CHIME: A Compressive Framework for Holistic Interest Modeling

    Authors: Yong Bai, Rui Xiang, Kaiyuan Li, Yongxiang Tang, Yanhua Cheng, Xialong Liu, Peng Jiang, Kun Gai

    Abstract: Modeling holistic user interests is important for improving recommendation systems but is challenged by high computational cost and difficulty in handling diverse information with full behavior context. Existing search-based methods might lose critical signals during behavior selection. To overcome these limitations, we propose CHIME: A Compressive Framework for Holistic Interest Modeling. It uses…

    Submitted 9 April, 2025; originally announced April 2025.

  41. arXiv:2504.06636  [pdf, other

    cs.IR

    BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation

    Authors: Kaiyuan Li, Rui Xiang, Yong Bai, Yongxiang Tang, Yanhua Cheng, Xialong Liu, Peng Jiang, Kun Gai

    Abstract: Multi-modal sequential recommendation systems leverage auxiliary signals (e.g., text, images) to alleviate data sparsity in user-item interactions. While recent methods exploit large language models to encode modalities into discrete semantic IDs for autoregressive prediction, we identify two critical limitations: (1) Existing approaches adopt fragmented quantization, where modalities are independ…

    Submitted 9 April, 2025; originally announced April 2025.

  42. arXiv:2504.06256  [pdf, other

    cs.CV

    Transfer between Modalities with MetaQueries

    Authors: Xichen Pan, Satya Narayan Shukla, Aashu Singh, Zhuokai Zhao, Shlok Kumar Mishra, Jialiang Wang, Zhiyang Xu, Jiuhai Chen, Kunpeng Li, Felix Juefei-Xu, Ji Hou, Saining Xie

    Abstract: Unified multimodal models aim to integrate understanding (text output) and generation (pixel output), but aligning these different modalities within a single architecture often demands complex training recipes and careful data balancing. We introduce MetaQueries, a set of learnable queries that act as an efficient interface between autoregressive multimodal LLMs (MLLMs) and diffusion models. MetaQ…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Project Page: https://xichenpan.com/metaquery

  43. arXiv:2504.04540  [pdf, other

    cs.CV cs.AI

    The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?

    Authors: Weichen Zhang, Ruiying Peng, Chen Gao, Jianjie Fang, Xin Zeng, Kaiyuan Li, Ziyou Wang, Jinqiang Cui, Xin Wang, Xinlei Chen, Yong Li

    Abstract: 3D Large Language Models (LLMs) leveraging spatial information in point clouds for 3D spatial reasoning attract great attention. Despite some promising results, the role of point clouds in 3D spatial reasoning remains under-explored. In this work, we comprehensively evaluate and analyze these models to answer the research question: \textit{Does point cloud truly boost the spatial reasoning capacit…

    Submitted 6 April, 2025; originally announced April 2025.

  44. arXiv:2504.04061  [pdf, other

    cs.RO cs.AI

    Mapping at First Sense: A Lightweight Neural Network-Based Indoor Structures Prediction Method for Robot Autonomous Exploration

    Authors: Haojia Gao, Haohua Que, Kunrong Li, Weihao Shan, Mingkai Liu, Rong Zhao, Lei Mu, Xinghua Yang, Qi Wei, Fei Qiao

    Abstract: Autonomous exploration in unknown environments is a critical challenge in robotics, particularly for applications such as indoor navigation, search and rescue, and service robotics. Traditional exploration strategies, such as frontier-based methods, often struggle to efficiently utilize prior knowledge of structural regularities in indoor spaces. To address this limitation, we propose Mapping at F…

    Submitted 5 April, 2025; originally announced April 2025.

  45. arXiv:2504.03624  [pdf, other

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA: Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  46. arXiv:2504.03563  [pdf, other

    cs.CV

    PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector

    Authors: Kaidong Li, Tianxiao Zhang, Kuan-Chuan Peng, Guanghui Wang

    Abstract: 3D object detection is crucial for autonomous driving, leveraging both LiDAR point clouds for precise depth information and camera images for rich semantic information. Therefore, the multi-modal methods that combine both modalities offer more robust detection results. However, efficiently fusing LiDAR points and images remains challenging due to the domain gaps. In addition, the performance of ma…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: This paper is accepted to the CVPR 2025 Workshop on Distillation of Foundation Models for Autonomous Driving (WDFM-AD)

  47. arXiv:2504.03128  [pdf, other

    cs.CV

    FontGuard: A Robust Font Watermarking Approach Leveraging Deep Font Knowledge

    Authors: Kahim Wong, Jicheng Zhou, Kemou Li, Yain-Whar Si, Xiaowei Wu, Jiantao Zhou

    Abstract: The proliferation of AI-generated content brings significant concerns on the forensic and security issues such as source tracing, copyright protection, etc, highlighting the need for effective watermarking technologies. Font-based text watermarking has emerged as an effective solution to embed information, which could ensure copyright, traceability, and compliance of the generated text content. Ex…

    Submitted 3 April, 2025; originally announced April 2025.

  48. arXiv:2504.01395  [pdf, other

    cs.CR cs.AI

    From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis

    Authors: Kecen Li, Chen Gong, Xiaochen Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

    Abstract: Differentially private (DP) image synthesis aims to generate synthetic images from a sensitive dataset, alleviating the privacy leakage concerns of organizations sharing and utilizing synthetic images. Although previous methods have significantly progressed, especially in training diffusion models on sensitive images with DP Stochastic Gradient Descent (DP-SGD), they still suffer from unsatisfacto…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted at IEEE S&P (Oakland) 2025; code available at https://github.com/SunnierLee/DP-FETA

  49. arXiv:2504.01240  [pdf, other

    cs.CR cs.DC

    Towards Resilient Federated Learning in CyberEdge Networks: Recent Advances and Future Trends

    Authors: Kai Li, Zhengyang Zhang, Azadeh Pourkabirian, Wei Ni, Falko Dressler, Ozgur B. Akan

    Abstract: In this survey, we investigate the most recent techniques of resilient federated learning (ResFL) in CyberEdge networks, focusing on joint training with agglomerative deduction and feature-oriented security mechanisms. We explore adaptive hierarchical learning strategies to tackle non-IID data challenges, improving scalability and reducing communication overhead. Fault tolerance techniques and agg…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 15 pages, 8 figures, 4 tables, 122 references, journal paper

  50. arXiv:2504.00347  [pdf, other

    astro-ph.SR cs.LG

    Using machine learning method for variable star classification using the TESS Sectors 1-57 data

    Authors: Li-Heng Wang, Kai Li, Xiang Gao, Ya-Ni Guo, Guo-You Sun

    Abstract: The Transiting Exoplanet Survey Satellite (TESS) is a wide-field all-sky survey mission designed to detect Earth-sized exoplanets. After over four years photometric surveys, data from sectors 1-57, including approximately 1,050,000 light curves with a 2-minute cadence, were collected. By cross-matching the data with Gaia's variable star catalogue, we obtained labeled datasets for further analysis.…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 15 pages, 12 figures, 3 tables, accepted by ApJ, Data available via China-VO PaperData repository