Showing 1–50 of 5,660 results for author: Liu, J

  1. arXiv:2505.10562  [pdf, ps, other]

    cs.CV

    End-to-End Vision Tokenizer Tuning

    Authors: Wenxuan Wang, Fan Zhang, Yufeng Cui, Haiwen Diao, Zhuoyan Luo, Huchuan Lu, Jing Liu, Xinlong Wang

    Abstract: Existing vision tokenization isolates the optimization of vision tokenizers from downstream training, implicitly assuming the visual tokens can generalize well across various tasks, e.g., image generation and visual question answering. The vision tokenizer optimized for low-level reconstruction is agnostic to downstream tasks requiring varied representations and semantics. This decoupled paradigm…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09926  [pdf, ps, other]

    cs.CV cs.AI

    AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection

    Authors: Bin-Bin Gao, Yue Zhu, Jiangtao Yan, Yuezhi Cai, Weixi Zhang, Meng Wang, Jun Liu, Yong Liu, Lei Wang, Chengjie Wang

    Abstract: Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images. However, existing methods struggle with designing prompt templates, complex token…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 27 pages, 15 figures, 22 tables

  3. arXiv:2505.09887  [pdf, other]

    cs.RO

    Unsupervised Radar Point Cloud Enhancement via Arbitrary LiDAR Guided Diffusion Prior

    Authors: Yanlong Yang, Jianan Liu, Guanxiong Luo, Hao Li, Euijoon Ahn, Mostafa Rahimi Azghadi, Tao Huang

    Abstract: In industrial automation, radar is a critical sensor in machine perception. However, the angular resolution of radar is inherently limited by the Rayleigh criterion, which depends on both the radar's operating wavelength and the effective aperture of its antenna array. To overcome these hardware-imposed limitations, recent neural network-based methods have leveraged high-resolution LiDAR data, pair…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 19 pages, 15 figures, 4 tables

  4. arXiv:2505.09662  [pdf]

    cs.CL

    Large Language Models Are More Persuasive Than Incentivized Human Persuaders

    Authors: Philipp Schoenegger, Francesco Salvi, Jiacheng Liu, Xiaoli Nan, Ramit Debnath, Barbara Fasolo, Evelina Leivada, Gabriel Recchia, Fritz Günther, Ali Zarifhonarvar, Joe Kwon, Zahoor Ul Islam, Marco Dehnert, Daryl Y. H. Lee, Madeline G. Reinecke, David G. Kamper, Mert Kobaş, Adam Sandford, Jonas Kgomo, Luke Hewitt, Shreya Kapoor, Kerem Oktar, Eyup Engin Kucuk, Bo Feng, Cameron R. Jones, et al. (15 additional authors not shown)

    Abstract: We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz setting. In this preregistered, large-scale incentivized experiment, participants (quiz takers) completed an online quiz where persuaders (either humans or LLMs) attempted to persuade quiz takers toward co…

    Submitted 14 May, 2025; originally announced May 2025.

    ACM Class: I.2.7; H.1.2; K.4.1; H.5.2

  5. arXiv:2505.09651  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era

    Authors: Xixuan Hao, Yutian Jiang, Xingchen Zou, Jiabo Liu, Yifang Yin, Yuxuan Liang

    Abstract: Location Intelligence (LI), the science of transforming location-centric geospatial data into actionable knowledge, has become a cornerstone of modern spatial decision-making. The rapid evolution of Geospatial Representation Learning is fundamentally reshaping LI development through two successive technological revolutions: the deep learning breakthrough and the emerging large language model (LLM)…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.09329  [pdf, ps, other]

    cs.CV cs.AI

    BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

    Authors: Jiarun Liu, Hong-Yu Zhou, Weijian Huang, Hao Yang, Dongning Song, Tao Tan, Yong Liang, Shanshan Wang

    Abstract: Scaling up model and data size have demonstrated impressive performance improvement over a wide range of tasks. Despite extensive studies on scaling behaviors for general-purpose tasks, medical images exhibit substantial differences from natural data. It remains unclear the key factors in developing medical vision foundation models at scale due to the absence of an extensive understanding of scali…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 figures

  7. arXiv:2505.09263  [pdf, ps, other]

    cs.CV cs.AI

    Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

    Authors: Guan Gui, Bin-Bin Gao, Jun Liu, Chengjie Wang, Yunsheng Wu

    Abstract: Anomaly detection is a practical and challenging task due to the scarcity of anomaly samples in industrial inspection. Some existing anomaly detection methods address this issue by synthesizing anomalies with noise or external data. However, there is always a large semantic gap between synthetic and real-world anomalies, resulting in weak performance in anomaly detection. To solve the problem, we…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Accepted by ECCV 2024

  8. arXiv:2505.08548  [pdf, other]

    cs.RO cs.AI cs.LG

    From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation

    Authors: Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, Jianye Hao

    Abstract: Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Early version

  9. arXiv:2505.08318  [pdf, ps, other]

    cs.DB

    A Unified Model for Cardinality Estimation by Learning from Data and Queries via Sum-Product Networks

    Authors: Jiawei Liu, Ju Fan, Tongyu Liu, Kai Zeng, Jiannan Wang, Quehuan Liu, Tao Ye, Nan Tang

    Abstract: Cardinality estimation is a fundamental component in database systems, crucial for generating efficient execution plans. Despite advancements in learning-based cardinality estimation, existing methods may struggle to simultaneously optimize the key criteria: estimation accuracy, inference time, and storage overhead, limiting their practical applicability in real-world database environments. This p…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 17 pages, 8 figures

    ACM Class: H.2.4; E.5

  10. arXiv:2505.08292  [pdf, ps, other]

    cs.CR

    On the Account Security Risks Posed by Password Strength Meters

    Authors: Ming Xu, Weili Han, Jitao Yu, Jing Liu, Xinyi Zhang, Yun Lin, Jin Song Dong

    Abstract: Password strength meters (PSMs) have been widely used by websites to gauge password strength, encouraging users to create stronger passwords. Popular data-driven PSMs, e.g., based on Markov, Probabilistic Context-free Grammar (PCFG) and neural networks, alarm strength based on a model learned from real passwords. Despite their proven effectiveness, the secure utility that arises from the leakage o…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.07813  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies

    Authors: Tony Tao, Mohan Kumar Srirama, Jason Jingzhou Liu, Kenneth Shaw, Deepak Pathak

    Abstract: Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data?…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: In RSS 2025. Website at https://dexwild.github.io

  12. arXiv:2505.07747  [pdf, other]

    cs.CV

    Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets

    Authors: Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, Ping Tan

    Abstract: While generative artificial intelligence has advanced significantly across text, image, audio, and video domains, 3D generation remains comparatively underdeveloped due to fundamental challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an open framework addressing these challenges through: (1) a rigorous data curation pipeline…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Technical report

  13. arXiv:2505.07692  [pdf, other]

    cs.DB

    ABase: the Multi-Tenant NoSQL Serverless Database for Diverse and Dynamic Workloads in Large-scale Cloud Environments

    Authors: Rong Kang, Yanbin Chen, Ye Liu, Fuxin Jiang, Qingshuo Li, Miao Ma, Jian Liu, Guangliang Zhao, Tieying Zhang, Jianjun Chen, Lei Zhang

    Abstract: Multi-tenant architectures enhance the elasticity and resource utilization of NoSQL databases by allowing multiple tenants to co-locate and share resources. However, in large-scale cloud environments, the diverse and dynamic nature of workloads poses significant challenges for multi-tenant NoSQL databases. Based on our practical observations, we have identified three crucial challenges: (1) the im…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: SIGMOD 2025 accepted

  14. arXiv:2505.07634  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

    Authors: Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang

    Abstract: The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the ris…

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 51 pages, 17 figures, 9 tables

  15. arXiv:2505.07257  [pdf, ps, other]

    cs.IR

    DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward

    Authors: Yi Zhang, Ruihong Qiu, Xuwei Xu, Jiajun Liu, Sen Wang

    Abstract: Model-based offline reinforcement learning (RL) has emerged as a promising approach for recommender systems, enabling effective policy learning by interacting with frozen world models. However, the reward functions in these world models, trained on sparse offline logs, often suffer from inaccuracies. Specifically, existing methods face two major limitations in addressing this challenge: (1) determ…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: SIGIR 2025

  16. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.06875  [pdf, ps, other]

    cs.RO

    Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning

    Authors: Chengkai Xu, Jiaqi Liu, Yicheng Guo, Yuhang Zhang, Peng Hang, Jian Sun

    Abstract: Autonomous driving has made significant strides through data-driven techniques, achieving robust performance in standardized tasks. However, existing methods frequently overlook user-specific preferences, offering limited scope for interaction and adaptation with users. To address these challenges, we propose a "fast-slow" decision-making framework that integrates a Large Language Model (LLM) for…

    Submitted 11 May, 2025; originally announced May 2025.

  18. arXiv:2505.06827  [pdf, ps, other]

    cs.CR cs.AI

    Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking

    Authors: Fabrice Y Harel-Canada, Boran Erol, Connor Choi, Jason Liu, Gary Jiarui Song, Nanyun Peng, Amit Sahai

    Abstract: Watermarking AI-generated text is critical for combating misuse. Yet recent theoretical work argues that any watermark can be erased via random walk attacks that perturb text while preserving quality. However, such attacks rely on two key assumptions: (1) rapid mixing (watermarks dissolve quickly under perturbations) and (2) reliable quality preservation (automated quality oracles perfectly guide…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: In Review @ ACL 2025

  19. arXiv:2505.06771  [pdf, ps, other]

    cs.RO cs.LG cs.MA

    JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes

    Authors: Shalin Anand Jain, Jiazhen Liu, Siva Kailas, Harish Ravichandar

    Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluat…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 22 pages, 14 figures, 10 tables

  20. arXiv:2505.06679  [pdf, other]

    cs.CV

    Jailbreaking the Text-to-Video Generative Models

    Authors: Jiayang Liu, Siyuan Liang, Shiqian Zhao, Rongcheng Tu, Wenbo Zhou, Xiaochun Cao, Dacheng Tao, Siew Kei Lam

    Abstract: Text-to-video generative models have achieved significant progress, driven by the rapid advancements in diffusion models, with notable examples including Pika, Luma, Kling, and Sora. Despite their remarkable generation ability, their vulnerability to jailbreak attack, i.e. to generate unsafe content, including pornography, violence, and discrimination, raises serious safety concerns. Existing effo…

    Submitted 10 May, 2025; originally announced May 2025.

  21. arXiv:2505.06637  [pdf, other]

    cs.AI

    Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers

    Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C. H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric

    Abstract: Wild salmon are essential to the ecological, economic, and cultural sustainability of the North Pacific Rim. Yet climate variability, habitat loss, and data limitations in remote ecosystems that lack basic infrastructure support pose significant challenges to effective fisheries management. This project explores the integration of multimodal foundation AI and expert-in-the-loop frameworks to enhan…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 10 pages, accepted by IJCAI 2025, AI and Social Good Track

  22. arXiv:2505.06371  [pdf, ps, other]

    cs.LG cs.AI

    The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization

    Authors: Jae-Won Chung, Jiachen Liu, Jeff J. Ma, Ruofan Wu, Oh Jun Kweon, Yuxuan Xia, Zhiyu Wu, Mosharaf Chowdhury

    Abstract: As the adoption of Generative AI in real-world services grows explosively, energy has emerged as a critical bottleneck resource. However, energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environ…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Leaderboard: https://ml.energy/leaderboard

  23. arXiv:2505.05803  [pdf]

    cs.LG

    A novel Neural-ODE model for the state of health estimation of lithium-ion battery using charging curve

    Authors: Yiming Li, Man He, Jiapeng Liu

    Abstract: The state of health (SOH) of lithium-ion batteries (LIBs) is crucial for ensuring the safe and reliable operation of electric vehicles. Nevertheless, the prevailing SOH estimation methods often have limited generalizability. This paper introduces a data-driven approach for estimating the SOH of LIBs, which is designed to improve generalization. We construct a hybrid model named ACLA, which integra…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 28 pages, 6 figures

  24. arXiv:2505.05470  [pdf, other]

    cs.CV cs.AI

    Flow-GRPO: Training Flow Matching Models via Online RL

    Authors: Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang

    Abstract: We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistica…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/yifan123/flow_grpo

  25. arXiv:2505.04880  [pdf, other]

    quant-ph cs.AI cs.LG

    GroverGPT-2: Simulating Grover's Algorithm via Chain-of-Thought Reasoning and Quantum-Native Tokenization

    Authors: Min Chen, Jinglei Cheng, Pingzhi Li, Haoran Wang, Tianlong Chen, Junyu Liu

    Abstract: Quantum computing offers theoretical advantages over classical computing for specific tasks, yet the boundary of practical quantum advantage remains an open question. To investigate this boundary, it is crucial to understand whether, and how, classical machines can learn and simulate quantum algorithms. Recent progress in large language models (LLMs) has demonstrated strong reasoning abilities, pr…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 26 pages, 12 figures

  26. arXiv:2505.04586  [pdf, other]

    eess.IV cs.CV cs.LG

    Active Sampling for MRI-based Sequential Decision Making

    Authors: Yuning Du, Jingshuai Liu, Rohan Dharmakumar, Sotirios A. Tsaftaris

    Abstract: Despite the superior diagnostic capability of Magnetic Resonance Imaging (MRI), its use as a Point-of-Care (PoC) device remains limited by high cost and complexity. To enable such a future by reducing the magnetic field strength, one key approach will be to improve sampling strategies. Previous work has shown that it is possible to make diagnostic decisions directly from k-space with fewer samples…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Under Review

  27. arXiv:2505.03973  [pdf, other]

    cs.CL

    Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale

    Authors: Jiale Liu, Yifan Zeng, Shaokun Zhang, Chi Zhang, Malte Højmark-Bertelsen, Marie Normann Gadeberg, Huazheng Wang, Qingyun Wu

    Abstract: LLM-based optimization has shown remarkable potential in enhancing agentic systems. However, the conventional approach of prompting LLM optimizer with the whole training trajectories on training dataset in a single pass becomes untenable as datasets grow, leading to context window overflow and degraded pattern recognition. To address these challenges, we propose Fine-Grained Optimization (FGO), a…

    Submitted 6 May, 2025; originally announced May 2025.

  28. arXiv:2505.03538  [pdf, other]

    cs.CV

    RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT

    Authors: Chuyu Zhao, Hao Huang, Jiashuo Guo, Ziyu Shen, Zhongwei Zhou, Jie Liu, Zekuan Yu

    Abstract: Semi-supervised learning has become a compelling approach for 3D tooth segmentation from CBCT scans, where labeled data is minimal. However, existing methods still face two persistent challenges: limited corrective supervision in structurally ambiguous or mislabeled regions during supervised training and performance degradation caused by unreliable pseudo-labels on unlabeled data. To address these…

    Submitted 6 May, 2025; originally announced May 2025.

  29. arXiv:2505.03171  [pdf, other]

    cs.AI

    CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics

    Authors: Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu

    Abstract: Neurosymbolic approaches integrating large language models with formal reasoning have recently achieved human-level performance on mathematics competition problems in algebra, geometry and number theory. In comparison, combinatorics remains a challenging domain, characterized by a lack of appropriate benchmarks and theorem libraries. To address this gap, we introduce CombiBench, a comprehensive be…

    Submitted 6 May, 2025; originally announced May 2025.

  30. arXiv:2505.03007  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results

    Authors: Nikolay Safonov, Alexey Bryncev, Andrey Moskalenko, Dmitry Kulikov, Dmitry Vatolin, Radu Timofte, Haibo Lei, Qifan Gao, Qing Luo, Yaqing Li, Jie Song, Shaozhe Hao, Meisong Zheng, Jingyi Xu, Chengbin Wu, Jiahui Liu, Ying Chen, Xin Deng, Mai Xu, Peipei Liang, Jie Ma, Junjie Jin, Yingxue Pang, Fangzhou Luo, Kai Chen, et al. (6 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Challenge on UGC Video Enhancement. The challenge constructed a set of 150 user-generated content videos without reference ground truth, which suffer from real-world degradations such as noise, blur, faded colors, compression artifacts, etc. The goal of the participants was to develop an algorithm capable of improving the visual quality of such vid…

    Submitted 5 May, 2025; originally announced May 2025.

  31. arXiv:2505.02922  [pdf, ps, other]

    cs.LG

    RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

    Authors: Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

    Abstract: The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index,…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages

  32. arXiv:2505.02863  [pdf]

    cs.CY cs.AI

    Understanding University Students' Use of Generative AI: The Roles of Demographics and Personality Traits

    Authors: Newnew Deng, Edward Jiusi Liu, Xiaoming Zhai

    Abstract: The use of generative AI (GAI) among university students is rapidly increasing, yet empirical research on students' GAI use and the factors influencing it remains limited. To address this gap, we surveyed 363 undergraduate and graduate students in the United States, examining their GAI usage and how it relates to demographic variables and personality traits based on the Big Five model (i.e., extra…

    Submitted 3 May, 2025; originally announced May 2025.

  33. arXiv:2505.02516  [pdf, other]

    cs.AI cs.AR cs.LG eess.SP q-bio.NC

    Machine-Learning-Powered Neural Interfaces for Smart Prosthetics and Diagnostics

    Authors: MohammadAli Shaeri, Jinhan Liu, Mahsa Shoaran

    Abstract: Advanced neural interfaces are transforming applications ranging from neuroscience research to diagnostic tools (for mental state recognition, tremor and seizure detection) as well as prosthetic devices (for motor and communication recovery). By integrating complex functions into miniaturized neural devices, these systems unlock significant opportunities for personalized assistive technologies and…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: To appear in the 2025 IEEE International NEWCAS Conference (NEWCAS'25)

    ACM Class: I.2.0; B.7.0; I.5.1; C.3

  34. arXiv:2505.02166  [pdf, other]

    cs.RO

    CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

    Authors: Xiaoqi Li, Lingyun Xu, Mingxu Zhang, Jiaming Liu, Yan Shen, Iaroslav Ponomarenko, Jiahui Xu, Liang Heng, Siyuan Huang, Shanghang Zhang, Hao Dong

    Abstract: In robotics, task goals can be conveyed through various modalities, such as language, goal images, and goal videos. However, natural language can be ambiguous, while images or videos may offer overly detailed specifications. To tackle these challenges, we introduce CrayonRobo that leverages comprehensive multi-modal prompts that explicitly convey both low-level actions and high-level planning in a…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  35. arXiv:2505.02027  [pdf, ps, other]

    cs.LG cs.AI cs.SI

    GraphPrompter: Multi-stage Adaptive Prompt Optimization for Graph In-Context Learning

    Authors: Rui Lv, Zaixi Zhang, Kai Zhang, Qi Liu, Weibo Gao, Jiawei Liu, Jiaxia Yan, Linan Yue, Fangzhou Yao

    Abstract: Graph In-Context Learning, with the ability to adapt pre-trained graph models to novel and diverse downstream graphs without updating any parameters, has gained much attention in the community. The key to graph in-context learning is to perform downstream graphs conditioned on chosen prompt examples. Existing methods randomly select subgraphs or edges as prompts, leading to noisy graph prompts and…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 14 pages. IEEE International Conference on Data Engineering (ICDE'2025), accepted

  36. arXiv:2505.02016  [pdf, ps, other]

    cs.AR

    ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA

    Authors: Zhengyuan Shi, Zeju Li, Chengyu Ma, Yunhao Zhou, Ziyang Zheng, Jiawei Liu, Hongyang Pan, Lingfeng Zhou, Kezhi Li, Jiaying Zhu, Lingwei Yan, Zhiqiang He, Chenhao Xue, Wentao Jiang, Fan Yang, Guangyu Sun, Xiaoyan Yang, Gang Chen, Chuan Shi, Zhufei Chu, Jun Yang, Qiang Xu

    Abstract: We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on…

    Submitted 4 May, 2025; originally announced May 2025.

  37. arXiv:2505.01821  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey

    Authors: Jing Liu, Yao Du, Kun Yang, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C. M. Leung

    Abstract: Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications, integrating cloud resources with edge devices to enable efficient, low-latency processing. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed sys…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 30 pages, 10 figures, 6 tables

  38. arXiv:2505.01809  [pdf, other]

    cs.CV

    3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment

    Authors: Xiaoqi Li, Jiaming Liu, Nuowei Han, Liang Heng, Yandong Guo, Hao Dong, Yang Liu

    Abstract: The 3D weakly-supervised visual grounding task aims to localize oriented 3D boxes in point clouds based on natural language descriptions without requiring annotations to guide model learning. This setting presents two primary challenges: category-level ambiguity and instance-level complexity. Category-level ambiguity arises from representing objects of fine-grained categories in a highly sparse po…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: ICRA 2025

  39. arXiv:2505.01726  [pdf, other]

    cs.CV

    Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes

    Authors: Jie Liu, Pan Zhou, Zehao Xiao, Jiayi Shen, Wenzhe Yin, Jan-Jakob Sonke, Efstratios Gavves

    Abstract: Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks. However, two critical challenges remain underexplored: (1) effectively generalizing from sparse user clicks to produce accurate segmentation, and (2) quantifying predictive uncertainty to help users identify unreliable regions. In this wor…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: ICML 2025 Proceedings

  40. arXiv:2505.01700  [pdf, other]

    cs.LG q-bio.QM

    PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking

    Authors: Yize Jiang, Xinze Li, Yuanyuan Zhang, Jin Han, Youjun Xu, Ayush Pandit, Zaixi Zhang, Mengdi Wang, Mengyang Wang, Chong Liu, Guang Yang, Yejin Choi, Wu-Jun Li, Tianfan Fu, Fang Wu, Junhong Liu

    Abstract: Recently, significant progress has been made in protein-ligand docking, especially in modern deep learning methods, and some benchmarks were proposed, e.g., PoseBench, Plinder. However, these benchmarks suffer from less practical evaluation setups (e.g., blind docking, self docking), or heavy framework that involves training, raising challenges to assess docking methods efficiently. To fill this g…

    Submitted 3 May, 2025; originally announced May 2025.

  41. arXiv:2505.01182  [pdf, other

    cs.CV cs.AI

    TSTMotion: Training-free Scene-aware Text-to-motion Generation

    Authors: Ziyan Guo, Haoxuan Qu, Hossein Rahmani, Dewen Soh, Ping Hu, Qiuhong Ke, Jun Liu

    Abstract: Text-to-motion generation has recently garnered significant research interest, primarily focusing on generating human motion sequences in blank backgrounds. However, human motions commonly occur within diverse 3D scenes, which has prompted exploration into scene-aware text-to-motion generation methods. Yet, existing scene-aware methods often rely on large-scale ground-truth motion sequences in div… ▽ More

    Submitted 5 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: Accepted by ICME2025

  42. arXiv:2505.00426  [pdf, other

    cs.CV

    Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly

    Authors: Ruiyuan Zhang, Qi Wang, Jiaxiang Liu, Yu Zhang, Yuchi Huo, Chao Wu

    Abstract: 3D part assembly aims to understand part relationships and predict their 6-DoF poses to construct realistic 3D shapes, addressing the growing demand for autonomous assembly, which is crucial for robots. Existing methods mainly estimate the transformation of each part by training neural networks under supervision, which requires a substantial quantity of manually labeled data. However, the high cos… ▽ More

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 10 pages, 12 figures, Accepted by IJCAI-2025

    Journal ref: IJCAI 2025

  43. arXiv:2505.00212  [pdf, ps, other

    cs.MA cs.CL

    Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

    Authors: Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, Qingyun Wu

    Abstract: Failure attribution in LLM multi-agent systems, that is, identifying the agent and step responsible for task failures, provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extens… ▽ More

    Submitted 30 April, 2025; originally announced May 2025.

  44. arXiv:2504.21495  [pdf

    cs.CV cs.MM

    Consistency-aware Fake Videos Detection on Short Video Platforms

    Authors: Junxi Wang, Jize Liu, Na Zhang, Yaxiong Wang

    Abstract: This paper focuses on detecting fake news on short video platforms. While significant research efforts have been devoted to this task with notable progress in recent years, current detection accuracy remains suboptimal due to the rapid evolution of content manipulation and generation technologies. Existing approaches typically employ a cross-modal fusion strategy that directly combines raw vi… ▽ More

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 2025 ICIC

  45. arXiv:2504.21472  [pdf, other

    cs.CV

    Robust Orthogonal NMF with Label Propagation for Image Clustering

    Authors: Jingjing Liu, Nian Wu, Xianchao Xiu, Jianhua Zhang

    Abstract: Non-negative matrix factorization (NMF) is a popular unsupervised learning approach widely used in image clustering. However, in real-world clustering scenarios, most existing NMF methods are highly sensitive to noise corruption and are unable to effectively leverage limited supervised information. To overcome these drawbacks, we propose a unified non-convex framework with label propagation called… ▽ More

    Submitted 30 April, 2025; originally announced April 2025.

  46. arXiv:2504.21444  [pdf, other

    cs.NI

    A Unified QoS-Aware Multiplexing Framework for Next Generation Immersive Communication with Legacy Wireless Applications

    Authors: Jihong Li, Shunqing Zhang, Tao Yu, Guangjin Pan, Kaixuan Huang, Xiaojing Chen, Yanzan Sun, Junyu Liu, Jiandong Li, Derrick Wing Kwan Ng

    Abstract: Immersive communication, including emerging augmented reality, virtual reality, and holographic telepresence, has been identified as a key service for enabling next-generation wireless applications. To align with legacy wireless applications, such as enhanced mobile broadband or ultra-reliable low-latency communication, network slicing has been widely adopted. However, attempting to statistically… ▽ More

    Submitted 2 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  47. arXiv:2504.21230  [pdf, other

    cs.LO

    Kimina Lean Server: Technical Report

    Authors: Marco Dos Santos, Haiming Wang, Hugues de Saxcé, Ran Wang, Mantas Baksys, Mert Unsal, Junqi Liu, Zhengying Liu, Jia Li

    Abstract: We introduce the Kimina Lean Server, an open-source project that enables fast and scalable interaction with Lean 4 via a unified REST API, designed as a simple verifier for reinforcement learning pipelines. Built on top of the Lean FRO's LeanREPL, it combines server-side parallelization by managing multiple Lean REPL processes in parallel, with an LRU caching strategy that reuses Lean imports acro… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

  48. arXiv:2504.21226  [pdf, other

    cs.CV cs.AI

    MemeBLIP2: A novel lightweight multimodal system to detect harmful memes

    Authors: Jiaqi Liu, Ran Tong, Aowei Shen, Shuzheng Li, Changlin Yang, Lisha Xu

    Abstract: Memes often merge visuals with brief text to share humor or opinions, yet some memes contain harmful messages such as hate speech. In this paper, we introduce MemeBLIP2, a lightweight multimodal system that detects harmful memes by combining image and text features effectively. We build on previous studies by adding modules that align image and text representations into a shared space and fuse t… ▽ More

    Submitted 6 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: 11 pages, 3 figures, manuscript in preparation

  49. arXiv:2504.20653  [pdf, other

    cs.SE eess.SY

    ComplexVCoder: An LLM-Driven Framework for Systematic Generation of Complex Verilog Code

    Authors: Jian Zuo, Junzhe Liu, Xianyong Wang, Yicheng Liu, Navya Goli, Tong Xu, Hao Zhang, Umamaheswara Rao Tida, Zhenge Jia, Mengying Zhao

    Abstract: Recent advances have demonstrated the promising capabilities of large language models (LLMs) in generating register-transfer level (RTL) code, such as Verilog. However, existing LLM-based frameworks still face significant challenges in accurately handling the complexity of real-world RTL designs, particularly those that are large-scale and involve multi-level module instantiations. To address this… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

  50. arXiv:2504.20501  [pdf, ps, other

    eess.IV cs.CV

    SAM-Guided Robust Representation Learning for One-Shot 3D Medical Image Segmentation

    Authors: Jia Wang, Yunan Mei, Jiarui Liu, Xin Fan

    Abstract: One-shot medical image segmentation (MIS) is crucial for medical analysis because manual annotation places a heavy burden on medical experts. The recently introduced segment anything model (SAM) has demonstrated remarkable adaptation in MIS but cannot be directly applied to one-shot MIS due to its reliance on labor-intensive user interactions and the high computational cost… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.