Showing 1–50 of 3,727 results for author: li, C

  1. arXiv:2506.15675  [pdf, ps, other]

    cs.CV cs.AI

    Sekai: A Video Dataset towards World Exploration

    Authors: Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu, Tong He, Jiangmiao Pang, Yu Qiao, Yunde Jia, Kaipeng Zhang

    Abstract: Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training as they suffer from some limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 12 pages, 6 figures

  2. arXiv:2506.15665  [pdf, ps, other]

    eess.SY cs.LG

    A Data-Integrated Framework for Learning Fractional-Order Nonlinear Dynamical Systems

    Authors: Bahram Yaghooti, Chengyu Li, Bruno Sinopoli

    Abstract: This paper presents a data-integrated framework for learning the dynamics of fractional-order nonlinear systems in both discrete-time and continuous-time settings. The proposed framework consists of two main steps. In the first step, input-output experiments are designed to generate the necessary datasets for learning the system dynamics, including the fractional order, the drift vector field, and…

    Submitted 18 June, 2025; originally announced June 2025.

  3. arXiv:2506.15522  [pdf, ps, other]

    cs.CL

    Lessons from Training Grounded LLMs with Verifiable Rewards

    Authors: Shang Hong Sim, Tej Deep Pala, Vernon Toh, Hai Leong Chieu, Amir Zadeh, Chuan Li, Navonil Majumder, Soujanya Poria

    Abstract: Generating grounded and trustworthy responses remains a key challenge for large language models (LLMs). While retrieval-augmented generation (RAG) with citation-based grounding holds promise, instruction-tuned models frequently fail even in straightforward scenarios: missing explicitly stated answers, citing incorrectly, or refusing when evidence is available. In this work, we explore how reinforc…

    Submitted 18 June, 2025; originally announced June 2025.

  4. arXiv:2506.15477  [pdf, ps, other]

    cs.CV

    Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning

    Authors: Chunlei Li, Jingyang Hou, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

    Abstract: Medical report generation from imaging data remains a challenging task in clinical practice. While large language models (LLMs) show great promise in addressing this challenge, their effective integration with medical imaging data still deserves in-depth exploration. In this paper, we present MRG-LLM, a novel multimodal large language model (MLLM) that combines a frozen LLM with a learnable visual…

    Submitted 18 June, 2025; originally announced June 2025.

  5. arXiv:2506.15289  [pdf, ps, other]

    cs.LG math.OC

    DOVA-PATBM: An Intelligent, Adaptive, and Scalable Framework for Optimizing Large-Scale EV Charging Infrastructure

    Authors: Chuan Li, Shunyu Zhao, Vincent Gauthier, Hassine Moungla

    Abstract: The accelerating uptake of battery-electric vehicles demands infrastructure planning tools that are both data-rich and geographically scalable. Whereas most prior studies optimise charging locations for single cities, state-wide and national networks must reconcile the conflicting requirements of dense metropolitan cores, car-dependent exurbs, and power-constrained rural corridors. We present DO…

    Submitted 18 June, 2025; originally announced June 2025.

  6. arXiv:2506.15284  [pdf, ps, other]

    cs.IR

    Multi-Interest Recommendation: A Survey

    Authors: Zihao Li, Qiang Chen, Lixin Zou, Aixin Sun, Chenliang Li

    Abstract: Existing recommendation methods often struggle to model users' multifaceted preferences due to the diversity and volatility of user behavior, as well as the inherent uncertainty and ambiguity of item attributes in practical scenarios. Multi-interest recommendation addresses this challenge by extracting multiple interest representations from users' historical interactions, enabling fine-grained pre…

    Submitted 18 June, 2025; originally announced June 2025.

  7. arXiv:2506.15120  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Advancing Loss Functions in Recommender Systems: A Comparative Study with a Rényi Divergence-Based Solution

    Authors: Shengjia Zhang, Jiawei Chen, Changdong Li, Sheng Zhou, Qihao Shi, Yan Feng, Chun Chen, Can Wang

    Abstract: Loss functions play a pivotal role in optimizing recommendation models. Among various loss functions, Softmax Loss (SL) and Cosine Contrastive Loss (CCL) are particularly effective. Their theoretical connections and differences warrant in-depth exploration. This work conducts comprehensive analyses of these losses, yielding significant insights: 1) Common strengths -- both can be viewed as augment…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: AAAI 2025

  8. arXiv:2506.15098  [pdf, ps, other]

    cs.SE

    Enhancement Report Approval Prediction: A Comparative Study of Large Language Models

    Authors: Haosheng Zuo, Feifei Niu, Chuanyi Li

    Abstract: Enhancement reports (ERs) serve as a critical communication channel between users and developers, capturing valuable suggestions for software improvement. However, manually processing these reports is resource-intensive, leading to delays and potential loss of valuable insights. To address this challenge, enhancement report approval prediction (ERAP) has emerged as a research focus, leveraging mac…

    Submitted 17 June, 2025; originally announced June 2025.

  9. arXiv:2506.14707  [pdf, ps, other]

    cs.DB

    HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search

    Authors: Qian Xu, Feng Zhang, Chengxi Li, Lei Cao, Zheng Chen, Jidong Zhai, Xiaoyong Du

    Abstract: Approximate Nearest Neighbor Search (ANNS) is essential for various data-intensive applications, including recommendation systems, image retrieval, and machine learning. Scaling ANNS to handle billions of high-dimensional vectors on a single machine presents significant challenges in memory capacity and processing efficiency. To address these challenges, distributed vector databases leverage multi…

    Submitted 17 June, 2025; originally announced June 2025.

  10. arXiv:2506.14248  [pdf, ps, other]

    cs.CL cs.AI

    Re-Initialization Token Learning for Tool-Augmented Large Language Models

    Authors: Chenghao Li, Liu Liu, Baosheng Yu, Jiayan Qiu, Yibing Zhan

    Abstract: Large language models have demonstrated exceptional performance, yet struggle with complex tasks such as numerical reasoning and plan generation. Integrating external tools, such as calculators and databases, into large language models (LLMs) is crucial for enhancing problem-solving capabilities. Current methods assign a unique token to each tool, enabling LLMs to call tools through token prediction-…

    Submitted 17 June, 2025; originally announced June 2025.

  11. arXiv:2506.14243  [pdf, ps, other]

    cs.CV

    Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition

    Authors: Xiaohui Jiang, Haijiang Zhu, Chadei Li, Fulin Tang, Ning An

    Abstract: LiDAR-based place recognition serves as a crucial enabler for long-term autonomy in robotics and autonomous driving systems. Yet, prevailing methodologies relying on handcrafted feature extraction face dual challenges: (1) Inconsistent point cloud density, induced by ego-motion dynamics and environmental disturbances during repeated traversals, leads to descriptor instability, and (2) Representati…

    Submitted 17 June, 2025; originally announced June 2025.

  12. arXiv:2506.14229  [pdf, ps, other]

    cs.CV cs.AI

    HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction

    Authors: Changbai Li, Haodong Zhu, Hanlin Chen, Juan Zhang, Tongfei Chen, Shuo Yang, Shuwei Shao, Wenhao Dong, Baochang Zhang

    Abstract: 3D Gaussian Splatting (3DGS) has made significant strides in real-time 3D scene reconstruction, but faces memory scalability issues in high-resolution scenarios. To address this, we propose Hierarchical Gaussian Splatting (HRGS), a memory-efficient framework with hierarchical block-level optimization. First, we generate a global, coarse Gaussian representation from low-resolution data. Then, we pa…

    Submitted 17 June, 2025; originally announced June 2025.

  13. arXiv:2506.13831  [pdf, ps, other]

    cs.LG cs.AI

    Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation

    Authors: Jitian Zhao, Chenghui Li, Frederic Sala, Karl Rohe

    Abstract: Concept-based approaches, which aim to identify human-understandable concepts within a model's internal representations, are a promising method for interpreting embeddings from deep neural network models, such as CLIP. While these approaches help explain model behavior, current methods lack statistical rigor, making it challenging to validate identified concepts and compare different techniques. T…

    Submitted 15 June, 2025; originally announced June 2025.

  14. arXiv:2506.13363  [pdf, ps, other]

    cs.CL

    Efficient Medical VIE via Reinforcement Learning

    Authors: Lijun Liu, Ruiyang Li, Zhaocheng Liu, Chenglin Zhu, Chong Li, Jiehan Cheng, Qiang Ju, Jian Xie

    Abstract: Visual Information Extraction (VIE) converts unstructured document images into structured formats like JSON, critical for medical applications such as report analysis and online consultations. Traditional methods rely on OCR and language models, while end-to-end multimodal models offer direct JSON generation. However, domain-specific schemas and high annotation costs limit their effectiveness in m…

    Submitted 16 June, 2025; originally announced June 2025.

  15. arXiv:2506.13301  [pdf, ps, other]

    cs.CV

    AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing

    Authors: Biao Yang, Muqi Huang, Yuhui Zhang, Yun Xiong, Kun Zhou, Xi Chen, Shiyang Zhou, Huishuai Bao, Chuan Li, Feng Shi, Hualei Liu

    Abstract: Traditional point-based image editing methods rely on iterative latent optimization or geometric transformations, which are either inefficient in their processing or fail to capture the semantic relationships within the image. These methods often overlook the powerful yet underutilized image editing capabilities inherent in pre-trained diffusion models. In this work, we propose a novel one-step po…

    Submitted 16 June, 2025; originally announced June 2025.

  16. arXiv:2506.13276  [pdf, ps, other]

    cs.AI

    Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks

    Authors: Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang

    Abstract: Text-attributed graphs (TAGs) integrate textual data with graph structures, providing valuable insights in applications such as social network analysis and recommendation systems. Graph Neural Networks (GNNs) effectively capture both topological structure and textual information in TAGs but are vulnerable to adversarial attacks. Existing graph injection attack (GIA) methods assume that attackers c…

    Submitted 16 June, 2025; originally announced June 2025.

  17. arXiv:2506.13129  [pdf, ps, other]

    cs.HC

    ChartBlender: An Interactive System for Authoring and Synchronizing Visualization Charts in Video

    Authors: Yi He, Yuqi Liu, Chenpu Li, Ruoyan Chen, Chuer Chen, Shengqi Dang, Nan Cao

    Abstract: Embedding data visualizations in video can enhance the communication of complex information. However, this process is often labor-intensive, requiring designers to adjust visualizations frame by frame manually. In this work, we present ChartBlender, a novel system that streamlines this process by enabling users to create data visualizations, embed them seamlessly into video scenes, and automatical…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures

  18. arXiv:2506.12388  [pdf, ps, other]

    cs.CL cs.AI

    Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model

    Authors: Chong Li, Yingzhuo Deng, Jiajun Zhang, Chengqing Zong

    Abstract: The curse of multilinguality phenomenon is a fundamental problem of multilingual Large Language Models (LLMs), where the competition between massive languages results in inferior performance. It mainly comes from limited capacity and negative transfer between dissimilar languages. To address this issue, we propose a method to dynamically group and scale up the parameters of multilingual LLM while…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: ACL 2025, our codes and models are available at https://github.com/ZNLP/DMoE

  19. arXiv:2506.12260  [pdf, ps, other]

    cs.SD eess.AS

    Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment

    Authors: Wei Wang, Wangyou Zhang, Chenda Li, Jiatong Shi, Shinji Watanabe, Yanmin Qian

    Abstract: Speech quality assessment (SQA) aims to predict the perceived quality of speech signals under a wide range of distortions. It is inherently connected to speech enhancement (SE), which seeks to improve speech quality by removing unwanted signal components. While SQA models are widely used to evaluate SE performance, their potential to guide SE training remains underexplored. In this work, we invest…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Submitted to ASRU 2025

  20. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  21. arXiv:2506.11565  [pdf, ps, other]

    cs.ET cs.LG physics.optics

    Gradients of unitary optical neural networks using parameter-shift rule

    Authors: Jinzhe Jiang, Yaqian Zhao, Xin Zhang, Chen Li, Yunlong Yu, Hailing Liu

    Abstract: This paper explores the application of the parameter-shift rule (PSR) for computing gradients in unitary optical neural networks (UONNs). While backpropagation has been fundamental to training conventional neural networks, its implementation in optical neural networks faces significant challenges due to the physical constraints of optical systems. We demonstrate how PSR, which calculates gradients…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 8 pages, 3 figures

  22. arXiv:2506.11516  [pdf, ps, other]

    cs.LG cs.CL

    Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning

    Authors: Chengye Li, Haiyun Liu, Yuanxi Li

    Abstract: In-context learning (ICL) allows large language models (LLMs) to solve novel tasks without weight updates. Despite its empirical success, the mechanism behind ICL remains poorly understood, limiting our ability to interpret, improve, and reliably apply it. In this paper, we propose a new theoretical perspective that interprets ICL as an implicit form of knowledge distillation (KD), where prompt de…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 10 main pages, 10 page appendix

  23. arXiv:2506.11496  [pdf, ps, other]

    eess.IV cs.CV

    Taming Stable Diffusion for Computed Tomography Blind Super-Resolution

    Authors: Chunlei Li, Yilei Shi, Haoxi Hu, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

    Abstract: High-resolution computed tomography (CT) imaging is essential for medical diagnosis but requires increased radiation exposure, creating a critical trade-off between image quality and patient safety. While deep learning methods have shown promise in CT super-resolution, they face challenges with complex degradations and limited medical training data. Meanwhile, large-scale pre-trained diffusion mod…

    Submitted 13 June, 2025; originally announced June 2025.

  24. arXiv:2506.11442  [pdf, other]

    cs.SE cs.LG

    ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification

    Authors: Yiyang Jin, Kunzhao Xu, Hang Li, Xueting Han, Yanmin Zhou, Cheng Li, Jing Bai

    Abstract: Recent advances in reinforcement learning (RL) with verifiable outcome rewards have significantly improved the reasoning capabilities of large language models (LLMs), especially when combined with multi-turn tool interactions. However, existing methods lack both meaningful verification signals from realistic environments and explicit optimization for verification, leading to unreliable self-verifi…

    Submitted 12 June, 2025; originally announced June 2025.

  25. arXiv:2506.11418  [pdf, ps, other]

    cs.CL

    Efficient Long-Context LLM Inference via KV Cache Clustering

    Authors: Jie Hu, Shengnan Wang, Yutong He, Ping Gong, Jiawei Yi, Juncheng Zhang, Youhui Bai, Renhai Chen, Gong Zhang, Cheng Li, Kun Yuan

    Abstract: Large language models (LLMs) with extended context windows have become increasingly prevalent for tackling complex tasks. However, the substantial Key-Value (KV) cache required for long-context LLMs poses significant deployment challenges. Existing approaches either discard potentially critical information needed for future generations or offer limited efficiency gains due to high computational ov…

    Submitted 12 June, 2025; originally announced June 2025.

  26. arXiv:2506.11331  [pdf, ps, other]

    cs.AI cs.SD eess.AS

    MUDAS: Mote-scale Unsupervised Domain Adaptation in Multi-label Sound Classification

    Authors: Jihoon Yun, Chengzhang Li, Dhrubojyoti Roy, Anish Arora

    Abstract: Unsupervised Domain Adaptation (UDA) is essential for adapting machine learning models to new, unlabeled environments where data distribution shifts can degrade performance. Existing UDA algorithms are designed for single-label tasks and rely on significant computational resources, limiting their use in multi-label scenarios and in resource-constrained IoT devices. Overcoming these limitations is…

    Submitted 12 June, 2025; originally announced June 2025.

  27. arXiv:2506.11167  [pdf, ps, other]

    cs.CV cs.LG

    Towards a general-purpose foundation model for fMRI analysis

    Authors: Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, Tianming Liu, Daniel Barron, Quanzheng Li, Randy Hirschtick, Byung-Hoon Kim, Xiang Li, Yixuan Yuan

    Abstract: Functional Magnetic Resonance Imaging (fMRI) is essential for studying brain function and diagnosing neurological disorders, but current analysis methods face reproducibility and transferability issues due to complex pre-processing and task-specific models. We introduce NeuroSTORM (Neuroimaging Foundation Model with Spatial-Temporal Optimized Representation Modeling), a generalizable framework tha…

    Submitted 11 June, 2025; originally announced June 2025.

  28. arXiv:2506.11137  [pdf]

    cs.CL

    Scalable Medication Extraction and Discontinuation Identification from Electronic Health Records Using Large Language Models

    Authors: Chong Shao, Douglas Snyder, Chiran Li, Bowen Gu, Kerry Ngan, Chun-Ting Yang, Jiageng Wu, Richard Wyss, Kueiyu Joshua Lin, Jie Yang

    Abstract: Identifying medication discontinuations in electronic health records (EHRs) is vital for patient safety but is often hindered by information being buried in unstructured notes. This study aims to evaluate the capabilities of advanced open-sourced and proprietary large language models (LLMs) in extracting medications and classifying their medication status from EHR notes, focusing on their scalabil…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: preprint, under review

  29. arXiv:2506.11094  [pdf, ps, other]

    cs.CL cs.AI cs.CR

    The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs

    Authors: Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu

    Abstract: With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP), including areas such as content generation, human-computer interaction, machine translation, and code generation, among others. However, their widespread deployment has also raised significant safety concerns. In re…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 21 pages, preprint

  30. arXiv:2506.11088  [pdf, ps, other]

    cs.CL cs.AI

    Two Birds with One Stone: Improving Factuality and Faithfulness of LLMs via Dynamic Interactive Subspace Editing

    Authors: Pengbo Wang, Chaozhuo Li, Chenxu Wang, Liwen Zheng, Litian Zhang, Xi Zhang

    Abstract: LLMs have demonstrated unprecedented capabilities in natural language processing, yet their practical deployment remains hindered by persistent factuality and faithfulness hallucinations. While existing methods address these hallucination types independently, they inadvertently induce performance trade-offs, as interventions targeting one type often exacerbate the other. Through empirical and theo…

    Submitted 5 June, 2025; originally announced June 2025.

    MSC Class: 68T50

  31. arXiv:2506.10954  [pdf, ps, other]

    cs.SE cs.AI

    SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks

    Authors: Lianghong Guo, Yanlin Wang, Caihua Li, Pengyu Yang, Jiachi Chen, Wei Tao, Yingtian Zou, Duyu Tang, Zibin Zheng

    Abstract: Constructing large-scale datasets for the GitHub issue resolution task is crucial for both training and evaluating the software engineering capabilities of Large Language Models (LLMs). However, the traditional process for creating such benchmarks is notoriously challenging and labor-intensive, particularly in the stages of setting up evaluation environments, grading test outcomes, and validating…

    Submitted 12 June, 2025; originally announced June 2025.

  32. arXiv:2506.10521  [pdf, ps, other]

    cs.AI cs.CL

    Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

    Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, et al. (2 additional authors not shown)

    Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on ev…

    Submitted 12 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 82 pages

  33. arXiv:2506.09820  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    CoRT: Code-integrated Reasoning within Thinking

    Authors: Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu

    Abstract: Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge:…

    Submitted 12 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: work in progress

  34. arXiv:2506.09803  [pdf, ps, other]

    cs.LG cs.CR

    Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols

    Authors: Longzhu He, Chaozhuo Li, Peng Tang, Litian Zhang, Sen Su

    Abstract: Graph neural networks (GNNs) have achieved significant success in graph representation learning and have been applied to various domains. However, many real-world graphs contain sensitive personal information, such as user profiles in social networks, raising serious privacy concerns when graph learning is performed using GNNs. To address this issue, locally private graph learning protocols have g…

    Submitted 11 June, 2025; originally announced June 2025.

  35. arXiv:2506.09427  [pdf, other]

    cs.CV cs.AI

    A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

    Authors: Yukang Feng, Jianwen Sun, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai, Kaipeng Zhang

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have significantly improved multimodal understanding and generation. However, these models still struggle to generate tightly interleaved image-text outputs, primarily due to the limited scale, quality and instructional richness of current training datasets. To address this, we introduce InterSyn, a large-scale multimodal dataset constructed us…

    Submitted 11 June, 2025; originally announced June 2025.

  36. arXiv:2506.09416  [pdf, ps, other]

    cs.CV

    Noise Conditional Variational Score Distillation

    Authors: Xinyu Peng, Ziyang Zheng, Yaoming Wang, Han Li, Nuowen Kan, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: We propose Noise Conditional Variational Score Distillation (NCVSD), a novel method for distilling pretrained diffusion models into generative denoisers. We achieve this by revealing that the unconditional score function implicitly characterizes the score function of denoising posterior distributions. By integrating this insight into the Variational Score Distillation (VSD) framework, we enable sc…

    Submitted 11 June, 2025; originally announced June 2025.

  37. arXiv:2506.09368  [pdf, ps, other]

    cs.LG cs.AI

    Anomaly Detection and Generation with Diffusion Models: A Survey

    Authors: Yang Liu, Jing Liu, Chengfang Li, Rui Xi, Wenchao Li, Liang Cao, Jin Wang, Laurence T. Yang, Junsong Yuan, Wei Zhou

    Abstract: Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing, by identifying unexpected patterns that deviate from established norms in real-world data. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest due to their ability to learn complex data distributions…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 20 pages, 11 figures, 13 tables

  38. arXiv:2506.09332  [pdf, ps, other]

    cs.LG cs.CE cs.CL

    Natural Language Guided Ligand-Binding Protein Design

    Authors: Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li

    Abstract: Can AI protein models follow human language instructions and design proteins with desired functions (e.g. binding to a ligand)? Designing proteins that bind to a given ligand is crucial in a wide range of applications in biology and chemistry. Most prior AI models are trained on protein-ligand complex data, which is scarce due to the high cost and time requirements of laboratory experiments. In co…

    Submitted 10 June, 2025; originally announced June 2025.

  39. arXiv:2506.09284  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

    Authors: Yihe Tang, Wenlong Huang, Yingke Wang, Chengshu Li, Roy Yuan, Ruohan Zhang, Jiajun Wu, Li Fei-Fei

    Abstract: Understanding fine-grained object affordances is imperative for robots to manipulate objects in unstructured environments given open-ended task instructions. However, existing methods of visual affordance predictions often rely on manually annotated data or condition only on a predefined set of tasks. We introduce UAD (Unsupervised Affordance Distillation), a method for distilling affordance know…

    Submitted 10 June, 2025; originally announced June 2025.

  40. arXiv:2506.09239  [pdf, ps, other]

    cs.IT

    Rejection-Sampled Linear Codes for Lossy Compression and Channel Simulation

    Authors: Cheuk Ting Li, Jianguo Zhao

    Abstract: We show that a linear code combined with rejection sampling can give a capacity-achieving scheme for simulating channels with additive noises with exchangeable distributions. Hence, it can be used in lossy source coding to achieve the rate-distortion function. Interestingly, unlike conventional linear covering codes for lossy compression, which concern the trade-off between the rate and the coveri…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 5 figures

  41. arXiv:2506.07809  [pdf, ps, other]

    cs.CV

    Incorporating Uncertainty-Guided and Top-k Codebook Matching for Real-World Blind Image Super-Resolution

    Authors: Weilei Wen, Tianyi Zhang, Qianqian Zhao, Zhaohui Zheng, Chunle Guo, Xiuli Shao, Chongyi Li

    Abstract: Recent advancements in codebook-based real image super-resolution (SR) have shown promising results in real-world applications. The core idea involves matching high-quality image features from a codebook based on low-resolution (LR) image features. However, existing methods face two major challenges: inaccurate feature matching with the codebook and poor texture detail reconstruction. To address t…

    Submitted 9 June, 2025; originally announced June 2025.

  42. arXiv:2506.07637  [pdf, ps, other]

    cs.CV cs.LG

    HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition

    Authors: Yuchong Long, Wen Sun, Ningxiao Sun, Wenxiao Wang, Chao Li, Shan Yin

    Abstract: Automated pollen recognition is vital to paleoclimatology, biodiversity monitoring, and public health, yet conventional methods are hampered by inefficiency and subjectivity. Existing deep learning models often struggle to achieve the requisite localization accuracy for microscopic targets like pollen, which are characterized by their minute size, indistinct edges, and complex backgrounds. To over…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 16 pages, 5 figures, 2 tables. The dataset is at https://www.kaggle.com/datasets/ayinven/hieraedgenetintegratesdatasets. The models are at https://huggingface.co/datasets/AyinMostima/HieraEdgeNetintegratesdatasets. The source code is at https://github.com/AyinMostima/PalynoKit

    MSC Class: 68T07; 68T45 ACM Class: I.2.10; I.4.9; I.5.4

  43. arXiv:2506.07446  [pdf, ps, other]

    cs.AI

    Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification

    Authors: Liwen Zheng, Chaozhuo Li, Zheng Liu, Feiran Huang, Haoran Jia, Zaisheng Ye, Xi Zhang

    Abstract: Fact verification plays a vital role in combating misinformation by assessing the veracity of claims through evidence retrieval and reasoning. However, traditional methods struggle with complex claims requiring multi-hop reasoning over fragmented evidence, as they often rely on static decomposition strategies and surface-level semantic retrieval, which fail to capture the nuanced structure and int…

    Submitted 9 June, 2025; originally announced June 2025.

  44. arXiv:2506.07075  [pdf, ps, other]

    cs.AI

    Reasoning Paths as Signals: Augmenting Multi-hop Fact Verification through Structural Reasoning Progression

    Authors: Liwen Zheng, Chaozhuo Li, Haoran Jia, Xi Zhang

    Abstract: The growing complexity of factual claims in real-world scenarios presents significant challenges for automated fact verification systems, particularly in accurately aggregating and reasoning over multi-hop evidence. Existing approaches often rely on static or shallow models that fail to capture the evolving structure of reasoning paths, leading to fragmented retrieval and limited interpretability.…

    Submitted 8 June, 2025; originally announced June 2025.

  45. arXiv:2506.06958  [pdf, ps, other]

    cs.CY cs.AI cs.MA

    Position: Simulating Society Requires Simulating Thought

    Authors: Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, Kent Larson

    Abstract: Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior -- it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to emulate individual and group behavior -- primarily through prompting and supervised fine-tuning. Yet they often lack internal coherence, causal reasoning,…

    Submitted 7 June, 2025; originally announced June 2025.

  46. arXiv:2506.06727  [pdf, ps, other]

    cs.AI cs.CV

    VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs

    Authors: Can Li, Ting Zhang, Mei Wang, Hua Huang

    Abstract: Large Multimodal Models (LMMs) have demonstrated remarkable problem-solving capabilities across various domains. However, their ability to perform mathematical reasoning when answer options are represented as images--an essential aspect of multi-image comprehension--remains underexplored. To bridge this gap, we introduce VisioMath, a benchmark designed to evaluate mathematical reasoning in multimo…

    Submitted 7 June, 2025; originally announced June 2025.

  47. arXiv:2506.06710  [pdf, ps, other]

    cs.CV eess.IV

    A Systematic Investigation on Deep Learning-Based Omnidirectional Image and Video Super-Resolution

    Authors: Qianqian Zhao, Chunle Guo, Tianyi Zhang, Junpei Zhang, Peiyang Jia, Tan Su, Wenjie Jiang, Chongyi Li

    Abstract: Omnidirectional image and video super-resolution is a crucial research topic in low-level vision, playing an essential role in virtual reality and augmented reality applications. Its goal is to reconstruct high-resolution images or video frames from low-resolution inputs, thereby enhancing detail preservation and enabling more accurate scene analysis and interpretation. In recent years, numerous i…

    Submitted 7 June, 2025; originally announced June 2025.

  48. arXiv:2506.06690  [pdf, ps, other]

    cs.RO cs.CV

    SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

    Authors: Hao Wang, Chengkai Hou, Xianglong Li, Yankai Fu, Chenxuan Li, Ning Chen, Gaole Dai, Jiaming Liu, Tiejun Huang, Shanghang Zhang

    Abstract: Learning to control high-speed objects in the real world remains a challenging frontier in robotics. Table tennis serves as an ideal testbed for this problem, demanding both rapid interception of fast-moving balls and precise adjustment of their trajectories. This task presents two fundamental challenges: it requires a high-precision vision system capable of accurately predicting ball trajectories…

    Submitted 7 June, 2025; originally announced June 2025.

  49. arXiv:2506.06541  [pdf, ps, other]

    cs.DB cs.AI cs.MA

    KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

    Authors: Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Sivaprasad Sudhir, Om Chabra, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska

    Abstract: Constructing real-world data-to-insight pipelines often involves data extraction from data lakes, data integration across heterogeneous data sources, and diverse operations from data cleaning to analysis. The design and implementation of data science pipelines require domain knowledge, technical expertise, and even project-specific insights. AI systems have shown remarkable reasoning, coding, and…

    Submitted 6 June, 2025; originally announced June 2025.

  50. arXiv:2506.06240  [pdf, ps, other]

    cs.CL

    Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge

    Authors: Yi Sui, Chaozhuo Li, Chen Zhang, Dawei Song, Qiuchi Li

    Abstract: Retrieval-augmented generation (RAG) is a cost-effective approach to mitigate the hallucination of Large Language Models (LLMs) by incorporating the retrieved external knowledge into the generation process. However, external knowledge may conflict with the parametric knowledge of LLMs. Furthermore, current LLMs lack inherent mechanisms for resolving such knowledge conflicts, making traditional RAG…

    Submitted 6 June, 2025; originally announced June 2025.