Showing 1–50 of 626 results for author: Xiao, L

Searching in archive cs.
  1. arXiv:2507.05816

    cs.AI cs.CE cs.CL

    Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity

    Authors: Shuai Zhao, Yulin Zhang, Luwei Xiao, Xinyi Wu, Yanhao Jia, Zhongliang Guo, Xiaobao Wu, Cong-Duy Nguyen, Guoming Zhang, Anh Tuan Luu

    Abstract: Despite the remarkable progress of large language models (LLMs) across various domains, their capacity to predict retinopathy of prematurity (ROP) risk remains largely unexplored. To address this gap, we introduce a novel Chinese benchmark dataset, termed CROP, comprising 993 admission records annotated with low, medium, and high-risk labels. To systematically examine the predictive capabilities a…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.04263

    cs.RO

    SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

    Authors: Liwen Xiao, Zhiyu Pan, Zhicheng Wang, Zhiguo Cao, Wei Li

    Abstract: Accurate prediction of multi-agent future trajectories is crucial for autonomous driving systems to make safe and efficient decisions. Trajectory refinement has emerged as a key strategy to enhance prediction accuracy. However, existing refinement methods often overlook the topological relationships between trajectories, which are vital for improving prediction precision. Inspired by braid theory,…

    Submitted 6 July, 2025; originally announced July 2025.

  3. arXiv:2507.03868

    cs.AI cs.CE cs.CY cs.MM

    From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM

    Authors: Xinyi Wu, Yanhao Jia, Luwei Xiao, Shuai Zhao, Fengkuang Chiang, Erik Cambria

    Abstract: In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we…

    Submitted 4 July, 2025; originally announced July 2025.

  4. arXiv:2507.02119

    cs.LG

    Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

    Authors: Shikai Qiu, Lechao Xiao, Andrew Gordon Wilson, Jeffrey Pennington, Atish Agarwala

    Abstract: What scaling limits govern neural network training dynamics when model size and training time grow in tandem? We show that despite the complex interactions between architecture, training algorithms, and data, compute-optimally trained models exhibit a remarkably precise universality. Specifically, loss curves from models of varying sizes collapse onto a single universal curve when training compute…

    Submitted 7 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: ICML 25. Code available at https://github.com/shikaiqiu/supercollapse

  5. arXiv:2507.01040

    cs.LG cs.AI cs.NE cs.PF

    Fast Clifford Neural Layers

    Authors: Tianxiang Xia, Max Neuwinger, Lin Xiao

    Abstract: Clifford Neural Layers improve PDE modeling by introducing Clifford Algebra into neural networks. In this project we focus on optimizing the inference of 2/3D Clifford convolutional layers and multivector activation layers for one core CPU performance. Overall, by testing on a real network block involving Clifford convolutional layers and multivector activation layers, we observe that our implem…

    Submitted 22 June, 2025; originally announced July 2025.

    Comments: 7 pages content-wise

  6. arXiv:2506.23643

    cs.IR

    Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation

    Authors: Yifan Wang, Weinan Gan, Longtao Xiao, Jieming Zhu, Heng Chang, Haozhao Wang, Rui Zhang, Zhenhua Dong, Ruiming Tang, Ruixuan Li

    Abstract: Generative recommendation (GR) typically encodes behavioral or semantic aspects of item information into discrete tokens, leveraging the standard autoregressive (AR) generation paradigm to make predictions. However, existing methods tend to overlook their intrinsic relationship, that is, the semantic usually provides some reasonable explainability "why" for the behavior "…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures

  7. arXiv:2506.22401

    cs.LG math.OC

    Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL

    Authors: Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

    Abstract: Online reinforcement learning (RL) with complex function approximations such as transformers and deep neural networks plays a significant role in the modern practice of artificial intelligence. Despite its popularity and importance, balancing the fundamental trade-off between exploration and exploitation remains a long-standing challenge; in particular, we are still in lack of efficient and practi…

    Submitted 27 June, 2025; originally announced June 2025.

  8. arXiv:2506.17627

    cs.SE

    CodeMorph: Mitigating Data Leakage in Large Language Model Assessment

    Authors: Hongzhou Rao, Yanjie Zhao, Wenjie Zhu, Ling Xiao, Meizhen Wang, Haoyu Wang

    Abstract: Concerns about benchmark leakage in large language models for code (Code LLMs) have raised issues of data contamination and inflated evaluation metrics. The diversity and inaccessibility of many training datasets make it difficult to prevent data leakage entirely, even with time lag strategies. Consequently, generating new datasets through code perturbation has become essential. However, existing…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: Accepted by ICSE 2025 (Industry Challenge Track)

  9. arXiv:2506.17188

    cs.CL cs.AI cs.IR

    Towards AI Search Paradigm

    Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin

    Abstract: In this paper, we introduce the AI Search Paradigm, a comprehensive blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents (Master, Planner, Executor and Writer) that dynamically adapt to the full spectrum of information needs, from simple factual queries to complex m…

    Submitted 20 June, 2025; originally announced June 2025.

  10. arXiv:2506.15227

    cs.SE

    Large Language Models for Unit Testing: A Systematic Literature Review

    Authors: Quanjun Zhang, Chunrong Fang, Siqi Gu, Ye Shang, Zhenyu Chen, Liang Xiao

    Abstract: Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly redu…

    Submitted 18 June, 2025; originally announced June 2025.

  11. arXiv:2506.12829

    stat.ML cs.LG

    General and Estimable Learning Bound Unifying Covariate and Concept Shifts

    Authors: Hongbo Chen, Li Charlie Xia

    Abstract: Generalization under distribution shift remains a core challenge in modern machine learning, yet existing learning bound theory is limited to narrow, idealized settings and is non-estimable from samples. In this paper, we bridge the gap between theory and practical applications. We first show that existing bounds become loose and non-estimable because their concept shift definition breaks when the…

    Submitted 15 June, 2025; originally announced June 2025.

  12. arXiv:2506.09997

    cs.GR cs.AI cs.CV cs.LG

    DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

    Authors: Chieh Hubert Lin, Zhaoyang Lv, Songyin Wu, Zhen Xu, Thu Nguyen-Phuoc, Hung-Yu Tseng, Julian Straub, Numair Khan, Lei Xiao, Ming-Hsuan Yang, Yuheng Ren, Richard Newcombe, Zhao Dong, Zhengqin Li

    Abstract: We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to stati…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Project page: https://hubert0527.github.io/dgslrm/

  13. arXiv:2506.09738

    cs.LG

    Towards Multi-modal Graph Large Language Model

    Authors: Xin Wang, Zeyang Zhang, Linxin Xiao, Haibo Chen, Chendi Ge, Wenwu Zhu

    Abstract: Multi-modal graphs, which integrate diverse multi-modal features and relations, are ubiquitous in real-world applications. However, existing multi-modal graph learning methods are typically trained from scratch for specific graph data and tasks, failing to generalize across various multi-modal graph data and tasks. To bridge this gap, we explore the potential of Multi-modal Graph Large Language Mo…

    Submitted 11 June, 2025; originally announced June 2025.

  14. arXiv:2506.08967

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  15. arXiv:2506.08158

    cs.CL

    ETT-CKGE: Efficient Task-driven Tokens for Continual Knowledge Graph Embedding

    Authors: Lijing Zhu, Qizhen Lan, Qing Tian, Wenbo Sun, Li Yang, Lu Xia, Yixin Xie, Xi Xiao, Tiehang Duan, Cui Tao, Shuteng Niu

    Abstract: Continual Knowledge Graph Embedding (CKGE) seeks to integrate new knowledge while preserving past information. However, existing methods struggle with efficiency and scalability due to two key limitations: (1) suboptimal knowledge preservation between snapshots caused by manually designed node/relation importance scores that ignore graph dependencies relevant to the downstream task, and (2) comput…

    Submitted 9 June, 2025; originally announced June 2025.

  16. arXiv:2506.06270

    cs.IR

    RecGPT: A Foundation Model for Sequential Recommendation

    Authors: Yangqin Jiang, Xubin Ren, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang

    Abstract: This work addresses a fundamental barrier in recommender systems: the inability to generalize across domains without extensive retraining. Traditional ID-based approaches fail entirely in cold-start and cross-domain scenarios where new users or items lack sufficient interaction history. Inspired by foundation models' cross-domain success, we develop a foundation model for sequential recommendation…

    Submitted 12 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  17. arXiv:2506.05280

    cs.CV

    Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

    Authors: Nan Wang, Yuantao Chen, Lixing Xiao, Weiqing Xiao, Bohan Li, Zhaoxi Chen, Chongjie Ye, Shaocong Xu, Saining Zhang, Ziyang Yan, Pierre Merriaux, Lei Lei, Tianfan Xue, Hao Zhao

    Abstract: Neural rendering techniques, including NeRF and Gaussian Splatting (GS), rely on photometric consistency to produce high-quality reconstructions. However, in real-world scenarios, it is challenging to guarantee perfect photometric consistency in acquired images. Appearance codes have been widely used to address this issue, but their modeling capability is limited, as a single code is applied to th…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: Project page: https://bigcileng.github.io/bilateral-driving ; Code: https://github.com/BigCiLeng/bilateral-driving

  18. arXiv:2506.04576

    cs.IT

    Sparse Phase Retrieval with Redundant Dictionary via $\ell_q (0<q\le 1)$-Analysis Model

    Authors: Haiye Huo, Li Xiao

    Abstract: Sparse phase retrieval with redundant dictionary is to reconstruct the signals of interest that are (nearly) sparse in a redundant dictionary or frame from the phaseless measurements via the optimization models. Gao [7] presented conditions on the measurement matrix, called null space property (NSP) and strong dictionary restricted isometry property (S-DRIP), for exact and stable recovery of dicti…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 21 Pages

  19. arXiv:2506.03961

    cs.IT

    Stable recovery of complex dictionary-sparse signals from phaseless measurements

    Authors: Lianxing Xia, Haiye Huo

    Abstract: Dictionary-sparse phase retrieval, which is also known as phase retrieval with redundant dictionary, aims to reconstruct an original dictionary-sparse signal from its measurements without phase information. It is proved that if the measurement matrix $A$ satisfies null space property (NSP)/strong dictionary restricted isometry property (S-DRIP), then the dictionary-sparse signal can be exactly/sta…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 17 pages

  20. arXiv:2506.03928

    cs.CV

    Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample

    Authors: Ze Feng, Jiang-Jiang Liu, Sen Yang, Lingyu Xiao, Xiaofan Li, Wankou Yang, Jingdong Wang

    Abstract: In this work, we study the Efficient Multimodal Large Language Model. Redundant vision tokens consume a significant amount of computational memory and resources. Therefore, many previous works compress them in the Vision Projector to reduce the number of vision tokens. However, simply compressing in the Vision Projector can lead to the loss of visual information, especially for tasks that rely on…

    Submitted 4 June, 2025; originally announced June 2025.

  21. arXiv:2506.00968

    cs.AI

    PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation

    Authors: Linhan Xia, Mingzhan Yang, Guohui Yuan, Shengnan Tao, Yujing Qiu, Guo Yu, Kai Lei

    Abstract: Mainstream Word Sense Disambiguation (WSD) approaches have employed BERT to extract semantics from both context and definitions of senses to determine the most suitable sense of a target word, achieving notable performance. However, there are two limitations in these approaches. First, previous studies failed to balance the representation of token-level (local) and sequence-level (global) semantic…

    Submitted 1 June, 2025; originally announced June 2025.

  22. arXiv:2505.22649

    cs.IR cs.AI cs.LG

    Pre-training for Recommendation Unlearning

    Authors: Guoxuan Chen, Lianghao Xia, Chao Huang

    Abstract: Modern recommender systems powered by Graph Neural Networks (GNNs) excel at modeling complex user-item interactions, yet increasingly face scenarios requiring selective forgetting of training data. Beyond user requests to remove specific interactions due to privacy concerns or preference changes, regulatory frameworks mandate recommender systems' ability to eliminate the influence of certain user…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to SIGIR 2025 Oral

  23. arXiv:2505.19815

    cs.CL cs.AI

    Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective

    Authors: Junnan Liu, Hongwei Liu, Linchen Xiao, Shudong Liu, Taolin Zhang, Zihan Ma, Songyang Zhang, Kai Chen

    Abstract: We propose a novel framework for comprehending the reasoning capabilities of large language models (LLMs) through the perspective of meta-learning. By conceptualizing reasoning trajectories as pseudo-gradient descent updates to the LLM's parameters, we identify parallels between LLM reasoning and various meta-learning paradigms. We formalize the training process for reasoning tasks as a meta-learn…

    Submitted 26 May, 2025; originally announced May 2025.

  24. arXiv:2505.18932

    cs.CV

    Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency

    Authors: Hyunho Ha, Lei Xiao, Christian Richardt, Thu Nguyen-Phuoc, Changil Kim, Min H. Kim, Douglas Lanman, Numair Khan

    Abstract: We introduce a novel geometry-guided online video view synthesis method with enhanced view and temporal consistency. Traditional approaches achieve high-quality synthesis from dense multi-view camera setups but require significant computational resources. In contrast, selective-input methods reduce this cost but often compromise quality, leading to multi-view and temporal inconsistencies such as f…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025. Project website: https://nkhan2.github.io/projects/geometry-guided-2025/index.html

  25. arXiv:2505.18705

    cs.AI

    AI-Researcher: Autonomous Scientific Innovation

    Authors: Jiabin Tang, Lianghao Xia, Zhonghang Li, Chao Huang

    Abstract: The powerful reasoning capabilities of Large Language Models (LLMs) in mathematics and coding, combined with their ability to automate complex tasks through agentic frameworks, present unprecedented opportunities for accelerating scientific innovation. In this paper, we introduce AI-Researcher, a fully autonomous research system that transforms how AI-driven scientific discovery is conducted and e…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Code on github: https://github.com/HKUDS/AI-Researcher

  26. arXiv:2505.18668

    cs.CV cs.CL

    ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation

    Authors: Zhen Li, Duan Li, Yukai Guo, Xinyuan Guo, Bowen Li, Lanxi Xiao, Shenyu Qiao, Jiashu Chen, Zijian Wu, Hui Zhang, Xinhuan Shu, Shixia Liu

    Abstract: Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the under…

    Submitted 7 June, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: 56 pages

  27. arXiv:2505.17050

    cs.CL cs.AI cs.CE cs.CY cs.MM

    Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning

    Authors: Yanhao Jia, Xinyi Wu, Qinglin Zhang, Yiran Qin, Luwei Xiao, Shuai Zhao

    Abstract: Project-Based Learning (PBL) involves a variety of highly correlated multimodal data, making it a vital educational approach within STEM disciplines. With the rapid development of multimodal large language models (MLLMs), researchers have begun exploring their potential to enhance tasks such as information retrieval, knowledge comprehension, and data generation in educational settings. However, ex…

    Submitted 16 May, 2025; originally announced May 2025.

  28. arXiv:2505.15536

    eess.SY cs.DC

    DeepCEE: Efficient Cross-Region Model Distributed Training System under Heterogeneous GPUs and Networks

    Authors: Jinquan Wang, Xiaojian Liao, Xuzhao Liu, Jiashun Suo, Zhisheng Huo, Chenhao Zhang, Xiangrong Xu, Runnan Shen, Xilong Xie, Limin Xiao

    Abstract: Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model…

    Submitted 27 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  29. arXiv:2505.11141

    cs.CV cs.AI

    Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans

    Authors: Yansheng Qiu, Li Xiao, Zhaopan Xu, Pengfei Zhou, Zheng Wang, Kaipeng Zhang

    Abstract: The goal of achieving Artificial General Intelligence (AGI) is to imitate humans and surpass them. Models such as OpenAI's o1, o3, and DeepSeek's R1 have demonstrated that large language models (LLMs) with human-like reasoning capabilities exhibit exceptional performance and are being gradually integrated into multimodal large language models (MLLMs). However, whether these models possess capabili…

    Submitted 23 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  30. arXiv:2505.10433

    cs.GT

    Bridging Theory and Perception in Fair Division: A Study on Comparative and Fair Share Notions

    Authors: Hadi Hosseini, Joshua Kavner, Samarth Khanna, Sujoy Sikdar, Lirong Xia

    Abstract: The allocation of resources among multiple agents is a fundamental problem in both economics and computer science. In these settings, fairness plays a crucial role in ensuring social acceptability and practical implementation of resource allocation algorithms. Traditional fair division solutions have given rise to a variety of approximate fairness notions, often as a response to the challenges pos…

    Submitted 9 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

    Comments: 29 pages, 10 figures

  31. arXiv:2505.10388

    cs.GT

    Aggregating Information and Preferences with Bounded-Size Deviations

    Authors: Qishen Han, Grant Schoenebeck, Biaoshuai Tao, Lirong Xia

    Abstract: We investigate a voting scenario with two groups of agents whose preferences depend on a ground truth that cannot be directly observed. The majority's preferences align with the ground truth, while the minorities disagree. Focusing on strategic behavior, we analyze situations where agents can form coalitions up to a certain capacity and adopt the concept of ex-ante Bayesian $k$-strong equilibrium,…

    Submitted 15 May, 2025; originally announced May 2025.

  32. arXiv:2505.10377

    cs.GT

    The Art of Two-Round Voting

    Authors: Qishen Han, Grant Schoenebeck, Biaoshuai Tao, Lirong Xia

    Abstract: We study the voting problem with two alternatives where voters' preferences depend on a not-directly-observable state variable. While equilibria in the one-round voting mechanisms lead to a good decision, they are usually hard to compute and follow. We consider the two-round voting mechanism where the first round serves as a polling stage and the winning alternative only depends on the outcome of…

    Submitted 15 May, 2025; originally announced May 2025.

  33. arXiv:2505.06625

    cs.AR cs.AI cs.OS

    CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

    Authors: Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao

    Abstract: With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant performance, the impact of shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 7 pages, 9 figures. This paper has been accepted to the 2025 Design Automation Conference (DAC)

  34. arXiv:2504.20026

    cs.CV cs.AI

    LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields

    Authors: Zhengqin Li, Dilin Wang, Ka Chen, Zhaoyang Lv, Thu Nguyen-Phuoc, Milim Lee, Jia-Bin Huang, Lei Xiao, Cheng Zhang, Yufeng Zhu, Carl S. Marshall, Yufeng Ren, Richard Newcombe, Zhao Dong

    Abstract: We present Large Inverse Rendering Model (LIRM), a transformer architecture that jointly reconstructs high-quality shape, materials, and radiance fields with view-dependent effects in less than a second. Our model builds upon the recent Large Reconstruction Models (LRMs) that achieve state-of-the-art sparse-view reconstruction quality. However, existing LRMs struggle to reconstruct unseen parts ac…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  35. arXiv:2504.19746

    cs.LG cs.AR

    FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs

    Authors: Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lin Sun, Shuai Zheng, Xiangrong Xu

    Abstract: Large language models (LLMs) have significantly advanced the natural language processing paradigm but impose substantial demands on memory and computational resources. Quantization is one of the most effective ways to reduce memory consumption of LLMs. However, advanced single-precision quantization methods experience significant accuracy degradation when quantizing to ultra-low bits. Existing mix…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: DATE 2025

  36. arXiv:2504.19191

    cs.CL

    WuNeng: Hybrid State with Attention

    Authors: Liu Xiao, Li Zhiyuan, Lin Yueyu

    Abstract: The WuNeng architecture introduces a novel approach to enhancing the expressivity and power of large language models by integrating recurrent neural network (RNN)-based RWKV-7 with advanced attention mechanisms, prioritizing heightened contextual coherence over reducing KV cache size. Building upon the hybrid-head concept from Hymba, WuNeng augments standard multi-head attention with additional RW…

    Submitted 27 April, 2025; originally announced April 2025.

  37. arXiv:2504.17761

    cs.CV

    Step1X-Edit: A Practical Framework for General Image Editing

    Authors: Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Daxin Jiang

    Abstract: In recent years, image editing models have witnessed remarkable and rapid development. The recent unveiling of cutting-edge multimodal models such as GPT-4o and Gemini2 Flash has introduced highly promising image editing capabilities. These models demonstrate an impressive aptitude for fulfilling a vast majority of user-driven editing requirements, marking a significant advancement in the field of…

    Submitted 23 June, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: code: https://github.com/stepfun-ai/Step1X-Edit

  38. arXiv:2504.15848

    cs.CL

    Exploring Cognitive and Aesthetic Causality for Multimodal Aspect-Based Sentiment Analysis

    Authors: Luwei Xiao, Rui Mao, Shuai Zhao, Qika Lin, Yanhao Jia, Liang He, Erik Cambria

    Abstract: Multimodal aspect-based sentiment classification (MASC) is an emerging task due to an increase in user-generated multimodal content on social platforms, aimed at predicting sentiment polarity toward specific aspect targets (i.e., entities or attributes explicitly mentioned in text-image pairs). Despite extensive efforts and significant achievements in existing MASC, substantial gaps remain in unde…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted by TAFFC 2025

  39. arXiv:2504.15021

    cs.DC cs.LG cs.PF

    Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments?

    Authors: Xinglei Dou, Lei Liu, Limin Xiao

    Abstract: Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services. OSML+ intelligently schedules the cache and main memory bandwidth resources at the memory hierarchy and the computing core resources simultaneously. OSML+ uses a multi-model collaborative learning approach during its scheduling and thu…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 25 pages, 14 figures, to be published in ACM Transactions on Storage

  40. arXiv:2504.14439

    cs.LG cs.AI cs.CL

    LoRe: Personalizing LLMs via Low-Rank Reward Modeling

    Authors: Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel

    Abstract: Personalizing large language models (LLMs) to accommodate diverse user preferences is essential for enhancing alignment and user satisfaction. Traditional reinforcement learning from human feedback (RLHF) approaches often rely on monolithic value representations, limiting their ability to adapt to individual preferences. We introduce a novel framework that leverages low-rank preference modeling to…

    Submitted 19 April, 2025; originally announced April 2025.

  41. arXiv:2504.14260

    cs.CV cs.CL

    Cross-attention for State-based model RWKV-7

    Authors: Liu Xiao, Li Zhiyuan, Lin Yueyu

    Abstract: We introduce CrossWKV, a novel cross-attention mechanism for the state-based RWKV-7 model, designed to enhance the expressive power of text-to-image generation. Leveraging RWKV-7's linear-complexity Weighted Key-Value (WKV) architecture, CrossWKV integrates text and image modalities in a single pass, utilizing a generalized delta rule with vector-valued gating and low-rank adaptations (LoRA) to ac…

    Submitted 19 April, 2025; originally announced April 2025.

  42. arXiv:2504.13054

    cs.CL cs.AI

    Aspect-Based Summarization with Self-Aspect Retrieval Enhanced Generation

    Authors: Yichao Feng, Shuai Zhao, Yueqiu Li, Luwei Xiao, Xiaobao Wu, Anh Tuan Luu

    Abstract: Aspect-based summarization aims to generate summaries tailored to specific aspects, addressing the resource constraints and limited generalizability of traditional summarization approaches. Recently, large language models have shown promise in this task without the need for training. However, they rely excessively on prompt engineering and face token limits and hallucination challenges, especially…

    Submitted 17 April, 2025; originally announced April 2025.

  43. arXiv:2504.11165  [pdf, ps, other]

    cs.CV

    YOLO-RS: Remote Sensing Enhanced Crop Detection Methods

    Authors: Linlin Xiao, Zhang Tiancong, Yutong Jia, Xinyu Nie, Mengyao Wang, Xiaohang Shao

    Abstract: With the rapid development of remote sensing technology, crop classification and health detection based on deep learning have gradually become a research hotspot. However, the existing target detection methods show poor performance when dealing with small targets in remote sensing images, especially in the case of complex background and image mixing, which is difficult to meet the practical applic…

    Submitted 6 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  44. arXiv:2504.08247  [pdf, other]

    cs.LG cs.CL

    Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner

    Authors: Liu Xiao, Li Zhiyuan, Lin Yueyu

    Abstract: State-based sequence models like RWKV-7 offer a compelling alternative to Transformer architectures, achieving linear complexity while demonstrating greater expressive power in short-context scenarios and enabling state tracking beyond the \(\text{TC}^0\) complexity class. However, RWKV-7 lacks mechanisms for token-parameter interactions and native scalability, limiting its adaptability and growth…

    Submitted 11 April, 2025; originally announced April 2025.

  45. arXiv:2504.05097  [pdf, ps, other]

    cs.CL cs.LG

    State Tuning: State-based Test-Time Scaling on RWKV-7

    Authors: Liu Xiao, Li Zhiyuan, Lin Yueyu

    Abstract: Test-time scaling has emerged as a prominent research direction in machine learning, enabling models to enhance their expressive capabilities during inference. Transformers, renowned for striking a delicate balance between efficiency and expressiveness, have benefited from test-time scaling techniques that leverage an expanding key-value (KV) cache to significantly improve performance. In this paper…

    Submitted 7 April, 2025; originally announced April 2025.

  46. arXiv:2504.04795  [pdf, other]

    cs.RO

    Embodied Perception for Test-time Grasping Detection Adaptation with Knowledge Infusion

    Authors: Jin Liu, Jialong Xie, Leibing Xiao, Chaoqun Wang, Fengyu Zhou

    Abstract: It has always been expected that a robot can be easily deployed to unknown scenarios, accomplishing robotic grasping tasks without human intervention. Nevertheless, existing grasp detection approaches are typically off-body techniques and are realized by training various deep neural networks with extensive annotated data support. In this paper, we propose an embodied test-time adaptation framewor…

    Submitted 7 April, 2025; originally announced April 2025.

  47. arXiv:2504.03289  [pdf, other]

    cs.SD cs.CL eess.AS

    RWKVTTS: Yet another TTS based on RWKV-7

    Authors: Lin yueyu, Liu Xiao

    Abstract: Human-AI interaction thrives on intuitive and efficient interfaces, among which voice stands out as a particularly natural and accessible modality. Recent advancements in transformer-based text-to-speech (TTS) systems, such as Fish-Speech, CosyVoice, and MegaTTS 3, have delivered remarkable improvements in quality and realism, driving a significant evolution in the TTS domain. In this paper, we in…

    Submitted 4 April, 2025; originally announced April 2025.

  48. arXiv:2504.02009  [pdf, other]

    cs.CY cs.CL

    Urban Computing in the Era of Large Language Models

    Authors: Zhonghang Li, Lianghao Xia, Xubin Ren, Jiabin Tang, Tianyi Chen, Yong Xu, Chao Huang

    Abstract: Urban computing has emerged as a multidisciplinary field that harnesses data-driven technologies to address challenges and improve urban living. Traditional approaches, while beneficial, often face challenges with generalization, scalability, and contextual understanding. The advent of Large Language Models (LLMs) offers transformative potential in this domain. This survey explores the intersectio…

    Submitted 29 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: https://github.com/HKUDS/Awesome-LLM4Urban-Papers

  49. arXiv:2503.22779  [pdf, ps, other]

    cs.MA cs.GT cs.LG math.OC

    Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games

    Authors: Junkai Hu, Li Xia

    Abstract: We study a long-run mean-variance team stochastic game (MV-TSG), where each agent shares a common mean-variance objective for the system and takes actions independently to maximize it. MV-TSG has two main challenges. First, the variance metric is neither additive nor Markovian in a dynamic setting. Second, simultaneous policy updates of all agents lead to a non-stationary environment for each indi…

    Submitted 12 June, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  50. arXiv:2503.19423  [pdf, other]

    stat.AP cs.LG

    A novel forecasting framework combining virtual samples and enhanced Transformer models for tourism demand forecasting

    Authors: Tingting Diao, Xinzhang Wu, Lina Yang, Ling Xiao, Yunxuan Dong

    Abstract: Accurate tourism demand forecasting is hindered by limited historical data and complex spatiotemporal dependencies among tourist origins. A novel forecasting framework integrating virtual sample generation and a novel Transformer predictor addresses constraints arising from restricted data availability. A spatiotemporal GAN produces realistic virtual samples by dynamically modeling spatial correla…

    Submitted 25 March, 2025; originally announced March 2025.