
Showing 1–50 of 315 results for author: Zhou, R

Searching in archive cs. Search in all archives.
  1. arXiv:2506.19840  [pdf, ps, other]

    cs.CV

    GenHSI: Controllable Generation of Human-Scene Interaction Videos

    Authors: Zekun Li, Rui Zhou, Rahul Sajnani, Xiaoyan Cong, Daniel Ritchie, Srinath Sridhar

    Abstract: Large-scale pre-trained video diffusion models have exhibited remarkable capabilities in diverse video generation. However, existing solutions face several challenges in using these models to generate long movie-like videos with rich human-object interactions that include unrealistic human-scene interaction, lack of subject identity preservation, and require expensive training. We propose GenHSI,…

    Submitted 24 June, 2025; originally announced June 2025.

  2. arXiv:2506.16703  [pdf, ps, other]

    cs.RO

    VLM-Empowered Multi-Mode System for Efficient and Safe Planetary Navigation

    Authors: Sinuo Cheng, Ruyi Zhou, Wenhao Feng, Huaiguang Yang, Haibo Gao, Zongquan Deng, Liang Ding

    Abstract: The increasingly complex and diverse planetary exploration environment requires more adaptable and flexible rover navigation strategy. In this study, we propose a VLM-empowered multi-mode system to achieve efficient while safe autonomous navigation for planetary rovers. Vision-Language Model (VLM) is used to parse scene information by image inputs to achieve a human-level understanding of terrain…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: accepted by IROS 2025

  3. arXiv:2506.15943  [pdf, ps, other]

    cs.LG

    On the optimal regret of collaborative personalized linear bandits

    Authors: Bruce Huang, Ruida Zhou, Lin F. Yang, Suhas Diggavi

    Abstract: Stochastic linear bandits are a fundamental model for sequential decision making, where an agent selects a vector-valued action and receives a noisy reward with expected value given by an unknown linear function. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. Applyi…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 30 pages, 4 figures

  4. arXiv:2506.13961  [pdf, ps, other]

    eess.SY cs.AI

    Safe Domains of Attraction for Discrete-Time Nonlinear Systems: Characterization and Verifiable Neural Network Estimation

    Authors: Mohamed Serry, Haoyu Li, Ruikun Zhou, Huan Zhang, Jun Liu

    Abstract: Analysis of nonlinear autonomous systems typically involves estimating domains of attraction, which have been a topic of extensive research interest for decades. Despite that, accurately estimating domains of attraction for nonlinear systems remains a challenging task, where existing methods are conservative or limited to low-dimensional systems. The estimation becomes even more challenging when a…

    Submitted 16 June, 2025; originally announced June 2025.

  5. arXiv:2506.12637  [pdf, ps, other]

    cs.CL

    How Grounded is Wikipedia? A Study on Structured Evidential Support

    Authors: William Walden, Kathryn Ricci, Miriam Wanner, Zhengping Jiang, Chandler May, Rongkun Zhou, Benjamin Van Durme

    Abstract: Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work provides a quantitative analysis of the extent to which Wikipedia *is* so grounded and of how readily grounding evidence may be retrieve…

    Submitted 14 June, 2025; originally announced June 2025.

  6. arXiv:2506.12303  [pdf, ps, other]

    cs.LG stat.ML

    SPIRE: Conditional Personalization for Federated Diffusion Generative Models

    Authors: Kaan Ozkara, Ruida Zhou, Suhas Diggavi

    Abstract: Recent advances in diffusion models have revolutionized generative AI, but their sheer size makes on device personalization, and thus effective federated learning (FL), infeasible. We propose Shared Backbone Personal Identity Representation Embeddings (SPIRE), a framework that casts per client diffusion based generation as conditional generation in FL. SPIRE factorizes the network into (i) a high…

    Submitted 13 June, 2025; originally announced June 2025.

  7. arXiv:2506.11437  [pdf, ps, other]

    math.CO cs.SI

    Social Networks: Enumerating Maximal Community Patterns in $c$-Closed Graphs

    Authors: Gabriela Bourla, Kaixin Wang, Fan Wei, Runtian Zhou

    Abstract: Fox, Seshadhri, Roughgarden, Wei, and Wein (SICOMP 2020) introduced the model of $c$-closed graphs--a distribution-free model motivated by triadic closure, one of the most pervasive structural signatures of social networks. While enumerating maximal cliques in general graphs can take exponential time, it is known that in $c$-closed graphs, maximal cliques and maximal complete bipartite subgraphs c…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 38 pages

  8. arXiv:2506.06521  [pdf, ps, other]

    cs.LG stat.ML

    Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

    Authors: Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du

    Abstract: We consider the gap-dependent regret bounds for episodic MDPs. We show that the Monotonic Value Propagation (MVP) algorithm achieves a variance-aware gap-dependent regret bound of…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 30 pages

  9. arXiv:2506.05115  [pdf, ps, other]

    cs.RO

    Whole-Body Constrained Learning for Legged Locomotion via Hierarchical Optimization

    Authors: Haoyu Wang, Ruyi Zhou, Liang Ding, Tie Liu, Zhelin Zhang, Peng Xu, Haibo Gao, Zongquan Deng

    Abstract: Reinforcement learning (RL) has demonstrated impressive performance in legged locomotion over various challenging environments. However, due to the sim-to-real gap and lack of explainability, unconstrained RL policies deployed in the real world still suffer from inevitable safety issues, such as joint collisions, excessive torque, or foot slippage in low-friction environments. These problems limit…

    Submitted 5 June, 2025; originally announced June 2025.

  10. arXiv:2506.03144  [pdf, ps, other]

    cs.CV cs.CL cs.MM

    MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

    Authors: Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li

    Abstract: Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, often failing to fully exploit the expressive capacity of visual information as evidenced by maintained performance when images are replaced with captions. However, practical retrieval scenarios freq…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Preprint; Project Page, Code, and Dataset at: https://merit-2025.github.io/

  11. arXiv:2506.02875  [pdf, ps, other]

    cs.CV

    NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

    Authors: Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, Radu Timofte, et al. (70 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. This challenge is to address a major challenge in the field of video and talking head processing. The challenge is divided into three tracks, including user generated video, AI generated video and talking he…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: NTIRE 2025 XGC Quality Assessment Challenge Report. arXiv admin note: text overlap with arXiv:2404.16687

  12. arXiv:2506.01538  [pdf, ps, other]

    cs.RO cs.AI

    LAMARL: LLM-Aided Multi-Agent Reinforcement Learning for Cooperative Policy Generation

    Authors: Guobin Zhu, Rui Zhou, Wenkang Ji, Shiyu Zhao

    Abstract: Although Multi-Agent Reinforcement Learning (MARL) is effective for complex multi-robot tasks, it suffers from low sample efficiency and requires iterative manual reward tuning. Large Language Models (LLMs) have shown promise in single-robot settings, but their application in multi-robot systems remains largely unexplored. This paper introduces a novel LLM-Aided MARL (LAMARL) approach, which integ…

    Submitted 3 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Robotics and Automation Letters

  13. arXiv:2506.00027  [pdf, other]

    cs.CL

    From Mathematical Reasoning to Code: Generalization of Process Reward Models in Test-Time Scaling

    Authors: Zhengyu Chen, Yudong Wang, Teng Xiao, Ruochen Zhou, Xuesheng Yang, Wei Wang, Zhifang Sui, Jingang Wang

    Abstract: Recent advancements in improving the reasoning capabilities of Large Language Models have underscored the efficacy of Process Reward Models (PRMs) in addressing intermediate errors through structured feedback mechanisms. This study analyzes PRMs from multiple perspectives, including training methodologies, scalability, and generalization capabilities. We investigate the interplay between pre-train…

    Submitted 24 May, 2025; originally announced June 2025.

  14. arXiv:2505.23341  [pdf, ps, other]

    cs.CV

    DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification

    Authors: Daoxi Cao, Hangbei Cheng, Yijin Li, Ruolin Zhou, Xuehan Zhang, Xinyi Li, Binwei Li, Xuancheng Gu, Jianan Zhang, Xueyu Liu, Yongfei Wu

    Abstract: Whole-slide images (WSIs) are critical for cancer diagnosis due to their ultra-high resolution and rich semantic content. However, their massive size and the limited availability of fine-grained annotations pose substantial challenges for conventional supervised learning. We propose DSAGL (Dual-Stream Attention-Guided Learning), a novel weakly supervised classification framework that combines a te…

    Submitted 27 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  15. arXiv:2505.19770  [pdf, ps, other]

    cs.LG cs.CL

    Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

    Authors: Ruizhe Shi, Minhak Song, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du

    Abstract: We present a fine-grained theoretical analysis of the performance gap between reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) under a representation gap. Our study decomposes this gap into two sources: an explicit representation gap under exact optimization and an implicit representation gap under finite samples. In the exact optimization setting, we char…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 30 pages, 5 figures

  16. arXiv:2505.19108  [pdf, ps, other]

    cs.CL cs.AI

    CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models

    Authors: Yongheng Zhang, Xu Liu, Ruoxi Zhou, Qiguang Chen, Hao Fei, Wenpeng Lu, Libo Qin

    Abstract: Investigating hallucination issues in large language models (LLMs) within cross-lingual and cross-modal scenarios can greatly advance the large-scale deployment in real-world applications. Nevertheless, the current studies are limited to a single scenario, either cross-lingual or cross-modal, leaving a gap in the exploration of hallucinations in the joint cross-lingual and cross-modal scenarios. M…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted at ACL 2025 Main Conference

  17. arXiv:2505.15612  [pdf, other]

    cs.CL cs.AI cs.LG

    Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

    Authors: Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He

    Abstract: Large Reasoning Models (LRMs) have shown remarkable capabilities in solving complex problems through reinforcement learning (RL), particularly by generating long reasoning traces. However, these extended outputs often exhibit substantial redundancy, which limits the efficiency of LRMs. In this paper, we investigate RL-based approaches to promote reasoning efficiency. Specifically, we first present…

    Submitted 21 May, 2025; originally announced May 2025.

  18. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  19. arXiv:2505.14499  [pdf, ps, other]

    cs.CL cs.AI

    Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales

    Authors: Jun Cao, Jiyi Li, Ziwei Yang, Renjie Zhou

    Abstract: There has been growing interest in Multimodal Aspect-Based Sentiment Analysis (MABSA) in recent years. Existing methods predominantly rely on pre-trained small language models (SLMs) to collect information related to aspects and sentiments from both image and text, with an aim to align these two modalities. However, small SLMs possess limited capacity and knowledge, often resulting in inaccurate i…

    Submitted 23 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 15 pages, 2 figures, 6 tables. Accepted by ICONIP2024

  20. arXiv:2505.13856  [pdf, ps, other]

    cs.CV

    SuperMapNet for Long-Range and High-Accuracy Vectorized HD Map Construction

    Authors: Ruqin Zhou, Chenguang Dai, Wanshou Jiang, Yongsheng Zhang, Hanyun Wang, San Jiang

    Abstract: Vectorized HD map is essential for autonomous driving. Remarkable work has been achieved in recent years, but there are still major issues: (1) in the generation of the BEV features, single modality-based methods are of limited perception capability, while direct concatenation-based multi-modal methods fail to capture synergies and disparities between different modalities, resulting in limited ran…

    Submitted 4 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 13 pages, 9 figures

  21. arXiv:2505.13840  [pdf, other]

    cs.CL cs.AI cs.LG

    EfficientLLM: Efficiency in Large Language Models

    Authors: Zhengqing Yuan, Weixiang Sun, Yixin Liu, Huichi Zhou, Rong Zhou, Yiyang Li, Zheyuan Zhang, Wei Song, Yue Huang, Haolong Jia, Keerthiram Murugesan, Yu Wang, Lifang He, Jianfeng Gao, Lichao Sun, Yanfang Ye

    Abstract: Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale. Conducted on a production-class cluster (48xGH200, 8xH200 GPUs), our study systematica…

    Submitted 19 May, 2025; originally announced May 2025.

  22. arXiv:2505.11908  [pdf, other]

    cs.CL

    ELITE: Embedding-Less retrieval with Iterative Text Exploration

    Authors: Zhangyu Wang, Siyuan Gao, Rong Zhou, Hao Wang, Li Ning

    Abstract: Large Language Models (LLMs) have achieved impressive progress in natural language processing, but their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks. Retrieval-Augmented Generation (RAG) mitigates this by retrieving relevant information from an external corpus. However, existing RAG systems often rely on embedding-based retrieval trained…

    Submitted 17 May, 2025; originally announced May 2025.

  23. arXiv:2505.06516  [pdf, ps, other]

    cs.CV

    Quantum Conflict Measurement in Decision Making for Out-of-Distribution Detection

    Authors: Yilin Dong, Tianyun Zhu, Xinde Li, Jean Dezert, Rigui Zhou, Changming Zhu, Lei Cao, Shuzhi Sam Ge

    Abstract: Quantum Dempster-Shafer Theory (QDST) uses quantum interference effects to derive a quantum mass function (QMF) as a fuzzy metric type from information obtained from various data sources. In addition, QDST uses quantum parallel computing to speed up computation. Nevertheless, the effective management of conflicts between multiple QMFs in QDST is a challenging question. This work aims to address th…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 16 pages, 28 figures

  24. arXiv:2505.01665  [pdf, other]

    cs.LG

    Adaptively Point-weighting Curriculum Learning

    Authors: Wensheng Li, Hao Wang, Ruifeng Zhou, Hanting Guan, Chao Zhang, Dacheng Tao

    Abstract: Curriculum learning (CL) is referred to as a training strategy that makes easy samples learned first and then fits hard samples. It imitates the process of humans learning knowledge, and has become a potential manner of effectively training deep networks. In this study, we develop the adaptively point-weighting (APW) curriculum learning algorithm, which adaptively assigns the weight to every train…

    Submitted 2 May, 2025; originally announced May 2025.

  25. arXiv:2504.20389  [pdf, other]

    cs.DC quant-ph

    CloudQC: A Network-aware Framework for Multi-tenant Distributed Quantum Computing

    Authors: Ruilin Zhou, Yuhang Gan, Yi Liu, Chen Qian

    Abstract: Distributed quantum computing (DQC) that allows a large quantum circuit to be executed simultaneously on multiple quantum processing units (QPUs) becomes a promising approach to increase the scalability of quantum computing. It is natural to envision the near-future DQC platform as a multi-tenant cluster of QPUs, called a Quantum Cloud. However, no existing DQC work has addressed the two key probl…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Accepted by ICDCS 2025

  26. arXiv:2504.20387  [pdf, other]

    cs.PF cs.AR

    DEER: Deep Runahead for Instruction Prefetching on Modern Mobile Workloads

    Authors: Parmida Vahdatniya, Julian Humecki, Henry Kao, Tony Li, Ali Sedaghati, Fang Su, Ruoyu Zhou, Alex Bi, Reza Azimi, Maziar Goudarzi

    Abstract: Mobile workloads incur heavy frontend stalls due to increasingly large code footprints as well as long repeat cycles. Existing instruction-prefetching techniques suffer from low coverage, poor timeliness, or high cost. We provide a SW/HW co-designed I-prefetcher; DEER uses profile analysis to extract metadata information that allow the hardware to prefetch the most likely future instruction cachel…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 13 pages

  27. arXiv:2504.20101  [pdf, other]

    cs.DC cs.AI

    GenTorrent: Scaling Large Language Model Serving with An Overley Network

    Authors: Fei Fang, Yifan Hua, Shengze Wang, Ruilin Zhou, Yi Liu, Chen Qian, Xiaoxue Zhang

    Abstract: While significant progress has been made in research and development on open-source and cost-efficient large-language models (LLMs), serving scalability remains a critical challenge, particularly for small organizations and individuals seeking to deploy and test their LLM innovations. Inspired by peer-to-peer networks that leverage decentralized overlay nodes to increase throughput and availabilit…

    Submitted 30 April, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

  28. arXiv:2504.19598  [pdf, other]

    cs.CV cs.AI

    Lightweight Adapter Learning for More Generalized Remote Sensing Change Detection

    Authors: Dou Quan, Rufan Zhou, Shuang Wang, Ning Huyan, Dong Zhao, Yunan Li, Licheng Jiao

    Abstract: Deep learning methods have shown promising performances in remote sensing image change detection (CD). However, existing methods usually train a dataset-specific deep network for each dataset. Due to the significant differences in the data distribution and labeling between various datasets, the trained dataset-specific deep network has poor generalization performances on other datasets. To solve t…

    Submitted 28 April, 2025; originally announced April 2025.

  29. arXiv:2504.19467  [pdf]

    cs.CL cs.AI

    BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text

    Authors: Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder, Yixing Jiang, Valentina Carducci, Richard Wyss, Rishi J Desai, Emily Alsentzer, Leo Anthony Celi, Adam Rodman, Sebastian Schneeweiss, Jonathan H. Chen, Santiago Romero-Brufau, Kueiyu Joshua Lin, Jie Yang

    Abstract: Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerated pace. However, current evaluations of LLMs in clinical contexts remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world electronic health record (EHR) data. O…

    Submitted 30 April, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  30. arXiv:2504.19350  [pdf, other]

    cs.DS

    Optimal Static Fully Indexable Dictionaries

    Authors: Jingxun Liang, Renfei Zhou

    Abstract: Fully indexable dictionaries (FID) store sets of integer keys while supporting rank/select queries. They serve as basic building blocks in many succinct data structures. Despite the great importance of FIDs, no known FID is succinct with efficient query time when the universe size $U$ is a large polynomial in the number of keys $n$, which is the conventional parameter regime for dictionary problem…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 18 pages, 2 figures; in ICALP 2025

  31. arXiv:2504.18773  [pdf, other]

    cs.CV

    Depth as Points: Center Point-based Depth Estimation

    Authors: Zhiheng Tu, Xinjian Huang, Yong He, Ruiyang Zhou, Bo Du, Weitao Wu

    Abstract: The perception of vehicles and pedestrians in urban scenarios is crucial for autonomous driving. This process typically involves complicated data collection, imposes high computational and hardware demands. To address these limitations, we first develop a highly efficient method for generating virtual datasets, which enables the creation of task- and scenario-specific datasets in a short time. Lev…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Depth Estimation, Key-points, Virtual Datasets, Autonomous Driving

  32. arXiv:2504.16130  [pdf, other]

    eess.SP cs.AI cs.LG

    A Self-supervised Learning Method for Raman Spectroscopy based on Masked Autoencoders

    Authors: Pengju Ren, Ri-gui Zhou, Yaochong Li

    Abstract: Raman spectroscopy serves as a powerful and reliable tool for analyzing the chemical information of substances. The integration of Raman spectroscopy with deep learning methods enables rapid qualitative and quantitative analysis of materials. Most existing approaches adopt supervised learning methods. Although supervised learning has achieved satisfactory accuracy in spectral analysis, it is still…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 15 pages, 10 figures

  33. arXiv:2504.14806  [pdf, ps, other]

    cs.RO

    An Iterative Task-Driven Framework for Resilient LiDAR Place Recognition in Adverse Weather

    Authors: Xiongwei Zhao, Xieyuanli Chen, Xu Zhu, Xingxiang Xie, Haojie Bai, Congcong Wen, Rundong Zhou, Qihao Sun

    Abstract: LiDAR place recognition (LPR) plays a vital role in autonomous navigation. However, existing LPR methods struggle to maintain robustness under adverse weather conditions such as rain, snow, and fog, where weather-induced noise and point cloud degradation impair LiDAR reliability and perception accuracy. To tackle these challenges, we propose an Iterative Task-Driven Framework (ITDNet), which integ…

    Submitted 19 June, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: Submitted to IEEE TVT

  34. arXiv:2504.10743  [pdf, ps, other]

    cs.DS math.OC

    Robust Gittins for Stochastic Scheduling

    Authors: Benjamin Moseley, Heather Newman, Kirk Pruhs, Rudy Zhou

    Abstract: A common theme in stochastic optimization problems is that, theoretically, stochastic algorithms need to "know" relatively rich information about the underlying distributions. This is at odds with most applications, where distributions are rough predictions based on historical data. Thus, commonly, stochastic algorithms are making decisions using imperfect predicted distributions, while trying to…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 22 pages

  35. arXiv:2504.07491  [pdf, ps, other]

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (70 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 23 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Updated Kimi-VL-A3B-Thinking-2506 information

  36. arXiv:2504.01450  [pdf, other]

    cs.LG cs.CL

    CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models

    Authors: Runlong Zhou, Yi Zhang

    Abstract: Language models often struggle with cross-mode knowledge retrieval -- the ability to access knowledge learned in one format (mode) when queried in another. We demonstrate that models trained on multiple data sources (e.g., Wikipedia and TinyStories) exhibit significantly reduced accuracy when retrieving knowledge in a format different from its original training mode. This paper quantitatively inve…

    Submitted 2 April, 2025; originally announced April 2025.

  37. arXiv:2504.00816  [pdf, ps, other]

    cs.CV physics.med-ph

    Two-stage deep learning framework for the restoration of incomplete-ring PET images

    Authors: Yeqi Fang, Rong Zhou

    Abstract: Positron Emission Tomography (PET) is an important molecular imaging tool widely used in medicine. Traditional PET systems rely on complete detector rings for full angular coverage and reliable data collection. However, incomplete-ring PET scanners have emerged due to hardware failures, cost constraints, or specific clinical needs. Standard reconstruction algorithms often suffer from performance d…

    Submitted 4 June, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 20 pages, 7 figures

  38. arXiv:2503.24306  [pdf, other]

    cs.CV

    Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge

    Authors: Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, et al. (15 additional authors not shown)

    Abstract: Understanding tissue motion in surgery is crucial to enable applications in downstream tasks such as segmentation, 3D reconstruction, virtual tissue landmarking, autonomous probe-based scanning, and subtask autonomy. Labeled data are essential to enabling algorithms in these downstream tasks since they allow us to quantify and train algorithms. This paper introduces a point tracking challenge to a…

    Submitted 31 March, 2025; originally announced March 2025.

  39. arXiv:2503.23875  [pdf, other]

    cs.RO cs.AI cs.MA

    GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models

    Authors: Wenkang Ji, Huaben Chen, Mingyang Chen, Guobin Zhu, Lufeng Xu, Roderich Groß, Rui Zhou, Ming Cao, Shiyu Zhao

    Abstract: The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has motivated research on methods to automatically create control policies. However, these methods require iterative processes of manually crafting and refining objective functions, thereby prolonging the development…

    Submitted 31 March, 2025; originally announced March 2025.

  40. arXiv:2503.22394  [pdf, other]

    cs.CV cs.AI

    Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision

    Authors: Rulin Zhou, Wenlong He, An Wang, Qiqi Yao, Haijun Hu, Jiankun Wang, Xi Zhang, Hongliang Ren

    Abstract: Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions due to limited feature utilization and annotation dependence. We present…

    Submitted 28 March, 2025; originally announced March 2025.

  41. arXiv:2503.13628  [pdf, other]

    cs.DS

    Optimal Non-Oblivious Open Addressing

    Authors: Michael A. Bender, William Kuszmaul, Renfei Zhou

    Abstract: A hash table is said to be open-addressed (or non-obliviously open-addressed) if it stores elements (and free slots) in an array with no additional metadata. Intuitively, open-addressed hash tables must incur a space-time tradeoff: The higher the load factor at which the hash table operates, the longer insertions/deletions/queries should take. In this paper, we show that no such tradeoff exists:…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: 28 pages, 5 figures, in STOC 2025

  42. Shape-Kit: A Design Toolkit for Crafting On-Body Expressive Haptics

    Authors: Ran Zhou, Jianru Ding, Chenfeng Gao, Wanli Qian, Benjamin Erickson, Madeline Balaam, Daniel Leithinger, Ken Nakagaki

    Abstract: Driven by the vision of everyday haptics, the HCI community is advocating for "design touch first" and investigating "how to touch well." However, a gap remains between the exploratory nature of haptic design and technical reproducibility. We present Shape-Kit, a hybrid design toolkit embodying our "crafting haptics" metaphor, where hand touch is transduced into dynamic pin-based sensations that c…

    Submitted 30 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: Full paper accepted to 2025 CHI Conference on Human Factors in Computing Systems (CHI'25). Updated acknowledgments and funding information.

    ACM Class: H.5.2

  43. arXiv:2503.10437  [pdf, other]

    cs.CV

    4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

    Authors: Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister

    Abstract: Learning 4D language fields to enable time-sensitive, open-ended language queries in dynamic scenes is essential for many real-world applications. While LangSplat successfully grounds CLIP features into 3D Gaussian representations, achieving precision and efficiency in 3D static scenes, it lacks the ability to handle dynamic 4D fields as CLIP, designed for static image-text tasks, cannot capture t…

    Submitted 31 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project Page: https://4d-langsplat.github.io

  44. arXiv:2503.08942  [pdf, ps, other]

    cs.LG

    Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

    Authors: Runlong Zhou, Maryam Fazel, Simon S. Du

    Abstract: Reinforcement learning from human feedback (RLHF) has become essential for improving language model capabilities, but traditional approaches rely on the assumption that human preferences follow a transitive Bradley-Terry model. This assumption fails to capture the non-transitive nature of populational human preferences. Nash learning from human feedback (NLHF), targeting non-transitive preferences…

    Submitted 8 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  45. arXiv:2503.07877  [pdf, other]

    stat.ML cs.IT cs.LG

    Cost-Aware Optimal Pairwise Pure Exploration

    Authors: Di Wu, Chengshuai Shi, Ruida Zhou, Cong Shen

    Abstract: Pure exploration is one of the fundamental problems in multi-armed bandits (MAB). However, existing works mostly focus on specific pure exploration tasks, without a holistic view of the general pure exploration problem. This work fills this gap by introducing a versatile framework to study pure exploration, with a focus on identifying the pairwise relationships between targeted arm pairs. Moreover…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: AISTATS 2025

  46. arXiv:2503.06072  [pdf, other]

    cs.CL cs.AI

    Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning

    Authors: Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific per…

    Submitted 20 May, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: 87 pages, 21 figures, 9 tables

  47. LiteChain: A Lightweight Blockchain for Verifiable and Scalable Federated Learning in Massive Edge Networks

    Authors: Handi Chen, Rui Zhou, Yun-Hin Chan, Zhihan Jiang, Xianhao Chen, Edith C. H. Ngai

    Abstract: Leveraging blockchain in Federated Learning (FL) emerges as a new paradigm for secure collaborative learning on Massive Edge Networks (MENs). As the scale of MENs increases, it becomes more difficult to implement and manage a blockchain among edge devices due to complex communication topologies, heterogeneous computation capabilities, and limited storage capacities. Moreover, the lack of a standar…

    Submitted 6 March, 2025; originally announced March 2025.

  48. arXiv:2503.04099  [pdf, other]

    cs.CL cs.AI

    Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English

    Authors: Runtao Zhou, Guangya Wan, Saadia Gabriel, Sheng Li, Alexander J Gates, Maarten Sap, Thomas Hartvigsen

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning tasks, leading to their widespread deployment. However, recent studies have highlighted concerning biases in these models, particularly in their handling of dialectal variations like African American English (AAE). In this work, we systematically investigate dialectal disparities in LLM reasoning tasks. We develop…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: ARR under review. First two authors contributed equally.

  49. arXiv:2503.01773  [pdf, other]

    cs.CL

    Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

    Authors: Shiqi Chen, Tongyao Zhu, Ruochen Zhou, Jinghan Zhang, Siyang Gao, Juan Carlos Niebles, Mor Geva, Junxian He, Jiajun Wu, Manling Li

    Abstract: Large Vision Language Models (VLMs) have long struggled with spatial reasoning tasks. Surprisingly, even simple spatial reasoning tasks, such as recognizing "under" or "behind" relationships between only two objects, pose significant challenges for current VLMs. In this work, we study the spatial reasoning challenge from the lens of mechanistic interpretability, diving into the model's internal st…

    Submitted 4 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  50. arXiv:2502.20040  [pdf, other]

    eess.AS cs.AI cs.SD

    CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

    Authors: Nian Shao, Rui Zhou, Pengyu Wang, Xian Li, Ying Fang, Yujie Yang, Xiaofei Li

    Abstract: In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. The proposed network takes as input the noisy and reverberant microphone recording and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to speech wavefo…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Submission to IEEE/ACM Trans. on TASLP