
Showing 1–50 of 220 results for author: Fan, R

Searching in archive cs.
  1. arXiv:2506.22784

    cs.CV cs.AI cs.RO

    Single-Frame Point-Pixel Registration via Supervised Cross-Modal Feature Matching

    Authors: Yu Han, Zhiwei Huang, Yanting Zhang, Fangjun Ding, Shen Cai, Rui Fan

    Abstract: Point-pixel registration between LiDAR point clouds and camera images is a fundamental yet challenging task in autonomous driving and robotic perception. A key difficulty lies in the modality gap between unstructured point clouds and structured images, especially under sparse single-frame LiDAR settings. Existing methods typically extract features separately from point clouds and images, then rely…

    Submitted 28 June, 2025; originally announced June 2025.

  2. arXiv:2506.14965

    cs.LG cs.AI cs.CL

    Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

    Authors: Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 38 pages, 9 figures. Under review

  3. arXiv:2506.14130

    cs.CV cs.AI cs.RO

    KDMOS: Knowledge Distillation for Motion Segmentation

    Authors: Chunyu Cao, Jintao Cheng, Zeyu Chen, Linfan Zhan, Rui Fan, Zhijian He, Xiaoyu Tang

    Abstract: Motion Object Segmentation (MOS) is crucial for autonomous driving, as it enhances localization, path planning, map construction, scene flow estimation, and future state prediction. While existing methods achieve strong performance, balancing accuracy and real-time inference remains a challenge. To address this, we propose a logits-based knowledge distillation framework for MOS, aiming to improve…

    Submitted 16 June, 2025; originally announced June 2025.

  4. arXiv:2506.04518   

    eess.AS cs.CL

    Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

    Authors: Haibin Wu, Yuxuan Hu, Ruchao Fan, Xiaofei Wang, Kenichi Kumatani, Bo Ren, Jianwei Yu, Heng Lu, Lijuan Wang, Yao Qian, Jinyu Li

    Abstract: Speech language models (Speech LMs) enable end-to-end speech-text modelling within a single model, offering a promising direction for spoken dialogue systems. The choice of joint speech-text decoding paradigm plays a critical role in performance, efficiency, and alignment quality. In this work, we systematically compare representative joint speech-text decoding strategies, including the interleav…

    Submitted 12 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: Our company needs to do an internal review

  5. arXiv:2505.20041

    cs.CV

    DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization

    Authors: Jianxin Huang, Jiahang Li, Sergey Vityazev, Alexander Dvorkovich, Rui Fan

    Abstract: RGB-D scene parsing methods effectively capture both semantic and geometric features of the environment, demonstrating great potential under challenging conditions such as extreme weather and low lighting. However, existing RGB-D scene parsing methods predominantly rely on supervised training strategies, which require a large amount of manually annotated pixel-level labels that are both time-consu…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 5 pages, 2 figures, accepted by IEEE Signal Processing Letters

  6. arXiv:2505.02025

    cs.CV

    A Birotation Solution for Relative Pose Problems

    Authors: Hongbo Zhao, Ziwei Long, Mengtan Zhang, Hanli Wang, Qijun Chen, Rui Fan

    Abstract: Relative pose estimation, a fundamental computer vision problem, has been extensively studied for decades. Existing methods either estimate and decompose the essential matrix or directly estimate the rotation and translation to obtain the solution. In this article, we break the mold by tackling this traditional problem with a novel birotation solution. We first introduce three basis transformation…

    Submitted 4 May, 2025; originally announced May 2025.
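
    For reference, the conventional pipeline that this abstract contrasts with can be sketched in a few lines. The minimal NumPy example below assumes the convention X2 = R @ X1 + t, builds the essential matrix E = [t]_x R from a synthetic pose, checks the epipolar constraint x2^T E x1 = 0, and recovers the four standard (R, t) candidates by SVD. The pose, the 3D point, and the helper names are illustrative assumptions; this is not the birotation solution proposed in the paper.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rotation_z(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def decompose_essential(E: np.ndarray):
    """Return the four (R, t) candidates encoded by E (t only up to scale)."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations; flipping a factor only changes the sign of E.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    Ra, Rb = U @ W @ Vt, U @ W.T @ Vt
    t_dir = U[:, 2]  # left null vector of E, parallel to t
    return [(Ra, t_dir), (Ra, -t_dir), (Rb, t_dir), (Rb, -t_dir)]


# Hypothetical relative pose and 3D point (convention: X2 = R @ X1 + t).
R = rotation_z(0.1)
t = np.array([1.0, 0.2, 0.0])
X1 = np.array([0.5, -0.3, 4.0])
X2 = R @ X1 + t

x1, x2 = X1 / X1[2], X2 / X2[2]  # normalized image coordinates
E = skew(t) @ R                  # essential matrix E = [t]_x R

print("epipolar residual:", x2 @ E @ x1)
candidates = decompose_essential(E)
```

    In practice, the physically valid candidate is usually selected by a cheirality check, i.e. keeping the (R, t) pair for which triangulated points lie in front of both cameras.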

  7. arXiv:2504.13828

    cs.CL cs.AI

    Generative AI Act II: Test Time Scaling Drives Cognition Engineering

    Authors: Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu

    Abstract: The first generation of Large Language Models - what might be called "Act I" of generative AI (2020-2023) - achieved remarkable success through massive parameter and data scaling, yet exhibited fundamental limitations such as knowledge latency, shallow reasoning, and constrained cognitive processes. During this era, prompt engineering emerged as our primary interface with AI, enabling dialogue-lev…

    Submitted 28 April, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: v3: add the comparison to existing work part; fix some errors

  8. arXiv:2503.18082

    cs.CV eess.IV

    Vehicular Road Crack Detection with Deep Learning: A New Online Benchmark for Comprehensive Evaluation of Existing Algorithms

    Authors: Nachuan Ma, Zhengfei Song, Qiang Hu, Chuang-Wei Liu, Yu Han, Yanting Zhang, Rui Fan, Lihua Xie

    Abstract: In the emerging field of urban digital twins (UDTs), advancing intelligent road inspection (IRI) vehicles with automatic road crack detection systems is essential for maintaining civil infrastructure. Over the past decade, deep learning-based road crack detection methods have been developed to detect cracks more efficiently, accurately, and objectively, with the goal of replacing manual visual ins…

    Submitted 23 March, 2025; originally announced March 2025.

  9. arXiv:2503.18073

    cs.CV cs.RO

    PanopticSplatting: End-to-End Panoptic Gaussian Splatting

    Authors: Yuxuan Xie, Xuan Yu, Changjian Jiang, Sitong Mao, Shunbo Zhou, Rui Fan, Rong Xiong, Yue Wang

    Abstract: Open-vocabulary panoptic reconstruction is a challenging task for simultaneous scene reconstruction and understanding. Recently, methods have been proposed for 3D scene understanding based on Gaussian splatting. However, these methods are multi-staged, suffering from accumulated errors and a dependence on hand-designed components. To streamline the pipeline and achieve global optimization, w…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 8 pages, 6 figures

  10. arXiv:2503.14084

    eess.IV cs.LG

    Semantic Communication in Dynamic Channel Scenarios: Collaborative Optimization of Dual-Pipeline Joint Source-Channel Coding and Personalized Federated Learning

    Authors: Xingrun Yan, Shiyuan Zuo, Yifeng Lyu, Rongfei Fan, Han Hu

    Abstract: Semantic communication is designed to tackle issues like bandwidth constraints and high latency in communication systems. However, in complex network topologies with multiple users, the enormous combinations of client data and channel state information (CSI) pose significant challenges for existing semantic communication architectures. To improve the generalization ability of semantic communicatio…

    Submitted 18 March, 2025; originally announced March 2025.

  11. arXiv:2503.01743

    cs.CL cs.AI cs.LG

    Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

    Authors: Microsoft, :, Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, Dong Chen, Dongdong Chen, Junkun Chen, Weizhu Chen, Yen-Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami, et al. (51 additional authors not shown)

    Abstract: We introduce Phi-4-Mini and Phi-4-Multimodal, compact yet highly capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model trained on high-quality web and synthetic data, significantly outperforming recent open-source models of similar size and matching the performance of models twice its size on math and coding tasks requiring complex reasoning. This achievement…

    Submitted 7 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: 39 pages

  12. arXiv:2502.16907

    cs.CV cs.AI

    MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation

    Authors: Jiehao Luo, Jintao Cheng, Xiaoyu Tang, Qingwen Zhang, Bohuan Xue, Rui Fan

    Abstract: Scene flow estimation aims to predict 3D motion from consecutive point cloud frames, which is of great interest in the autonomous driving field. Existing methods face challenges such as insufficient spatio-temporal modeling and inherent loss of fine-grained features during voxelization. However, the success of Mamba, a representative state space model (SSM) that enables global modeling with linear comp…

    Submitted 24 February, 2025; originally announced February 2025.

  13. arXiv:2502.14309

    cs.LG cs.IT

    On Theoretical Limits of Learning with Label Differential Privacy

    Authors: Puning Zhao, Chuan Ma, Li Shen, Shaowei Wang, Rongfei Fan

    Abstract: Label differential privacy (DP) is designed for learning problems involving private labels and public features. While various methods have been proposed for learning under label DP, the theoretical limits remain largely unexplored. In this paper, we investigate the fundamental limits of learning with label DP in both local and central models for both classification and regression tasks, characteri…

    Submitted 2 March, 2025; v1 submitted 20 February, 2025; originally announced February 2025.
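
    As a concrete point of reference for the local label-DP setting mentioned here, the standard k-ary randomized response mechanism privatizes each label on its own while leaving the public features untouched. The Python sketch below (function names are hypothetical, and this is a textbook mechanism rather than one taken from the paper) keeps the true label with probability e^ε/(e^ε + k - 1), otherwise reports one of the other k - 1 labels uniformly at random, and then debiases the observed class frequencies. The debiasing step makes visible the kind of utility loss whose limits the paper studies.

```python
import math
import random
from collections import Counter
from typing import List, Tuple


def rr_probabilities(num_classes: int, epsilon: float) -> Tuple[float, float]:
    """Probability of reporting the true label, and of reporting any one other label."""
    denom = math.exp(epsilon) + num_classes - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def randomized_response_label(label: int, num_classes: int, epsilon: float,
                              rng: random.Random) -> int:
    """epsilon-local-DP release of one label via k-ary randomized response."""
    keep_prob, _ = rr_probabilities(num_classes, epsilon)
    if rng.random() < keep_prob:
        return label
    other = rng.randrange(num_classes - 1)  # uniform over the k-1 wrong labels
    return other if other < label else other + 1


def estimate_class_frequencies(noisy: List[int], num_classes: int,
                               epsilon: float) -> List[float]:
    """Unbiased estimate of the true label frequencies from privatized labels."""
    keep_prob, flip_prob = rr_probabilities(num_classes, epsilon)
    counts = Counter(noisy)
    n = len(noisy)
    # Observed frequency q_c = flip + p_c * (keep - flip); solve for p_c.
    return [(counts[c] / n - flip_prob) / (keep_prob - flip_prob)
            for c in range(num_classes)]


rng = random.Random(0)
true_labels = [0] * 50_000 + [1] * 30_000 + [2] * 20_000
noisy_labels = [randomized_response_label(y, 3, 1.0, rng) for y in true_labels]
print(estimate_class_frequencies(noisy_labels, 3, 1.0))
```

    The ratio between the two reporting probabilities is exactly e^ε, which is what makes each released label ε-locally private.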

  14. arXiv:2502.12102

    cs.AI cs.ET

    Relational Norms for Human-AI Cooperation

    Authors: Brian D. Earp, Sebastian Porsdam Mann, Mateo Aboy, Edmond Awad, Monika Betzler, Marietjie Botes, Rachel Calcott, Mina Caraccio, Nick Chater, Mark Coeckelbergh, Mihaela Constantinescu, Hossein Dabbagh, Kate Devlin, Xiaojun Ding, Vilius Dranseika, Jim A. C. Everett, Ruiping Fan, Faisal Feroz, Kathryn B. Francis, Cindy Friedman, Orsolya Friedrich, Iason Gabriel, Ivar Hannikainen, Julie Hellmann, Arasj Khodadade Jahrome, et al. (37 additional authors not shown)

    Abstract: How we should design and interact with social artificial intelligence depends on the socio-relational role the AI is meant to emulate or occupy. In human society, relationships such as teacher-student, parent-child, neighbors, siblings, or employer-employee are governed by specific norms that prescribe or proscribe cooperative functions including hierarchy, care, transaction, and mating. These nor…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 76 pages, 2 figures

  15. arXiv:2502.08191

    cs.SD eess.AS

    DualStream Contextual Fusion Network: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions

    Authors: Ke Xue, Rongfei Fan, Shanping Yu, Chang Sun, Jianping An

    Abstract: Target speaker extraction focuses on extracting a target speech signal from an environment with multiple speakers by leveraging an enrollment. Existing methods predominantly rely on speaker embeddings obtained from the enrollment, potentially disregarding the contextual information and the internal interactions between the mixture and enrollment. In this paper, we propose a novel DualStream Contex…

    Submitted 12 February, 2025; originally announced February 2025.

  16. arXiv:2502.06219

    cs.CV

    Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing

    Authors: Sicen Guo, Tianyou Wen, Chuang-Wei Liu, Qijun Chen, Rui Fan

    Abstract: Recent vision foundation models (VFMs), typically based on Vision Transformer (ViT), have significantly advanced numerous computer vision tasks. Despite their success in tasks focused solely on RGB images, the potential of VFMs in RGB-depth driving scene parsing remains largely under-explored. In this article, we take one step toward this emerging research area by investigating a feasible techniqu…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 10 pages, 5 figures

  17. arXiv:2502.04517

    cs.LG cs.CL

    Towards Cost-Effective Reward Guided Text Generation

    Authors: Ahmad Rashid, Ruotian Wu, Rongqi Fan, Hongliang Li, Agustinus Kristiadi, Pascal Poupart

    Abstract: Reward-guided text generation (RGTG) has emerged as a viable alternative to offline reinforcement learning from human feedback (RLHF). RGTG methods can align baseline language models to human preferences without further training like in standard RLHF methods. However, they rely on a reward model to score each candidate token generated by the language model at inference, incurring significant test-…

    Submitted 6 February, 2025; originally announced February 2025.
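
    The cost issue described here can be made concrete with a minimal greedy variant of reward-guided decoding: at every step the language model proposes its top-k next tokens, a reward model scores each extended prefix, and the token maximizing log-probability plus β times the reward is kept. The Python sketch below assumes a toy language model and a toy reward function (both hypothetical) and counts reward-model calls, which grow as k per generated token. It illustrates the baseline behavior the abstract describes, not the cheaper method the paper proposes.

```python
import math
from typing import Callable, Dict, List, Tuple

LanguageModel = Callable[[List[str]], Dict[str, float]]  # prefix -> next-token log-probs
RewardModel = Callable[[List[str]], float]              # sequence -> scalar reward


def reward_guided_decode(prefix: List[str], lm: LanguageModel, reward: RewardModel,
                         steps: int, top_k: int = 3,
                         beta: float = 1.0) -> Tuple[List[str], int]:
    """Greedy reward-guided decoding; returns the tokens and the reward-call count."""
    tokens = list(prefix)
    reward_calls = 0
    for _ in range(steps):
        logprobs = lm(tokens)
        candidates = sorted(logprobs, key=logprobs.get, reverse=True)[:top_k]
        best_token, best_score = None, -math.inf
        for token in candidates:
            score = logprobs[token] + beta * reward(tokens + [token])
            reward_calls += 1  # one full reward-model evaluation per candidate
            if score > best_score:
                best_token, best_score = token, score
        tokens.append(best_token)
    return tokens, reward_calls


# Toy stand-ins (hypothetical): a fixed next-token distribution and a reward
# that counts how often the token "good" appears.
def toy_lm(prefix: List[str]) -> Dict[str, float]:
    return {"bad": math.log(0.5), "good": math.log(0.3), "okay": math.log(0.2)}


def toy_reward(sequence: List[str]) -> float:
    return float(sequence.count("good"))


print(reward_guided_decode(["the"], toy_lm, toy_reward, steps=4))
```

    With beta = 0 the loop reduces to greedy decoding from the language model alone, which picks "bad" at every step; the reward term is what steers it toward "good", at the price of top_k reward evaluations per token.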

  18. arXiv:2502.00801

    cs.CV cs.AI cs.RO

    Environment-Driven Online LiDAR-Camera Extrinsic Calibration

    Authors: Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan

    Abstract: LiDAR-camera extrinsic calibration (LCEC) is crucial for multi-modal data fusion in mechatronics. Existing methods, whether target-based or target-free, typically rely on customized calibration targets or fixed scene types, limiting their practicality in real-world applications. To address these challenges, we introduce EdO-LCEC, the first environment-driven online calibration approach. Unlike tra…

    Submitted 28 June, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

  19. arXiv:2502.00712

    eess.IV cs.AI cs.CV

    Registration-Enhanced Segmentation Method for Prostate Cancer in Ultrasound Images

    Authors: Shengtian Sang, Hassan Jahanandish, Cynthia Xinran Li, Indrani Bhattachary, Jeong Hoon Lee, Lichun Zhang, Sulaiman Vesal, Pejman Ghanouni, Richard Fan, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Prostate cancer is a major cause of cancer-related deaths in men, where early detection greatly improves survival rates. Although MRI-TRUS fusion biopsy offers superior accuracy by combining MRI's detailed visualization with TRUS's real-time guidance, it is a complex and time-intensive procedure that relies heavily on manual annotations, leading to potential errors. To address these challenges, we…

    Submitted 2 February, 2025; originally announced February 2025.

  20. arXiv:2502.00366

    eess.IV cs.CV

    Prostate-Specific Foundation Models for Enhanced Detection of Clinically Significant Cancer

    Authors: Jeong Hoon Lee, Cynthia Xinran Li, Hassan Jahanandish, Indrani Bhattacharya, Sulaiman Vesal, Lichun Zhang, Shengtian Sang, Moon Hyung Choi, Simon John Christoph Soerensen, Steve Ran Zhou, Elijah Richard Sommer, Richard Fan, Pejman Ghanouni, Yuze Song, Tyler M. Seibert, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Accurate prostate cancer diagnosis remains challenging. Even when using MRI, radiologists exhibit low specificity and significant inter-observer variability, leading to potential delays or inaccuracies in identifying clinically significant cancers. This leads to numerous unnecessary biopsies and risks of missing clinically significant cancers. Here we present prostate vision contrastive network (P…

    Submitted 4 February, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: 44 pages

  21. arXiv:2502.00146

    eess.IV cs.AI cs.CV

    Multimodal MRI-Ultrasound AI for Prostate Cancer Detection Outperforms Radiologist MRI Interpretation: A Multi-Center Study

    Authors: Hassan Jahanandish, Shengtian Sang, Cynthia Xinran Li, Sulaiman Vesal, Indrani Bhattacharya, Jeong Hoon Lee, Richard Fan, Geoffrey A. Sonna, Mirabela Rusu

    Abstract: Pre-biopsy magnetic resonance imaging (MRI) is increasingly used to target suspicious prostate lesions. This has led to artificial intelligence (AI) applications improving MRI-based detection of clinically significant prostate cancer (CsPCa). However, MRI-detected lesions must still be mapped to transrectal ultrasound (TRUS) images during biopsy, which results in missing CsPCa. This study systemat…

    Submitted 31 January, 2025; originally announced February 2025.

  22. arXiv:2501.12084

    cs.DC cs.AR cs.PF

    Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis

    Authors: Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Hongyuan Liu, Qiang Wang, Xiaowen Chu

    Abstract: Modern GPUs, with their specialized hardware like tensor cores, are essential for demanding AI and deep learning applications. This study presents a comprehensive, multi-level microbenchmarking analysis of the NVIDIA Hopper GPU architecture, delving into its performance characteristics and novel features. We benchmark Hopper's memory subsystem latency and throughput, comparing its L2 partitioned c…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.13499

  23. arXiv:2501.08880

    cs.RO

    SLC$^2$-SLAM: Semantic-guided Loop Closure using Shared Latent Code for NeRF SLAM

    Authors: Yuhang Ming, Di Ma, Weichen Dai, Han Yang, Rui Fan, Guofeng Zhang, Wanzeng Kong

    Abstract: Targeting the notorious cumulative drift errors in NeRF SLAM, we propose a Semantic-guided Loop Closure using Shared Latent Code, dubbed SLC$^2$-SLAM. We argue that latent codes stored in many NeRF SLAM systems are not fully exploited, as they are only used for better reconstruction. In this paper, we propose a simple yet effective way to detect potential loops using the same latent codes as local…

    Submitted 18 March, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: Accepted to RAL. 8 pages, 5 figures, 5 tables

  24. arXiv:2501.07124

    cs.LG

    LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch

    Authors: Zhengzhong Liu, Bowen Tan, Hongyi Wang, Willie Neiswanger, Tianhua Tao, Haonan Li, Fajri Koto, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Liqun Ma, Liping Tang, Nikhil Ranjan, Yonghao Zhuang, Guowei He, Renxi Wang, Mingkai Deng, Robin Algayres, Yuanzhi Li, Zhiqiang Shen, Preslav Nakov, Eric Xing

    Abstract: We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to "How are the largest LLMs trained?" remains unclear within the community. The implementation details for such high-capacity models are often protected due to business considerations…

    Submitted 17 January, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  25. arXiv:2412.17699

    cs.CV

    Establishing Reality-Virtuality Interconnections in Urban Digital Twins for Superior Intelligent Road Inspection

    Authors: Yikang Zhang, Chuang-Wei Liu, Jiahang Li, Yingbing Chen, Jie Cheng, Rui Fan

    Abstract: Road inspection is essential for ensuring road maintenance and traffic safety, as road defects gradually emerge and compromise road functionality. Traditional methods, which rely on manual evaluations, are labor-intensive, costly, and time-consuming. Although data-driven approaches are gaining traction, the scarcity and spatial sparsity of road defects in the real world pose significant challenges…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 13 pages, 9 figures

  26. arXiv:2412.17589

    cs.AI cs.LG

    PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

    Authors: Yanheng He, Jiahe Jin, Shijie Xia, Jiadi Su, Runze Fan, Haoyang Zou, Xiangkun Hu, Pengfei Liu

    Abstract: Imagine a world where AI can handle your work while you sleep - organizing your research materials, drafting a report, or creating a presentation you need for tomorrow. However, while current digital agents can perform simple tasks, they are far from capable of handling the complex real-world work that humans routinely perform. We present PC Agent, an AI system that demonstrates a crucial step tow…

    Submitted 23 December, 2024; originally announced December 2024.

  27. arXiv:2412.17111

    cs.CL

    Learning to Adapt to Low-Resource Paraphrase Generation

    Authors: Zhigen Li, Yanmeng Wang, Rizhao Fan, Ye Wang, Jianfeng Li, Shaojun Wang

    Abstract: Paraphrase generation is a longstanding NLP task that has achieved great success with the aid of large corpora. However, transferring a paraphrasing model to another domain encounters the problem of domain shift, especially when the data is sparse. At the same time, the widely used large pre-trained language models (PLMs) are prone to overfitting when trained on scarce labeled data. To mitigate th…

    Submitted 22 December, 2024; originally announced December 2024.

    Journal ref: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), pages 1014–1022

  28. arXiv:2412.11210

    cs.CV

    ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction

    Authors: Yi Feng, Yu Han, Xijing Zhang, Tanghui Li, Yanting Zhang, Rui Fan

    Abstract: Inferring the 3D structure of a scene from a single image is an ill-posed and challenging problem in the field of vision-centric autonomous driving. Existing methods usually employ neural radiance fields to produce voxelized 3D occupancy, lacking instance-level semantic reasoning and temporal photometric consistency. In this paper, we propose ViPOcc, which leverages the visual priors from vision f…

    Submitted 10 January, 2025; v1 submitted 15 December, 2024; originally announced December 2024.

    Comments: accepted to AAAI25

  29. arXiv:2412.10997

    eess.IV cs.CV cs.LG

    Mask Enhanced Deeply Supervised Prostate Cancer Detection on B-mode Micro-Ultrasound

    Authors: Lichun Zhang, Steve Ran Zhou, Moon Hyung Choi, Jeong Hoon Lee, Shengtian Sang, Adam Kinnaird, Wayne G. Brisbane, Giovanni Lughezzani, Davide Maffei, Vittorio Fasulo, Patrick Albers, Sulaiman Vesal, Wei Shao, Ahmed N. El Kaffas, Richard E. Fan, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Prostate cancer is a leading cause of cancer-related deaths among men. The recent development of high-frequency micro-ultrasound imaging offers improved resolution compared to conventional ultrasound and potentially a better ability to differentiate clinically significant cancer from normal tissue. However, the features of prostate cancer remain subtle, with ambiguous borders with normal tissue a…

    Submitted 14 December, 2024; originally announced December 2024.

  30. Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments

    Authors: Jianhao Jiao, Ruoyu Geng, Yuanhang Li, Ren Xin, Bowen Yang, Jin Wu, Lujia Wang, Ming Liu, Rui Fan, Dimitrios Kanoulas

    Abstract: The creation of a metric-semantic map, which encodes human-prior knowledge, represents a high-level abstraction of environments. However, constructing such a map poses challenges related to the fusion of multi-modal sensor data, the attainment of real-time mapping performance, and the preservation of structural and semantic information consistency. In this paper, we introduce an online metric-sema…

    Submitted 29 November, 2024; originally announced December 2024.

    Comments: 12 pages, 9 figures, accepted to IEEE Transactions on Automation Science and Engineering

  31. arXiv:2411.03717

    cs.CV

    These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion

    Authors: Chuang-Wei Liu, Yikang Zhang, Qijun Chen, Ioannis Pitas, Rui Fan

    Abstract: Stereo matching has emerged as a cost-effective solution for road surface 3D reconstruction, garnering significant attention towards improving both computational efficiency and accuracy. This article introduces decisive disparity diffusion (D3Stereo), marking the first exploration of dense deep feature matching that adapts pre-trained deep convolutional neural networks (DCNNs) to previously unseen…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: 13 pages, 7 figures

  32. arXiv:2411.02047

    cs.LG stat.ML

    Theory-inspired Label Shift Adaptation via Aligned Distribution Mixture

    Authors: Ruidong Fan, Xiao Ouyang, Hong Tao, Yuhua Qian, Chenping Hou

    Abstract: As a prominent challenge in addressing real-world issues within a dynamic environment, label shift, which refers to the learning setting where the source (training) and target (testing) label distributions do not match, has recently received increasing attention. Existing label shift methods solely use unlabeled target samples to estimate the target label distribution, and do not involve them duri…

    Submitted 5 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.
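
    One well-known example of the estimate-the-target-label-distribution step this abstract refers to is black-box shift estimation (BBSE). Under label shift, the distribution of a fixed classifier's predictions on the target set satisfies mu = C w, where C[i, j] = P_source(yhat = i, y = j) and w_j = q(y = j) / p(y = j), so w can be recovered by solving a linear system using only unlabeled target data. The NumPy sketch below uses synthetic, noise-free inputs and a hypothetical function name; it shows this baseline idea only and is not the aligned-distribution-mixture method proposed in the paper.

```python
import numpy as np


def bbse_target_label_distribution(joint_confusion: np.ndarray,
                                   target_pred_dist: np.ndarray) -> np.ndarray:
    """Estimate q(y) from C[i, j] = P_s(yhat=i, y=j) and mu[i] = P_t(yhat=i)."""
    source_label_dist = joint_confusion.sum(axis=0)               # p(y = j)
    weights = np.linalg.solve(joint_confusion, target_pred_dist)  # w_j = q_j / p_j
    weights = np.clip(weights, 0.0, None)  # guard against negative estimates from noise
    q = weights * source_label_dist
    return q / q.sum()


# Synthetic setup: the classifier's P(yhat = i | y = j), one column per true class.
cond = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.7, 0.2],
                 [0.1, 0.2, 0.7]])
p_source = np.array([0.5, 0.3, 0.2])
q_target = np.array([0.2, 0.3, 0.5])  # unknown in practice; used here only to simulate mu

C = cond * p_source   # scales column j by p_source[j], giving the joint P_s(yhat, y)
mu = cond @ q_target  # predicted-label distribution on the target set
print(bbse_target_label_distribution(C, mu))
```

    With finite samples, C and mu are empirical estimates, so the recovered distribution is only approximate and the clipping step matters.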

  33. arXiv:2410.19274

    cs.LG cs.AI cs.OS cs.PF

    Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management

    Authors: Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various domains, yet deploying them on mobile devices remains an arduous challenge due to their extensive computational and memory demands. While lightweight LLMs have been developed to fit mobile environments, they suffer from degraded model accuracy. In contrast, sparsity-based techniques minimize DRAM usage by selectively tran…

    Submitted 29 October, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

  34. arXiv:2410.05146

    cs.CL cs.AI eess.AS

    CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation

    Authors: Rui Zhao, Jinyu Li, Ruchao Fan, Matt Post

    Abstract: Models for streaming speech translation (ST) can achieve high accuracy and low latency if they're developed with vast amounts of paired audio in the source language and written text in the target language. Yet, these text labels for the target language are often pseudo labels due to the prohibitive cost of manual ST data labeling. In this paper, we introduce a methodology named Connectionist Tempo…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted by IEEE Spoken Language Technology Workshop (SLT 2024)

  35. arXiv:2409.05474

    cs.CV cs.GR

    PVP-Recon: Progressive View Planning via Warping Consistency for Sparse-View Surface Reconstruction

    Authors: Sheng Ye, Yuze He, Matthieu Lin, Jenny Sheng, Ruoyu Fan, Yiheng Han, Yubin Hu, Ran Yi, Yu-Hui Wen, Yong-Jin Liu, Wenping Wang

    Abstract: Neural implicit representations have revolutionized dense multi-view surface reconstruction, yet their performance significantly diminishes with sparse input views. A few pioneering works have sought to tackle the challenge of sparse-view reconstruction by leveraging additional geometric priors or multi-scene generalizability. However, they are still hindered by the imperfect choice of input views…

    Submitted 9 September, 2024; originally announced September 2024.

  36. arXiv:2408.09891

    cs.LG cs.CR cs.DS

    Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

    Authors: Puning Zhao, Jiafei Wu, Zhe Liu, Chong Wang, Rongfei Fan, Qingming Li

    Abstract: We study convex optimization problems under differential privacy (DP). With heavy-tailed gradients, existing works achieve suboptimal rates. The main obstacle is that existing gradient estimators have suboptimal tail properties, resulting in a superfluous factor of $d$ in the union bound. In this paper, we explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. O…

    Submitted 19 August, 2024; originally announced August 2024.

  37. arXiv:2408.09762

    cs.LG

    Sequential Federated Learning in Hierarchical Architecture on Non-IID Datasets

    Authors: Xingrun Yan, Shiyuan Zuo, Rongfei Fan, Han Hu, Li Shen, Puning Zhao, Yong Luo

    Abstract: In a real federated learning (FL) system, communication overhead for passing model parameters between the clients and the parameter server (PS) is often a bottleneck. Hierarchical federated learning (HFL), which places multiple edge servers (ESs) between clients and the PS, can partially alleviate communication pressure but still needs the aggregation of model parameters from multiple ESs at the PS. T…

    Submitted 19 August, 2024; originally announced August 2024.
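
    For reference, the conventional hierarchical aggregation that this abstract describes (clients to edge servers, edge servers to the parameter server) can be sketched as two rounds of sample-size-weighted averaging. The NumPy sketch below is a generic FedAvg-style illustration with hypothetical function names and toy parameter vectors; it is not the sequential scheme the paper proposes. With full participation, the two-level result equals a flat weighted average over all clients.

```python
from typing import List, Sequence, Tuple

import numpy as np

ClientUpdate = Tuple[np.ndarray, int]  # (model parameters, number of local samples)


def weighted_average(models: Sequence[np.ndarray], weights: Sequence[float]) -> np.ndarray:
    """Weighted mean of parameter vectors."""
    total = float(sum(weights))
    return sum(w * m for m, w in zip(models, weights)) / total


def hierarchical_fedavg(edge_groups: List[List[ClientUpdate]]) -> np.ndarray:
    """Aggregate clients at each edge server, then edge models at the parameter server."""
    edge_models, edge_sizes = [], []
    for clients in edge_groups:
        params = [p for p, _ in clients]
        sizes = [n for _, n in clients]
        edge_models.append(weighted_average(params, sizes))  # aggregation at the edge server
        edge_sizes.append(sum(sizes))
    return weighted_average(edge_models, edge_sizes)          # aggregation at the PS


groups = [
    [(np.array([1.0, 1.0]), 10), (np.array([3.0, 3.0]), 30)],  # edge server A
    [(np.array([5.0, 5.0]), 60)],                                # edge server B
]
print(hierarchical_fedavg(groups))
```

    The final weighted_average call is the PS-side aggregation step that, per the abstract, still remains a communication cost in HFL.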

  38. arXiv:2408.09655

    cs.LG stat.ML

    Contextual Bandits for Unbounded Context Distributions

    Authors: Puning Zhao, Rongfei Fan, Shaowei Wang, Li Shen, Qixin Zhang, Zong Ke, Tianhang Zheng

    Abstract: Nonparametric contextual bandit is an important model of sequential decision-making problems. Under the $\alpha$-Tsybakov margin condition, existing research has established a regret bound of $\tilde{O}\left(T^{1-\frac{\alpha+1}{d+2}}\right)$ for bounded supports. However, the optimal regret with unbounded contexts has not been analyzed. The challenge of solving contextual bandit problems with unbounded support…

    Submitted 7 May, 2025; v1 submitted 18 August, 2024; originally announced August 2024.

  39. arXiv:2408.09539

    cs.LG cs.DC

    Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets

    Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Li Shen, Puning Zhao, Jie Xu, Han Hu

    Abstract: In practical federated learning (FL) systems, the presence of malicious Byzantine attacks and data heterogeneity often introduces biases into the learning process. However, existing Byzantine-robust methods typically only achieve a compromise between adaptability to different loss function types (including both strongly convex and non-convex) and robustness to heterogeneous datasets, but with non-…

    Submitted 18 August, 2024; originally announced August 2024.
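
    The intuition behind aggregating normalized gradients can be illustrated in a few lines: once each client's gradient is rescaled to unit norm before averaging, each client (malicious or not) contributes a vector of norm at most 1/n to the aggregate, however large its raw update. The NumPy sketch below uses made-up gradients and a hypothetical helper name; the truncated abstract does not show the paper's exact aggregation rule, so this is a generic illustration rather than the proposed algorithm.

```python
import numpy as np


def normalized_mean(grads, eps: float = 1e-12) -> np.ndarray:
    """Average of unit-normalized gradients; each client contributes at most 1/n in norm."""
    unit = [g / max(np.linalg.norm(g), eps) for g in grads]
    return np.mean(unit, axis=0)


honest = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([1.1, -0.1])]
byzantine = np.array([-1e6, 0.0])  # one malicious client sending a huge update
all_grads = honest + [byzantine]

plain_mean = np.mean(all_grads, axis=0)   # dominated by the attacker
robust_mean = normalized_mean(all_grads)  # still points roughly along the honest direction
print(plain_mean, robust_mean)
```

    Normalization discards gradient magnitude, which is the price paid for this bounded per-client influence.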

  40. arXiv:2408.01803

    cs.LG cs.CL

    STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

    Authors: Peijie Dong, Lujun Li, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu

    Abstract: In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision. Although LLMs have achieved remarkable performance, their memory-bound nature during the inference stage hinders their deployment on resource-constrained devices. Reducing weights to 1-bit precision through binarization substantially enhances computational efficiency. We observe that so…

    Submitted 7 October, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  41. arXiv:2407.21631  [pdf, other]

    cs.CV

    RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion

    Authors: Jianxin Huang, Jiahang Li, Ning Jia, Yuxiang Sun, Chengju Liu, Qijun Chen, Rui Fan

    Abstract: Task-specific data-fusion networks have marked considerable achievements in urban scene parsing. Among these networks, our recently proposed RoadFormer successfully extracts heterogeneous features from RGB images and surface normal maps and fuses these features through attention mechanisms, demonstrating compelling efficacy in RGB-Normal road scene parsing. However, its performance significantly d…

    Submitted 22 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures, accepted by Transactions on Intelligent Vehicles 2024

  42. arXiv:2407.21530  [pdf, other]

    cs.CL cs.LG

    Data Contamination Report from the 2024 CONDA Shared Task

    Authors: Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, Pengfei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, et al. (3 additional authors not shown)

    Abstract: The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in cur…

    Submitted 4 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database

  43. arXiv:2407.18038  [pdf, other]

    cs.CV cs.RO

    TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework

    Authors: Guanfeng Tang, Zhiyuan Wu, Jiahang Li, Ping Zhong, Xieyuanli Chen, Huiming Lu, Rui Fan

    Abstract: Semantic segmentation and stereo matching, respectively analogous to the ventral and dorsal streams in our human brain, are two key components of autonomous driving perception systems. Addressing these two tasks with separate networks is no longer the mainstream direction in developing computer vision algorithms, particularly with the recent advances in large vision models and embodied artificial…

    Submitted 10 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  44. arXiv:2407.05283  [pdf, other]

    cs.CV

    SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

    Authors: Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

    Abstract: Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses can inevitably deteriorate the photometric reco…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Intelligent Vehicles. Code is available at https://mias.group/SCIPaD

  45. arXiv:2406.15252  [pdf, other]

    cs.CV cs.AI

    VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

    Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen

    Abstract: The recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metric is able to provide reliable scores over generated videos. The main barrier is the lack of large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-prov…

    Submitted 14 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  46. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    A Deep Learning System for Rapid and Accurate Warning of Acute Aortic Syndrome on Non-contrast CT in China

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Dehai Lang, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, et al. (19 additional authors not shown)

    Abstract: The accurate and timely diagnosis of acute aortic syndromes (AAS) in patients presenting with acute chest pain remains a clinical challenge. Aortic CT angiography (CTA) is the imaging protocol of choice in patients with suspected AAS. However, due to economic and workflow constraints in China, the majority of suspected patients initially undergo non-contrast CT as the initial imaging testing, and…

    Submitted 23 April, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

  47. arXiv:2406.12753  [pdf, other]

    cs.CL cs.AI

    OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

    Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, et al. (3 additional authors not shown)

    Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoni…

    Submitted 6 March, 2025; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by NeurIPS 2024

  48. arXiv:2406.10512  [pdf, other]

    eess.AS cs.SD

    SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR

    Authors: Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan

    Abstract: Recently, speech foundation models have gained popularity due to their superiority in finetuning downstream ASR tasks. However, models finetuned on certain domains, such as LibriSpeech (adult read speech), behave poorly on other domains (child or noisy speech). One solution could be collecting as much labeled and diverse data as possible for joint finetuning on various domains. However, collecting…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to ICASSP 2024 SASB Workshop

  49. arXiv:2406.10507  [pdf, other]

    eess.AS cs.CL cs.SD

    Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models

    Authors: Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan

    Abstract: Speech foundation models (SFMs) have achieved state-of-the-art results for various speech tasks in supervised (e.g. Whisper) or self-supervised systems (e.g. WavLM). However, the performance of SFMs for child ASR has not been systematically studied. In addition, there is no benchmark for child ASR with standard evaluations, making the comparisons of novel ideas difficult. In this paper, we initiat…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in Interspeech 2024

  50. arXiv:2406.04485  [pdf, other]

    cs.AI cs.CV

    GenAI Arena: An Open Evaluation Platform for Generative Models

    Authors: Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen

    Abstract: Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, FVD, etc often fail to capture the n…

    Submitted 11 November, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 9 pages, 7 figures

    Journal ref: NeurIPS 2024