Skip to main content

Showing 1–50 of 153 results for author: Ren, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2505.00982  [pdf, other

    cs.LG cs.DC

    Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization

    Authors: Shunxian Gu, Chaoqun You, Bangbang Ren, Lailong Luo, Junxu Xia, Deke Guo

    Abstract: Scaling deep neural network (DNN) training to more devices can reduce time-to-solution. However, it is impractical for users with limited computing resources. FOSI, as a hybrid order optimizer, converges faster than conventional optimizers by taking advantage of both gradient information and curvature information when updating the DNN model. Therefore, it provides a new chance for accelerating DNN… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  2. arXiv:2504.18768  [pdf, other

    cs.GR cs.CV

    TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians

    Authors: Letian Huang, Dongwei Ye, Jialin Dan, Chengzhi Tao, Huiwen Liu, Kun Zhou, Bo Ren, Yuanqi Li, Yanwen Guo, Jie Guo

    Abstract: The emergence of neural and Gaussian-based radiance field methods has led to considerable advancements in novel view synthesis and 3D object reconstruction. Nonetheless, specular reflection and refraction continue to pose significant challenges due to the instability and incorrect overfitting of radiance fields to high-frequency light variations. Currently, even 3D Gaussian Splatting (3D-GS), as a… ▽ More

    Submitted 1 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: accepted by SIGGRAPH 2025; https://letianhuang.github.io/transparentgs/

  3. arXiv:2504.14249  [pdf, other

    cs.CV

    Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Zongwei Wu, Yawei Li, Yidi Li, Danda Pani Paudel, Radu Timofte, Ming-Hsuan Yang, Luc Van Gool, Nicu Sebe

    Abstract: Restoring any degraded image efficiently via just one model has become increasingly significant and impactful, especially with the proliferation of mobile devices. Traditional solutions typically involve training dedicated models per degradation, resulting in inefficiency and redundancy. More recent approaches either introduce additional modules to learn visual prompts, significantly increasing mo… ▽ More

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Efficient All in One Image Restoration

  4. arXiv:2504.12276  [pdf, other

    cs.CV

    The Tenth NTIRE 2025 Image Denoising Challenge Report

    Authors: Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Jingyu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi , et al. (69 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent ad… ▽ More

    Submitted 16 April, 2025; originally announced April 2025.

  5. arXiv:2504.10686  [pdf, other

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang , et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  6. arXiv:2504.10685  [pdf, other

    cs.CV cs.AI

    NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results

    Authors: Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong, Xiugang Dong, Hong Gao, Xiangsheng Zhou, Jiancheng Pan, Yanxing Liu, Xiao He, Jiahao Li, Yuze Sun, Xiaomeng Huang, Zhenyu Zhang, Ran Ma, Yuhan Liu, Zijian Zhuang, Shuai Yi, Yixiong Zou , et al. (37 additional authors not shown)

    Abstract: Cross-Domain Few-Shot Object Detection (CD-FSOD) poses significant challenges to existing object detection and few-shot detection models when applied across domains. In conjunction with NTIRE 2025, we organized the 1st CD-FSOD Challenge, aiming to advance the performance of current object detectors on entirely novel target domains with only limited labeled data. The challenge attracted 152 registe… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: accepted by CVPRW 25 @ NTIRE

  7. arXiv:2504.03553  [pdf, other

    cs.CL cs.AI cs.CV cs.LG cs.MA

    Agentic Knowledgeable Self-awareness

    Authors: Shuofei Qiao, Zhisong Qiu, Baochang Ren, Xiaobin Wang, Xiangyuan Ru, Ningyu Zhang, Xiang Chen, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

    Abstract: Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness dur… ▽ More

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Work in progress

  8. arXiv:2503.18052  [pdf, other

    cs.CV

    SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

    Authors: Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel

    Abstract: Recognizing arbitrary or previously unseen categories is essential for comprehensive real-world 3D scene understanding. Currently, all existing methods rely on 2D or textual modalities during training, or together at inference. This highlights a clear absence of a model capable of processing 3D data alone for learning semantics end-to-end, along with the necessary data to train such a model. Meanw… ▽ More

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Our code, model, and dataset will be released at https://github.com/unique1i/SceneSplat

  9. arXiv:2503.18016  [pdf, other

    cs.CV

    Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

    Authors: Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu

    Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI), particularly in enhancing the capabilities of large language models (LLMs) by enabling access to external, reliable, and up-to-date knowledge sources. In the context of AI-Generated Content (AIGC), RAG has proven invaluable by augmenting model outputs with supplementary, relevant information, t… ▽ More

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 19 pages, 10 figures

  10. arXiv:2503.17825  [pdf, other

    cs.CV

    Fractal-IR: A Unified Framework for Efficient and Scalable Image Restoration

    Authors: Yawei Li, Bin Ren, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Nicu Sebe, Ming-Hsuan Yang, Luca Benini

    Abstract: While vision transformers achieve significant breakthroughs in various image restoration (IR) tasks, it is still challenging to efficiently scale them across multiple types of degradations and resolutions. In this paper, we propose Fractal-IR, a fractal-based design that progressively refines degraded images by repeatedly expanding local information into broader regions. This fractal architecture… ▽ More

    Submitted 22 March, 2025; originally announced March 2025.

  11. arXiv:2503.03329  [pdf

    cs.CV physics.med-ph

    Deep Learning-Based Diffusion MRI Tractography: Integrating Spatial and Anatomical Information

    Authors: Yiqiong Yang, Yitian Yuan, Baoxing Ren, Ye Wu, Yanqiu Feng, Xinyuan Zhang

    Abstract: Diffusion MRI tractography technique enables non-invasive visualization of the white matter pathways in the brain. It plays a crucial role in neuroscience and clinical fields by facilitating the study of brain connectivity and neurological disorders. However, the accuracy of reconstructed tractograms has been a longstanding challenge. Recently, deep learning methods have been applied to improve tr… ▽ More

    Submitted 5 March, 2025; originally announced March 2025.

  12. arXiv:2503.01743  [pdf, other

    cs.CL cs.AI cs.LG

    Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

    Authors: Microsoft, :, Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, Dong Chen, Dongdong Chen, Junkun Chen, Weizhu Chen, Yen-Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami , et al. (51 additional authors not shown)

    Abstract: We introduce Phi-4-Mini and Phi-4-Multimodal, compact yet highly capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model trained on high-quality web and synthetic data, significantly outperforming recent open-source models of similar size and matching the performance of models twice its size on math and coding tasks requiring complex reasoning. This achievement… ▽ More

    Submitted 7 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: 39 pages

  13. arXiv:2502.10388  [pdf, other

    cs.CL

    Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction

    Authors: WonJin Yoon, Boyu Ren, Spencer Thomas, Chanwhi Kim, Guergana Savova, Mei-Hua Hall, Timothy Miller

    Abstract: Recent progress in large language models (LLMs) has enabled the automated processing of lengthy documents even without supervised training on a task-specific dataset. Yet, their zero-shot performance in complex tasks as opposed to straightforward information extraction tasks remains suboptimal. One feasible approach for tasks with lengthy, complex input is to first summarize the document and then… ▽ More

    Submitted 14 February, 2025; originally announced February 2025.

  14. arXiv:2502.06915  [pdf, other

    cs.DC cs.LG

    Analytic Personalized Federated Meta-Learning

    Authors: Shunxian Gu, Chaoqun You, Deke Guo, Zhihao Qu, Bangbang Ren, Zaipeng Xie, Lailong Luo

    Abstract: Analytic Federated Learning (AFL) is an enhanced gradient-free federated learning (FL) paradigm designed to accelerate training by updating the global model in a single step with closed-form least-square (LS) solutions. However, the obtained global model suffers performance degradation across clients with heterogeneous data distribution. Meta-learning is a common approach to tackle this problem by… ▽ More

    Submitted 1 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  15. arXiv:2502.04268  [pdf, other

    cs.CV cs.AI

    Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

    Authors: Yi Yu, Botao Ren, Peiyuan Zhang, Mingxin Liu, Junwei Luo, Shaofeng Zhang, Feipeng Da, Junchi Yan, Xue Yang

    Abstract: With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning OOD from point annotations has gained great attention. In this paper, we rethink this challenging task setting with the layout among instances and present Point2RBox-v2. At the core are three principles: 1) Gaussian overlap loss. It learns an upper bound for ea… ▽ More

    Submitted 6 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: 11 pages, 5 figures, 10 tables

  16. arXiv:2501.08982  [pdf, other

    cs.CV

    CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation

    Authors: Qi Ma, Runyi Yang, Bin Ren, Nicu Sebe, Ender Konukoglu, Luc Van Gool, Danda Pani Paudel

    Abstract: Localizing textual descriptions within large-scale 3D scenes presents inherent ambiguities, such as identifying all traffic lights in a city. Addressing this, we introduce a method to generate distributions of camera poses conditioned on textual descriptions, facilitating robust reasoning for broadly defined concepts. Our approach employs a diffusion-based architecture to refine noisy 6DoF camer… ▽ More

    Submitted 3 February, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  17. arXiv:2501.02321  [pdf, other

    cs.CY

    KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation

    Authors: Yulong Li, Bolin Ren, Ke Hu, Changyuan Liu, Zhengyong Jiang, Kang Dang, Jionglong Su

    Abstract: Artificial intelligence has achieved notable results in sign language recognition and translation. However, relatively few efforts have been made to significantly improve the quality of life for the 72 million hearing-impaired people worldwide. Sign language translation models, relying on video inputs, involves with large parameter sizes, making it time-consuming and computationally intensive to b… ▽ More

    Submitted 13 January, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

    Comments: AAAI 2025

  18. ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle

    Authors: Yinchuan Wang, Bin Ren, Xiang Zhang, Pengyu Wang, Chaoqun Wang, Rui Song, Yibin Li, Max Q. -H. Meng

    Abstract: LiDAR-based SLAM is recognized as one effective method to offer localization guidance in rough environments. However, off-the-shelf LiDAR-based SLAM methods suffer from significant pose estimation drifts, particularly components relevant to the vertical direction, when passing to uneven terrains. This deficiency typically leads to a conspicuously distorted global map. In this article, a LiDAR-base… ▽ More

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: This article has been accepted by Journal of Field Robotics

  19. arXiv:2501.01484  [pdf, other

    astro-ph.EP astro-ph.IM cs.LG

    Sequencing Silicates in the IRS Debris Disk Catalog I: Methodology for Unsupervised Clustering

    Authors: Cicero X. Lu, Tushar Mittal, Christine H. Chen, Alexis Y. Li, Kadin Worthen, B. A. Sargent, Carey M. Lisse, G. C. Sloan, Dean C. Hines, Dan M. Watson, Isabel Rebollido, Bin B. Ren, Joel D. Green

    Abstract: Debris disks, which consist of dust, planetesimals, planets, and gas, offer a unique window into the mineralogical composition of their parent bodies, especially during the critical phase of terrestrial planet formation spanning 10 to a few hundred million years. Observations from the $\textit{Spitzer}$ Space Telescope have unveiled thousands of debris disks, yet systematic studies remain scarce,… ▽ More

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: 23 pages, 16 figures, Accepted to ApJS, $\texttt{CLUES}$ software available on GitHub

  20. arXiv:2412.10680  [pdf, other

    cs.CV cs.IR cs.MM

    UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval

    Authors: Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua

    Abstract: Universal Cross-Domain Retrieval (UCDR) retrieves relevant images from unseen domains and classes without semantic labels, ensuring robust generalization. Existing methods commonly employ prompt tuning with pre-trained vision-language models but are inherently limited by static prompts, reducing adaptability. We propose UCDR-Adapter, which enhances pre-trained models with adapters and dynamic prom… ▽ More

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: Accepted to WACV 2025. Project link: https://github.com/fine68/UCDR2024

  21. arXiv:2412.07237  [pdf, other

    cs.CV cs.AI cs.RO

    ArtFormer: Controllable Generation of Diverse 3D Articulated Objects

    Authors: Jiayi Su, Youhe Feng, Zheng Li, Jinhua Song, Yangfan He, Botao Ren, Botian Xu

    Abstract: This paper presents a novel framework for modeling and conditional generation of 3D articulated objects. Troubled by flexibility-quality tradeoffs, existing methods are often limited to using predefined structures or retrieving shapes from static datasets. To address these challenges, we parameterize an articulated object as a tree of tokens and employ a transformer to generate both the object's h… ▽ More

    Submitted 3 April, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: CVPR 2025. impl. repo: https://github.com/ShuYuMo2003/ArtFormer

  22. arXiv:2411.18588  [pdf, other

    cs.CV

    Hierarchical Information Flow for Generalized Efficient Image Restoration

    Authors: Yawei Li, Bin Ren, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Nicu Sebe, Ming-Hsuan Yang, Luca Benini

    Abstract: While vision transformers show promise in numerous image restoration (IR) tasks, the challenge remains in efficiently generalizing and scaling up a model for multiple IR tasks. To strike a balance between efficiency and model capacity for a generalized transformer-based IR method, we propose a hierarchical information flow mechanism for image restoration, dubbed Hi-IR, which progressively propagat… ▽ More

    Submitted 27 November, 2024; originally announced November 2024.

  23. arXiv:2411.15542  [pdf, other

    cs.CV

    Hierarchical Cross-Attention Network for Virtual Try-On

    Authors: Hao Tang, Bin Ren, Pingping Wu, Nicu Sebe

    Abstract: In this paper, we present an innovative solution for the challenges of the virtual try-on task: our novel Hierarchical Cross-Attention Network (HCANet). HCANet is crafted with two primary stages: geometric matching and try-on, each playing a crucial role in delivering realistic virtual try-on outcomes. A key feature of HCANet is the incorporation of a novel Hierarchical Cross-Attention (HCA) block… ▽ More

    Submitted 23 November, 2024; originally announced November 2024.

  24. arXiv:2411.09834  [pdf, other

    cs.CL cs.AI

    A Benchmark for Long-Form Medical Question Answering

    Authors: Pedram Hosseini, Jessica M. Sin, Bing Ren, Bryceton G. Thomas, Elnaz Nouri, Ali Farahanchi, Saeed Hassanpour

    Abstract: There is a lack of benchmarks for evaluating large language models (LLMs) in long-form medical question answering (QA). Most existing medical QA evaluation benchmarks focus on automatic metrics and multiple-choice questions. While valuable, these benchmarks fail to fully capture or assess the complexities of real-world clinical applications where LLMs are being deployed. Furthermore, existing stud… ▽ More

    Submitted 19 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

    Comments: AIM-FM: Advancements in Medical Foundation Models Workshop, 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

  25. arXiv:2410.19690  [pdf

    cs.CV

    Deep Learning for Classification of Inflammatory Bowel Disease Activity in Whole Slide Images of Colonic Histopathology

    Authors: Amit Das, Tanmay Shukla, Naofumi Tomita, Ryland Richards, Laura Vidis, Bing Ren, Saeed Hassanpour

    Abstract: Grading inflammatory bowel disease (IBD) activity using standardized histopathological scoring systems remains challenging due to resource constraints and inter-observer variability. In this study, we developed a deep learning model to classify activity grades in hematoxylin and eosin-stained whole slide images (WSIs) from patients with IBD, offering a robust approach for general pathologists. We… ▽ More

    Submitted 25 October, 2024; originally announced October 2024.

  26. arXiv:2410.08210  [pdf, other

    cs.CV cs.AI

    PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

    Authors: Botao Ren, Xue Yang, Yi Yu, Junwei Luo, Zhidong Deng

    Abstract: Single point supervised oriented object detection has gained attention and made initial progress within the community. Diverse from those approaches relying on one-shot samples or powerful pretrained models (e.g. SAM), PointOBB has shown promise due to its prior-free feature. In this paper, we propose PointOBB-v2, a simpler, faster, and stronger method to generate pseudo rotated boxes from points… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 13 pages, 4 figures, 5 tables

  27. arXiv:2410.08023  [pdf, other

    cs.CV cs.AI

    GrabDAE: An Innovative Framework for Unsupervised Domain Adaptation Utilizing Grab-Mask and Denoise Auto-Encoder

    Authors: Junzhou Chen, Xuan Wen, Ronghui Zhang, Bingtao Ren, Di Wu, Zhigang Xu, Danwei Wang

    Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt a model trained on a labeled source domain to an unlabeled target domain by addressing the domain shift. Existing Unsupervised Domain Adaptation (UDA) methods often fall short in fully leveraging contextual information from the target domain, leading to suboptimal decision boundary separation during source and target domain alignment. To address t… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

  28. arXiv:2409.06714  [pdf, other

    eess.IV cs.CV

    FCDM: A Physics-Guided Bidirectional Frequency Aware Convolution and Diffusion-Based Model for Sinogram Inpainting

    Authors: Jiaze E, Srutarshi Banerjee, Tekin Bicer, Guannan Wang, Yanfu Zhang, Bin Ren

    Abstract: Computed tomography (CT) is widely used in industrial and medical imaging, but sparse-view scanning reduces radiation exposure at the cost of incomplete sinograms and challenging reconstruction. Existing RGB-based inpainting models struggle with severe feature entanglement, while sinogram-specific methods often lack explicit physics constraints. We propose FCDM, a physics-guided, frequency-aware s… ▽ More

    Submitted 8 March, 2025; v1 submitted 26 August, 2024; originally announced September 2024.

  29. arXiv:2408.14600  [pdf, other

    cs.CV

    PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

    Authors: Yidi Li, Jiahao Wen, Bin Ren, Wenhao Li, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features within regions of interest can lead to information loss and limitations in local feature representation. To tackle these challenges, we propose a novel two… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 3D Object Detection

  30. arXiv:2408.14585   

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of… ▽ More

    Submitted 17 February, 2025; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: We request to withdraw our paper from arXiv due to unresolved author disagreements about the data interpretation and study conclusions. To maintain scientific integrity, we believe withdrawing the paper is necessary. We regret any confusion caused

  31. arXiv:2408.14498  [pdf, other

    stat.ML cs.LG

    Multi-Normal Prototypes Learning for Weakly Supervised Anomaly Detection

    Authors: Zhijin Dong, Hongzhi Liu, Boyuan Ren, Weimin Xiong, Zhonghai Wu

    Abstract: Anomaly detection is a crucial task in various domains. Most of the existing methods assume the normal sample data clusters around a single central prototype while the real data may consist of multiple categories or subgroups. In addition, existing methods always assume all unlabeled samples are normal while some of them are inevitably being anomalies. To address these issues, we propose a novel a… ▽ More

    Submitted 30 November, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  32. arXiv:2408.10906  [pdf, other

    cs.CV

    ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

    Authors: Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Danda Pani Paudel

    Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, w… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  33. arXiv:2408.01946  [pdf, other

    cs.CV

    Masked Angle-Aware Autoencoder for Remote Sensing Images

    Authors: Zhihao Li, Biao Hou, Siteng Ma, Zitong Wu, Xianpeng Guo, Bo Ren, Licheng Jiao

    Abstract: To overcome the inherent domain gap between remote sensing (RS) images and natural images, some self-supervised representation learning methods have made promising progress. However, they have overlooked the diverse angles present in RS objects. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a \textit{scaling center crop} o… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by ECCV 2024

  34. arXiv:2407.13372  [pdf, other

    cs.CV

    Restore Anything Model via Efficient Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Zongwei Wu, Yawei Li, Yidi Li, Danda Pani Paudel, Radu Timofte, Ming-Hsuan Yang, Nicu Sebe

    Abstract: With the proliferation of mobile devices, the need for an efficient model to restore any degraded image has become increasingly significant and impactful. Traditional approaches typically involve training dedicated models for each specific degradation, resulting in inefficiency and redundancy. More recent solutions either introduce additional modules to learn visual prompts significantly increasin… ▽ More

    Submitted 18 December, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Efficient Any Image Restoration

  35. arXiv:2407.05862  [pdf, other

    cs.CV

    Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

    Authors: Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-ba… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

  36. arXiv:2406.06216  [pdf, other

    cs.CV

    Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

    Authors: Xin Jin, Pengyi Jiao, Zheng-Peng Duan, Xingchao Yang, Chun-Le Guo, Bo Ren, Chongyi Li

    Abstract: Volumetric rendering based methods, like NeRF, excel in HDR view synthesis from RAWimages, especially for nighttime scenes. While, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. However, implementing RAW image-based view synthesis directly usi… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  37. arXiv:2405.20008  [pdf, other

    cs.CV

    Sharing Key Semantics in Transformer Makes Efficient Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu Sebe

    Abstract: Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the emergence of Vision Transformers (ViTs) has further propelled these advancements. When computing, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated ob… ▽ More

    Submitted 18 December, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by NeurIPS2024

  38. arXiv:2404.19358  [pdf, other

    cs.IT

    QML-IB: Quantized Collaborative Intelligence between Multiple Devices and the Mobile Network

    Authors: Jingchen Peng, Boxiang Ren, Lu Yang, Chenghui Peng, Panpan Niu, Hao Wu

    Abstract: The integration of artificial intelligence (AI) and mobile networks is regarded as one of the most important scenarios for 6G. In 6G, a major objective is to realize the efficient transmission of task-relevant data. Then a key problem arises, how to design collaborative AI models for the device side and the network side, so that the transmitted data between the device and the network is efficient… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  39. arXiv:2404.13528  [pdf, other

    cs.LG cs.AI cs.DC

    SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

    Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

    Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, w… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  40. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  41. arXiv:2404.07560  [pdf, other

    cs.RO cs.AI

    Socially Pertinent Robots in Gerontological Healthcare

    Authors: Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard , et al. (19 additional authors not shown)

    Abstract: Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilitie… ▽ More

    Submitted 11 February, 2025; v1 submitted 11 April, 2024; originally announced April 2024.

  42. arXiv:2404.04140  [pdf, other

    cs.CV cs.LG

    Context-Aware Aerial Object Detection: Leveraging Inter-Object and Background Relationships

    Authors: Botao Ren, Botian Xu, Xue Yang, Yifan Pu, Jingyi Wang, Zhidong Deng

    Abstract: In most modern object detection pipelines, the detection proposals are processed independently given the feature map. Therefore, they overlook the underlying relationships between objects and the surrounding background, which could have provided additional context for accurate detection. Because aerial imagery is almost orthographic, the spatial relations in image space closely align with those in… ▽ More

    Submitted 28 November, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  43. arXiv:2403.00176  [pdf, other

    cs.LG cs.AI cs.PL

    SoD$^2$: Statically Optimizing Dynamic Deep Neural Network

    Authors: Wei Niu, Gagan Agrawal, Bin Ren

    Abstract: Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or execution, are becoming common. This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. The basis of our approach is a class… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  44. arXiv:2402.02634  [pdf, other

    cs.CV cs.LG eess.IV

    Key-Graph Transformer for Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 6 figures

  45. arXiv:2402.02339  [pdf, other

    cs.CV cs.AI cs.LG

    Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

    Authors: Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

    Abstract: Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, whi… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  46. arXiv:2402.02088  [pdf, other

    cs.CV

    Mitigating Prior Shape Bias in Point Clouds via Differentiable Center Learning

    Authors: Zhe Li, Ziyang Zhang, Jinglin Zhao, Zheng Wang, Bocheng Ren, Debin Liu, Laurence T. Yang

    Abstract: Masked autoencoding and generative pretraining have achieved remarkable success in computer vision and natural language processing, and more recently, they have been extended to the point cloud domain. Nevertheless, existing point cloud models suffer from the issue of information leakage due to the pre-sampling of center points, which leads to trivial proxy tasks for the models. These approaches p… ▽ More

    Submitted 11 October, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  47. arXiv:2402.02045  [pdf, other

    cs.CV

    MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

    Authors: Zhe Li, Laurence T. Yang, Bocheng Ren, Xin Nie, Zhangyang Gao, Cheng Tan, Stan Z. Li

    Abstract: The scarcity of annotated data has sparked significant interest in unsupervised pre-training methods that leverage medical reports as auxiliary signals for medical visual representation learning. However, existing research overlooks the multi-granularity nature of medical visual representation and lacks suitable contrastive learning techniques to improve the models' generalizability across differe… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  48. arXiv:2312.08520  [pdf, other

    cs.AI

    Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report)

    Authors: Dong Li, Ruoming Jin, Bin Ren

    Abstract: Inspired by the success of contrastive learning, we systematically examine recommendation losses, including listwise (softmax), pairwise (BPR), and pointwise (MSE and CCL) losses. In this endeavor, we introduce InfoNCE+, an optimized generalization of InfoNCE with balance coefficients, and highlight its performance advantages, particularly when aligned with our new decoupled contrastive loss, MINE… ▽ More

    Submitted 4 November, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: This manuscript was initially submitted for review in August 2023

  49. arXiv:2312.05460  [pdf, other

    stat.ML cs.LG

    Multi-source domain adaptation for regression

    Authors: Yujie Wu, Giovanni Parmigiani, Boyu Ren

    Abstract: Multi-source domain adaptation (DA) aims at leveraging information from more than one source domain to make predictions in a target domain, where different domains may have different data distributions. Most existing methods for multi-source DA focus on classification problems while there is only limited investigation in the regression settings. In this paper, we fill in this gap through a two-ste… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

  50. arXiv:2312.03032  [pdf, other

    cs.CV

    ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models

    Authors: Weijie Wang, Wenqi Ren, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Nicu Sebe, Bruno Lepri

    Abstract: State-of-the-art 3D point cloud registration methods rely on labeled 3D datasets for training, which limits their practical applications in real-world scenarios and often hinders generalization to unseen scenes. Leveraging the zero-shot capabilities of foundation models offers a promising solution to these challenges. In this paper, we introduce ZeroReg, a zero-shot registration approach that util… ▽ More

    Submitted 15 December, 2024; v1 submitted 5 December, 2023; originally announced December 2023.