
Showing 1–50 of 247 results for author: Zhong, X

Searching in archive cs.
  1. arXiv:2507.02222  [pdf, ps, other]

    cs.CV

    High-Fidelity Differential-information Driven Binary Vision Transformer

    Authors: Tian Gao, Zhiyuan Zhang, Kaijie Yin, Xu-Cheng Zhong, Hui Kong

    Abstract: The binarization of vision transformers (ViTs) offers a promising approach to addressing the trade-off between high computational/storage demands and the constraints of edge-device deployment. However, existing binary ViT methods often suffer from severe performance degradation or rely heavily on full-precision modules. To address these issues, we propose DIDB-ViT, a novel binary ViT that is highl…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2507.00411  [pdf, ps, other]

    cs.LG

    Diffusion Disambiguation Models for Partial Label Learning

    Authors: Jinfu Fan, Xiaohui Zhong, Kangrui Ren, Jiangnan Li, Linqing Huang

    Abstract: Learning from ambiguous labels is a long-standing problem in practical machine learning applications. The purpose of partial label learning (PLL) is to identify the ground-truth label from a set of candidate labels associated with a given instance. Inspired by the remarkable performance of diffusion models in various generation tasks, this paper explores their potential to denoise ambiguous…

    Submitted 30 June, 2025; originally announced July 2025.

  3. arXiv:2506.20370  [pdf, ps, other]

    cs.CV cs.LG cs.MM

    InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking

    Authors: Abdullah All Tanvir, Xin Zhong

    Abstract: This paper introduces a novel deep learning framework for robust image zero-watermarking based on distortion-invariant feature learning. As a zero-watermarking scheme, our method leaves the original image unaltered and learns a reference signature through optimization in the feature space. The proposed framework consists of two key modules. In the first module, a feature extractor is trained via n…

    Submitted 25 June, 2025; originally announced June 2025.

  4. arXiv:2506.15084  [pdf, ps, other]

    cs.SE cs.CV cs.HC

    An Empirical Study of Bugs in Data Visualization Libraries

    Authors: Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, Chengnian Sun

    Abstract: Data visualization (DataViz) libraries play a crucial role in presentation, data analysis, and application development, underscoring the importance of their accuracy in transforming data into visual representations. Incorrect visualizations can adversely impact user experience, distort information conveyance, and influence user perception and decision-making processes. Visual bugs in these librari…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Proc. ACM Softw. Eng. 2, FSE

  5. arXiv:2506.13144  [pdf, ps, other]

    cs.DB

    EnhanceGraph: A Continuously Enhanced Graph-based Index for High-dimensional Approximate Nearest Neighbor Search

    Authors: Xiaoyao Zhong, Jiabao Jin, Peng Cheng, Mingyu Yang, Haoyang Li, Zhitao Shen, Heng Tao Shen, Jingkuan Song

    Abstract: Recently, Approximate Nearest Neighbor Search in high-dimensional vector spaces has garnered considerable attention due to the rapid advancement of deep learning techniques. We observed that a substantial amount of search and construction logs are generated throughout the lifespan of a graph-based index. However, these two types of valuable logs are not fully exploited due to the static nature of…

    Submitted 23 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  6. arXiv:2506.08438  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Learning to Lead: Incentivizing Strategic Agents in the Dark

    Authors: Yuchen Wu, Xinyi Zhong, Zhuoran Yang

    Abstract: We study an online learning version of the generalized principal-agent model, where a principal interacts repeatedly with a strategic agent possessing private types, private rewards, and taking unobservable actions. The agent is non-myopic, optimizing a discounted sum of future rewards and may strategically misreport types to manipulate the principal's learning. The principal, observing only her o…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 81 pages, 7 figures

  7. arXiv:2506.03210  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    FuXi-Ocean: A Global Ocean Forecasting System with Sub-Daily Resolution

    Authors: Qiusheng Huang, Yuan Niu, Xiaohui Zhong, Anboyu Guo, Lei Chen, Dianjun Zhang, Xuefeng Zhang, Hao Li

    Abstract: Accurate, high-resolution ocean forecasting is crucial for maritime operations and environmental monitoring. While traditional numerical models are capable of producing sub-daily, eddy-resolving forecasts, they are computationally intensive and face challenges in maintaining accuracy at fine spatial and temporal scales. In contrast, recent data-driven approaches offer improved computational effici…

    Submitted 2 June, 2025; originally announced June 2025.

  8. arXiv:2506.02911  [pdf, other]

    cs.CL cs.AI cs.CE cs.HC cs.LG

    Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning

    Authors: Yin Fang, Qiao Jin, Guangzhi Xiong, Bowen Jin, Xianrui Zhong, Siru Ouyang, Aidong Zhang, Jiawei Han, Zhiyong Lu

    Abstract: Cell type annotation is a key task in analyzing the heterogeneity of single-cell RNA sequencing data. Although recent foundation models automate this process, they typically annotate cells independently, without considering batch-level cellular context or providing explanatory reasoning. In contrast, human experts often annotate distinct cell types for different cell clusters based on their domain…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 28 pages; 16 tables; 7 figures; Code: https://github.com/ncbi-nlp/cell-o1

  9. Enhancing Biomedical Multi-modal Representation Learning with Multi-scale Pre-training and Perturbed Report Discrimination

    Authors: Xinliu Zhong, Kayhan Batmanghelich, Li Sun

    Abstract: Vision-language models pre-trained on large scale of unlabeled biomedical images and associated reports learn generalizable semantic representations. These multi-modal representations can benefit various downstream tasks in the biomedical domain. Contrastive learning is widely used to pre-train vision-language models for general natural images and associated captions. Despite its popularity, we fo…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 6 pages, 1 figure, accepted by 2024 IEEE Conference on Artificial Intelligence (CAI)

    Journal ref: 2024 IEEE Conference on Artificial Intelligence (CAI), 2024, 480-485

  10. arXiv:2506.01356  [pdf, ps, other]

    cs.LG cs.RO eess.SY

    Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion

    Authors: Haoyu Li, Xiangru Zhong, Bin Hu, Huan Zhang

    Abstract: Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimations of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains…

    Submitted 2 June, 2025; originally announced June 2025.

  11. arXiv:2506.00970  [pdf, ps, other]

    cs.RO

    Globally Consistent RGB-D SLAM with 2D Gaussian Splatting

    Authors: Xingguang Zhong, Yue Pan, Liren Jin, Marija Popović, Jens Behley, Cyrill Stachniss

    Abstract: Recently, 3D Gaussian splatting-based RGB-D SLAM displays remarkable performance of high-fidelity 3D reconstruction. However, the lack of depth rendering consistency and efficient loop closure limits the quality of its geometric reconstructions and its ability to perform globally consistent mapping online. In this paper, we present 2DGS-SLAM, an RGB-D SLAM system using 2D Gaussian splatting as the…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 18 pages

  12. Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification

    Authors: Xinliu Zhong, Leo Hwa Liang, Angela S. Koh, Yeo Si Yong

    Abstract: Traditional diagnostic methods like colonoscopy are invasive yet critical tools necessary for accurately diagnosing colorectal cancer (CRC). Detection of CRC at early stages is crucial for increasing patient survival rates. However, colonoscopy is dependent on obtaining adequate and high-quality endoscopic images. Prolonged invasive procedures are inherently risky for patients, while suboptimal or…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 6 pages, 15 figures

    Journal ref: 2024 IEEE Conference on Artificial Intelligence (CAI), 2024, 839-844

  13. arXiv:2505.24739  [pdf, ps, other]

    eess.IV cs.CV

    Contrast-Invariant Self-supervised Segmentation for Quantitative Placental MRI

    Authors: Xinliu Zhong, Ruiying Liu, Emily S. Nichols, Xuzhe Zhang, Andrew F. Laine, Emma G. Duerden, Yun Wang

    Abstract: Accurate placental segmentation is essential for quantitative analysis of the placenta. However, this task is particularly challenging in T2*-weighted placental imaging due to: (1) weak and inconsistent boundary contrast across individual echoes; (2) the absence of manual ground truth annotations for all echo times; and (3) motion artifacts across echoes caused by fetal and maternal movement. In t…

    Submitted 4 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 8 pages, 20 figures

  14. arXiv:2505.23365  [pdf, ps, other]

    cs.CV

    MCFNet: A Multimodal Collaborative Fusion Network for Fine-Grained Semantic Classification

    Authors: Yang Qiao, Xiaoyu Zhong, Xiaofeng Gu, Zhiguo Yu

    Abstract: Multimodal information processing has become increasingly important for enhancing image classification performance. However, the intricate and implicit dependencies across different modalities often hinder conventional methods from effectively capturing fine-grained semantic interactions, thereby limiting their applicability in high-precision classification tasks. To address this issue, we propose…

    Submitted 29 May, 2025; originally announced May 2025.

  15. arXiv:2505.20694  [pdf, ps, other]

    cs.CV cs.LG

    Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets

    Authors: Xulin Gu, Xinhao Zhong, Zhixing Wei, Yimin Zhou, Shuoyang Sun, Bin Chen, Hongpeng Wang, Yuan Luo

    Abstract: Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video d…

    Submitted 27 May, 2025; originally announced May 2025.

  16. arXiv:2505.16138  [pdf, other]

    cs.LG cs.DC

    Multimodal Online Federated Learning with Modality Missing in Internet of Things

    Authors: Heqiang Wang, Xiang Liu, Xiaoxiong Zhong, Lixing Chen, Fangming Liu, Weizhe Zhang

    Abstract: The Internet of Things (IoT) ecosystem generates vast amounts of multimodal data from heterogeneous sources such as sensors, cameras, and microphones. As edge intelligence continues to evolve, IoT devices have progressed from simple data collection units to nodes capable of executing complex computational tasks. This evolution necessitates the adoption of distributed learning strategies to effecti…

    Submitted 21 May, 2025; originally announced May 2025.

  17. arXiv:2505.15398  [pdf, ps, other]

    cs.CV

    Expanding Zero-Shot Object Counting with Rich Prompts

    Authors: Huilin Zhu, Senyao Li, Jingling Yuan, Zhengwei Yang, Yu Guo, Wenxuan Liu, Xian Zhong, Shengfeng He

    Abstract: Expanding pre-trained zero-shot counting models to handle unseen categories requires more than simply adding new prompts, as this approach does not achieve the necessary alignment between text and visual features for accurate counting. We introduce RichCount, the first framework to address these limitations, employing a two-stage training strategy that enhances text encoding and strengthens the mo…

    Submitted 26 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  18. arXiv:2505.14522  [pdf, ps, other]

    cs.LG

    Interpretable Dual-Stream Learning for Local Wind Hazard Prediction in Vulnerable Communities

    Authors: Mahmuda Akhter Nishu, Chenyu Huang, Milad Roohi, Xin Zhong

    Abstract: Wind hazards such as tornadoes and straight-line winds frequently affect vulnerable communities in the Great Plains of the United States, where limited infrastructure and sparse data coverage hinder effective emergency response. Existing forecasting systems focus primarily on meteorological elements and often fail to capture community-specific vulnerabilities, limiting their utility for localized…

    Submitted 20 May, 2025; originally announced May 2025.

  19. arXiv:2505.13300  [pdf, other]

    cs.CV

    DD-Ranking: Rethinking the Evaluation of Dataset Distillation

    Authors: Zekai Li, Xinhao Zhong, Samir Khaki, Zhiyuan Liang, Yuhao Zhou, Mingjia Shi, Ziqiao Wang, Xuanlei Zhao, Wangbo Zhao, Ziheng Qin, Mengxuan Wu, Pengfei Zhou, Haonan Wang, David Junhao Zhang, Jia-Wei Liu, Shaobo Wang, Dai Liu, Linfeng Zhang, Guang Li, Kun Wang, Zheng Zhu, Zhiheng Ma, Joey Tianyi Zhou, Jiancheng Lv, Yaochu Jin, et al. (27 additional authors not shown)

    Abstract: In recent years, dataset distillation has provided a reliable solution for data compression, where models trained on the resulting smaller synthetic datasets achieve performance comparable to those trained on the original datasets. To further improve the performance of synthetic datasets, various training pipelines and optimization objectives have been proposed, greatly advancing the field of data…

    Submitted 21 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 20 pages, 4 figures

  20. arXiv:2505.10464  [pdf, ps, other]

    eess.IV cs.CV

    HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation

    Authors: Jiaming Liang, Lihuan Dai, Xiaoqi Sheng, Xiangguang Chen, Chun Yao, Guihua Tao, Qibin Leng, Hongmin Cai, Xi Zhong

    Abstract: Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis. This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities. As a result, algorithms are constrained to train on approximate data and depend on application migration, leading to substantial res…

    Submitted 26 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

    Comments: This work has been provisionally accepted for MICCAI 2025

  21. arXiv:2505.10018  [pdf, ps, other]

    cs.RO

    LEMON-Mapping: Loop-Enhanced Large-Scale Multi-Session Point Cloud Merging and Optimization for Globally Consistent Mapping

    Authors: Lijie Wang, Xiaoyi Zhong, Ziyi Xu, Kaixin Chai, Anke Zhao, Tianyu Zhao, Changjian Jiang, Qianhao Wang, Fei Gao

    Abstract: Multi-robot collaboration is becoming increasingly critical and presents significant challenges in modern robotics, especially for building a globally consistent, accurate map. Traditional multi-robot pose graph optimization (PGO) methods ensure basic global consistency but ignore the geometric structure of the map, and only use loop closures as constraints between pose nodes, leading to divergenc…

    Submitted 4 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  22. arXiv:2505.07671  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Benchmarking Retrieval-Augmented Generation for Chemistry

    Authors: Xianrui Zhong, Bowen Jin, Siru Ouyang, Yanzhen Shen, Qiao Jin, Yin Fang, Zhiyong Lu, Jiawei Han

    Abstract: Retrieval-augmented generation (RAG) has emerged as a powerful framework for enhancing large language models (LLMs) with external knowledge, particularly in scientific domains that demand specialized and dynamic information. Despite its promise, the application of RAG in the chemistry domain remains underexplored, primarily due to the lack of high-quality, domain-specific corpora and well-curated…

    Submitted 12 May, 2025; originally announced May 2025.

  23. arXiv:2505.07233  [pdf, ps, other]

    cs.CL cs.AI

    DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation

    Authors: Jiashuo Sun, Xianrui Zhong, Sizhe Zhou, Jiawei Han

    Abstract: Retrieval-augmented generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval, making them highly effective for knowledge-intensive tasks. A crucial but often under-explored component of these systems is the reranker. Since irrelevant documents in RAG systems can mislead the generator, the reranker plays a vital role in refining retrieved documents to enhance…

    Submitted 15 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 24 pages, 7 figures, 15 tables

  24. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  25. arXiv:2505.06269  [pdf]

    cs.LG

    A machine learning model for skillful climate system prediction

    Authors: Chenguang Zhou, Lei Chen, Xiaohui Zhong, Bo Lu, Hao Li, Libo Wu, Jie Wu, Jiahui Hu, Zesheng Dou, Pang-Chi Hsu, Xiaoye Zhang

    Abstract: Climate system models (CSMs), through integrating cross-sphere interactions among the atmosphere, ocean, land, and cryosphere, have emerged as pivotal tools for deciphering climate dynamics and improving forecasting capabilities. Recent breakthroughs in artificial intelligence (AI)-driven meteorological modeling have demonstrated remarkable success in single-sphere systems and partially spheres co…

    Submitted 5 May, 2025; originally announced May 2025.

  26. arXiv:2505.05504  [pdf, other]

    eess.IV cs.CV

    Image Restoration via Multi-domain Learning

    Authors: Xingyu Jiang, Ning Gao, Xiuhui Zhang, Hongkun Dou, Shaowen Fu, Xiaoqing Zhong, Hongjue Li, Yue Deng

    Abstract: Due to adverse atmospheric and imaging conditions, natural images suffer from various degradation phenomena. Consequently, image restoration has emerged as a key solution and garnered substantial attention. Although recent Transformer architectures have demonstrated impressive success across various restoration tasks, their considerable model complexity poses significant challenges for both traini…

    Submitted 7 May, 2025; originally announced May 2025.

  27. arXiv:2505.00394  [pdf, other]

    cs.CV

    SOTA: Spike-Navigated Optimal TrAnsport Saliency Region Detection in Composite-bias Videos

    Authors: Wenxuan Liu, Yao Deng, Kang Chen, Xian Zhong, Zhaofei Yu, Tiejun Huang

    Abstract: Existing saliency detection methods struggle in real-world scenarios due to motion blur and occlusions. In contrast, spike cameras, with their high temporal resolution, significantly enhance visual saliency maps. However, the composite noise inherent to spike camera imaging introduces discontinuities in saliency detection. Low-quality samples further distort model predictions, leading to saliency…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted to IJCAI 2025

  28. arXiv:2504.20530  [pdf, other]

    cs.CV

    Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer

    Authors: Wenxuan Liu, Xian Zhong, Zhuo Zhou, Siyuan Yang, Chia-Wen Lin, Alex Chichung Kot

    Abstract: Action recognition in unmanned aerial vehicles (UAVs) poses unique challenges due to significant view variations along the vertical spatial axis. Unlike traditional ground-based settings, UAVs capture actions from a wide range of altitudes, resulting in considerable appearance discrepancies. We introduce a multi-view formulation tailored to varying UAV altitudes and empirically observe a partial o…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 11 pages

  29. arXiv:2504.19136  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification

    Authors: Huiling Zheng, Xian Zhong, Bin Liu, Yi Xiao, Bihan Wen, Xiaofeng Li

    Abstract: The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and underutilized spectral complementarity. Existing methods often fail to decouple shared structural features from modality-complementary radiometric attributes, causing feature conflicts and information loss. To address this, we propose Phase-Amplitude Deco…

    Submitted 3 July, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: 13 pages, 8 figures

  30. arXiv:2504.17102  [pdf, other]

    math.OC cs.LG eess.SY

    Neural Contraction Metrics with Formal Guarantees for Discrete-Time Nonlinear Dynamical Systems

    Authors: Haoyu Li, Xiangru Zhong, Bin Hu, Huan Zhang

    Abstract: Contraction metrics are crucial in control theory because they provide a powerful framework for analyzing stability, robustness, and convergence of various dynamical systems. However, identifying these metrics for complex nonlinear systems remains an open challenge due to the lack of scalable and effective tools. This paper explores the approach of learning verifiable contraction metrics parametri…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Accepted by L4DC 2025

  31. arXiv:2504.15827  [pdf, other]

    cs.LG cs.AI

    DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers

    Authors: Xuyang Zhong, Haochen Luo, Chen Liu

    Abstract: Existing machine unlearning (MU) approaches exhibit significant sensitivity to hyperparameters, requiring meticulous tuning that limits practical deployment. In this work, we first empirically demonstrate the instability and suboptimal performance of existing popular MU methods when deployed in different scenarios. To address this issue, we propose Dual Optimizer (DualOptim), which incorporates ad…

    Submitted 22 April, 2025; originally announced April 2025.

  32. arXiv:2504.08781  [pdf, other]

    cs.CL cs.AI cs.IR

    Efficient Evaluation of Large Language Models via Collaborative Filtering

    Authors: Xu-Xiang Zhong, Chao Yi, Han-Jia Ye

    Abstract: With the development of Large Language Models (LLMs), numerous benchmarks have been proposed to measure and compare the capabilities of different LLMs. However, evaluating LLMs is costly due to the large number of test instances and their slow inference speed. In this paper, we aim to explore how to efficiently estimate a model's real performance on a given benchmark based on its evaluation result…

    Submitted 5 April, 2025; originally announced April 2025.

  33. arXiv:2504.03337  [pdf, other]

    cs.CV

    QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning

    Authors: Quanxing Xu, Ling Zhou, Xian Zhong, Feifei Zhang, Rubing Huang, Chia-Wen Lin

    Abstract: Existing debiasing approaches in Visual Question Answering (VQA) primarily focus on enhancing visual learning, integrating auxiliary models, or employing data augmentation strategies. However, these methods exhibit two major drawbacks. First, current debiasing techniques fail to capture the superior relation between images and texts because prevalent learning frameworks do not enable models to ext…

    Submitted 4 April, 2025; originally announced April 2025.

  34. arXiv:2504.02605  [pdf, other]

    cs.SE cs.AI cs.CL

    Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

    Authors: Daoguang Zan, Zhirong Huang, Wei Liu, Hanwu Chen, Linhao Zhang, Shulin Xin, Lu Chen, Qi Liu, Xiaojian Zhong, Aoyan Li, Siyao Liu, Yongsheng Xiao, Liangqiang Chen, Yuyu Zhang, Jing Su, Tianyu Liu, Rui Long, Kai Shen, Liang Xiang

    Abstract: The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue. However, existing benchmarks, such as SWE-bench, focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across diverse software ecosystems. To address this, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering Jav…

    Submitted 3 April, 2025; originally announced April 2025.

  35. arXiv:2503.24389  [pdf, other]

    cs.CV cs.NE

    SU-YOLO: Spiking Neural Network for Efficient Underwater Object Detection

    Authors: Chenyang Li, Wenxuan Liu, Guoqiang Gong, Xiaobo Ding, Xian Zhong

    Abstract: Underwater object detection is critical for oceanic research and industrial safety inspections. However, the complex optical environment and the limited resources of underwater equipment pose significant challenges to achieving high accuracy and low power consumption. To address these issues, we propose Spiking Underwater YOLO (SU-YOLO), a Spiking Neural Network (SNN) model. Leveraging the lightwe…

    Submitted 31 March, 2025; originally announced March 2025.

  36. arXiv:2503.23480  [pdf, other]

    cs.RO

    Improving Indoor Localization Accuracy by Using an Efficient Implicit Neural Map Representation

    Authors: Haofei Kuang, Yue Pan, Xingguang Zhong, Louis Wiesmann, Jens Behley, Cyrill Stachniss

    Abstract: Globally localizing a mobile robot in a known map is often a foundation for enabling robots to navigate and operate autonomously. In indoor environments, traditional Monte Carlo localization based on occupancy grid maps is considered the gold standard, but its accuracy is limited by the representation capabilities of the occupancy grid map. In this paper, we address the problem of building an effe…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 8 pages, 5 figures. Accepted to ICRA 2025

  37. arXiv:2503.21657  [pdf, other]

    cs.LG cs.AI cs.CL

    Model Assembly Learning with Heterogeneous Layer Weight Merging

    Authors: Yi-Kai Zhang, Jin Wang, Xu-Xiang Zhong, De-Chuan Zhan, Han-Jia Ye

    Abstract: Model merging acquires general capabilities without extra data or training by combining multiple models' parameters. Previous approaches achieve linear mode connectivity by aligning parameters into the same loss basin using permutation invariance. In this paper, we introduce Model Assembly Learning (MAL), a novel paradigm for model merging that iteratively integrates parameters from diverse models…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: ICLR 2025 Workshop on Neural Network Weights as a New Data Modality

  38. arXiv:2503.20315  [pdf, other]

    cs.CV

    SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams

    Authors: Hanwen Liang, Xian Zhong, Wenxuan Liu, Yajing Zheng, Wenxin Huang, Zhaofei Yu, Tiejun Huang

    Abstract: Restoring clear frames from rainy videos presents a significant challenge due to the rapid motion of rain streaks. Traditional frame-based visual sensors, which capture scene content synchronously, struggle to capture the fast-moving details of rain accurately. In recent years, neuromorphic sensors have introduced a new paradigm for dynamic scene perception, offering microsecond temporal resolutio…

    Submitted 26 March, 2025; originally announced March 2025.

  39. arXiv:2503.19940  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling

    Authors: Qiusheng Huang, Xiaohui Zhong, Xu Fan, Lei Chen, Hao Li

    Abstract: Similar to conventional video generation, current deep learning-based weather prediction frameworks often lack explicit physical constraints, leading to unphysical outputs that limit their reliability for operational forecasting. Among various physical processes requiring proper representation, radiation plays a fundamental role as it drives Earth's weather and climate systems. However, accurate s…

    Submitted 25 March, 2025; originally announced March 2025.

  40. arXiv:2503.18094  [pdf, other]

    cs.CV

    Anomize: Better Open Vocabulary Video Anomaly Detection

    Authors: Fei Li, Wenxuan Liu, Jingjing Chen, Ruixu Zhang, Yuran Wang, Xian Zhong, Zheng Wang

    Abstract: Open Vocabulary Video Anomaly Detection (OVVAD) seeks to detect and classify both base and novel anomalies. However, existing methods face two specific challenges related to novel anomalies. The first challenge is detection ambiguity, where the model struggles to assign accurate anomaly scores to unfamiliar anomalies. The second challenge is categorization confusion, where novel anomalies are ofte…

    Submitted 23 March, 2025; originally announced March 2025.

  41. arXiv:2503.17911  [pdf, ps, other]

    cs.DB

    VSAG: An Optimized Search Framework for Graph-based Approximate Nearest Neighbor Search

    Authors: Xiaoyao Zhong, Haotian Li, Jiabao Jin, Mingyu Yang, Deming Chu, Xiangyu Wang, Zhitao Shen, Wei Jia, George Gu, Yi Xie, Xuemin Lin, Heng Tao Shen, Jingkuan Song, Peng Cheng

    Abstract: Approximate nearest neighbor search (ANNS) is a fundamental problem in vector databases and AI infrastructures. Recent graph-based ANNS algorithms have achieved high search accuracy with practical efficiency. Despite the advancements, these algorithms still face performance bottlenecks in production, due to the random memory access patterns of graph-based search and the high computational overhead…

    Submitted 12 June, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

    Comments: the report of open-source library VSAG (https://github.com/antgroup/vsag)

  42. arXiv:2503.13805  [pdf, other]

    cs.CV cs.LG cs.MM

    Text-Guided Image Invariant Feature Learning for Robust Image Watermarking

    Authors: Muhammad Ahtesham, Xin Zhong

    Abstract: Ensuring robustness in image watermarking is crucial for maintaining content integrity under diverse transformations. Recent self-supervised learning (SSL) approaches, such as DINO, have been leveraged for watermarking but primarily focus on general feature representation rather than explicitly learning invariant features. In this work, we propose a novel text-guided invariant feature learning…

    Submitted 17 March, 2025; originally announced March 2025.

  43. arXiv:2503.11408  [pdf, other]

    cs.LG cs.AI

    A Neural Network Architecture Based on Attention Gate Mechanism for 3D Magnetotelluric Forward Modeling

    Authors: Xin Zhong, Weiwei Ling, Kejia Pan, Pinxia Wu, Jiajing Zhang, Zhiliang Zhan, Wenbo Xiao

    Abstract: Traditional three-dimensional magnetotelluric (MT) numerical forward modeling methods, such as the finite element method (FEM) and finite volume method (FVM), suffer from high computational costs and low efficiency due to limitations in mesh refinement and computational resources. We propose a novel neural network architecture named MTAGU-Net, which integrates an attention gating mechanism for 3D…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 12 pages, 16 figures

  44. arXiv:2503.05110  [pdf, other]

    cs.SD eess.AS

    UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation

    Authors: Weiguang Chen, Junjie Zhang, Jielong Yang, Eng Siong Chng, Xionghu Zhong

    Abstract: Array-geometry-agnostic speech separation (AGA-SS) aims to develop an effective separation method regardless of the microphone array geometry. Conventional methods rely on permutation-free operations, such as summation or attention mechanisms, to capture spatial information. However, these approaches often incur high computational costs or disrupt the effective use of spatial information during in…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 5 pages, Preprint

  45. arXiv:2503.02689  [pdf, other]

    cs.CV

    STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

    Authors: Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang

    Abstract: Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge hindering the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-…

    Submitted 29 April, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  46. arXiv:2503.00308  [pdf, other]

    cs.CV

    Abstract Rendering: Computing All that is Seen in Gaussian Splat Scenes

    Authors: Yangge Li, Chenxi Ji, Xiangru Zhong, Huan Zhang, Sayan Mitra

    Abstract: We introduce abstract rendering, a method for computing a set of images by rendering a scene from a continuously varying range of camera positions. The resulting abstract image, which encodes an infinite collection of possible renderings, is represented using constraints on the image matrix, enabling rigorous uncertainty propagation through the rendering process. This capability is particularly valu…

    Submitted 4 March, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

  47. arXiv:2502.21041  [pdf, other]

    cs.LG cs.AI

    Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing

    Authors: Xuyang Zhong, Yixiao Huang, Chen Liu

    Abstract: This paper studies fast adversarial training against sparse adversarial perturbations bounded by $l_0$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_0$ bounded perturbations for fast adversarial training, including degraded performance and the occurrence of catastrophic overfitting (CO). We highlight that CO in $l_0$ adversarial training is caused by sub-optimal perturbat…

    Submitted 28 February, 2025; originally announced February 2025.

  48. arXiv:2502.20695  [pdf, other]

    cs.IR

    Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search

    Authors: Yang Shi, Yiping Sun, Jiaolong Du, Xiaocheng Zhong, Zhiyong Wang, Yao Hu

    Abstract: Approximate Nearest Neighbor Search (ANNS) is essential for modern data-driven applications that require efficient retrieval of top-k results from massive vector databases. Although existing graph-based ANNS algorithms achieve a high recall rate on billion-scale datasets, their slow construction speed and limited scalability hinder their applicability to large-scale industrial scenarios. In this p…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW'25

  49. arXiv:2502.18519  [pdf, other]

    eess.IV cs.AI cs.CV

    FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition

    Authors: Linshan Wu, Jiaxin Zhuang, Yanning Zhou, Sunan He, Jiabo Ma, Luyang Luo, Xi Wang, Xuefeng Ni, Xiaoling Zhong, Mingxiang Wu, Yinghua Zhao, Xiaohui Duan, Varut Vardhanabhuti, Pranav Rajpurkar, Hao Chen

    Abstract: Tumor is a leading cause of death worldwide, with an estimated 10 million deaths attributed to tumor-related diseases every year. AI-driven tumor recognition unlocks new possibilities for more precise and intelligent tumor screening and diagnosis. However, the progress is heavily hampered by the scarcity of annotated datasets, which demands extensive annotation efforts by radiologists. To tackle t…

    Submitted 23 February, 2025; originally announced February 2025.

  50. arXiv:2502.17967  [pdf, other]

    cs.LG cs.AI cs.CL cs.MA q-fin.ST

    LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena

    Authors: Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou

    Abstract: Recent advancements in large language models (LLMs) have significantly improved performance in natural language processing tasks. However, their ability to generalize to dynamic, unseen tasks, particularly in numerical reasoning, remains a challenge. Existing benchmarks mainly evaluate LLMs on problems with predefined optimal solutions, which may not align with real-world scenarios where clear ans…

    Submitted 25 February, 2025; originally announced February 2025.