Showing 1–50 of 1,505 results for author: Liu, B

Searching in archive cs.
  1. arXiv:2507.04613  [pdf]

    cs.CV cs.AI

    HiLa: Hierarchical Vision-Language Collaboration for Cancer Survival Prediction

    Authors: Jiaqi Cui, Lu Wen, Yuchen Fei, Bo Liu, Luping Zhou, Dinggang Shen, Yan Wang

    Abstract: Survival prediction using whole-slide images (WSIs) is crucial in cancer research. Despite notable success, existing approaches are limited by their reliance on sparse slide-level labels, which hinders the learning of discriminative representations from gigapixel WSIs. Recently, vision language (VL) models, which incorporate additional language supervision, have emerged as a promising solution.…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI2025

  2. arXiv:2507.03886  [pdf, ps, other]

    cs.CV

    ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments

    Authors: Guile Wu, Dongfeng Bai, Bingbing Liu

    Abstract: This work focuses on modeling dynamic urban environments for autonomous driving simulation. Contemporary data-driven methods using neural radiance fields have achieved photorealistic driving scene modeling, but they suffer from low rendering efficacy. Recently, some approaches have explored 3D Gaussian splatting for modeling dynamic urban scenes, enabling high-fidelity reconstruction and real-time…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Technical report

  3. arXiv:2507.01785  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

    Authors: Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang

    Abstract: Data quality is a critical driver of large language model performance, yet existing model-based selection methods focus almost exclusively on English. We introduce MuRating, a scalable framework that transfers high-quality English data-quality signals into a single rater for 17 target languages. MuRating aggregates multiple English "raters" via pairwise comparisons to learn unified document-qualit…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.01735  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

    Authors: Kai Chen, Ruiyuan Gao, Lanqing Hong, Hang Xu, Xu Jia, Holger Caesar, Dengxin Dai, Bingbing Liu, Dzmitry Tsishkou, Songcen Xu, Chunjing Xu, Qiang Xu, Huchuan Lu, Dit-Yan Yeung

    Abstract: In this paper, we present details of the 1st W-CODA workshop, held in conjunction with the ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. 5 Speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: ECCV 2024. Workshop page: https://coda-dataset.github.io/w-coda2024/

  5. arXiv:2507.01535  [pdf, ps, other]

    cs.CV

    TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking

    Authors: Bingxi Liu, Calvin Chen, Junhao Li, Guyang Yu, Haoqian Song, Xuchen Liu, Jinqiang Cui, Hong Zhang

    Abstract: The Vision Transformer (ViT) model has long struggled with the challenge of quadratic complexity, a limitation that becomes especially critical in unmanned aerial vehicle (UAV) tracking systems, where data must be processed in real time. In this study, we explore the recently proposed State-Space Model, Mamba, leveraging its computational efficiency and capability for long-sequence modeling to eff…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 12 pages

  6. arXiv:2507.00304  [pdf]

    cs.LG cs.NI

    MamNet: A Novel Hybrid Model for Time-Series Forecasting and Frequency Pattern Analysis in Network Traffic

    Authors: Yujun Zhang, Runlong Li, Xiaoxiang Liang, Xinhao Yang, Tian Su, Bo Liu, Yan Zhou

    Abstract: The abnormal fluctuations in network traffic may indicate potential security threats or system failures. Therefore, efficient network traffic prediction and anomaly detection methods are crucial for network security and traffic management. This paper proposes a novel network traffic prediction and anomaly detection model, MamNet, which integrates time-domain modeling and frequency-domain feature e…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 16 pages

  7. arXiv:2506.24119  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

    Authors: Bo Liu, Leon Guertler, Simon Yu, Zichen Liu, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques

    Abstract: Recent advances in reinforcement learning have shown that language models can develop sophisticated reasoning through training on tasks with verifiable rewards, but these approaches depend on human-curated problem-answer pairs and domain-specific reward engineering. We introduce SPIRAL, a self-play framework where models learn by playing multi-turn, zero-sum games against continuously improving ve…

    Submitted 30 June, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

    Comments: Work in Progress

  8. arXiv:2506.22866  [pdf, ps, other]

    cs.CV cs.AI

    Region-Aware CAM: High-Resolution Weakly-Supervised Defect Segmentation via Salient Region Perception

    Authors: Hang-Cheng Dong, Lu Zou, Bingguo Liu, Dong Ye, Guodong Liu

    Abstract: Surface defect detection plays a critical role in industrial quality inspection. Recent advances in artificial intelligence have significantly enhanced the automation level of detection processes. However, conventional semantic segmentation and object detection models heavily rely on large-scale annotated datasets, which conflicts with the practical requirements of defect detection tasks. This pap…

    Submitted 28 June, 2025; originally announced June 2025.

  9. arXiv:2506.21895  [pdf, ps, other]

    cs.CV

    Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning

    Authors: Fangling Jiang, Qi Li, Weining Wang, Gang Wang, Bing Liu, Zhenan Sun

    Abstract: Recently the emergence of novel presentation attacks has drawn increasing attention to face anti-spoofing. However, existing methods tend to memorize data patterns from the training set, resulting in poor generalization to unknown attack types across different scenarios and limited interpretability. To address these challenges, this paper presents a reinforcement fine-tuning-based face anti-spoofi…

    Submitted 27 June, 2025; originally announced June 2025.

  10. arXiv:2506.20114  [pdf, ps, other]

    stat.ML cs.LG

    Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

    Authors: Brian Liu, Rahul Mazumder, Peter Radchenko

    Abstract: Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually exami…

    Submitted 2 July, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

  11. arXiv:2506.19781  [pdf, ps, other]

    cs.RO cs.NI

    The Starlink Robot: A Platform and Dataset for Mobile Satellite Communication

    Authors: Boyi Liu, Qianyi Zhang, Qiang Yang, Jianhao Jiao, Jagmohan Chauhan, Dimitrios Kanoulas

    Abstract: The integration of satellite communication into mobile devices represents a paradigm shift in connectivity, yet the performance characteristics under motion and environmental occlusion remain poorly understood. We present the Starlink Robot, the first mobile robotic platform equipped with Starlink satellite internet, comprehensive sensor suite including upward-facing camera, LiDAR, and IMU, design…

    Submitted 26 June, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

  12. arXiv:2506.19468  [pdf, ps, other]

    cs.CL cs.AI

    MuBench: Assessment of Multilingual Capabilities of Large Language Models Across 61 Languages

    Authors: Wenhan Han, Yifan Zhang, Zhixun Chen, Binbin Liu, Haobin Lin, Bingni Zhang, Taifeng Wang, Mykola Pechenizkiy, Meng Fang, Yin Zheng

    Abstract: Multilingual large language models (LLMs) are advancing rapidly, with new models frequently claiming support for an increasing number of languages. However, existing evaluation datasets are limited and lack cross-lingual alignment, leaving assessments of multilingual capabilities fragmented in both language and skill coverage. To address this, we introduce MuBench, a benchmark covering 61 language…

    Submitted 24 June, 2025; originally announced June 2025.

  13. arXiv:2506.19094  [pdf, ps, other]

    q-bio.NC cs.CE

    Accurate identification of communication between multiple interacting neural populations

    Authors: Belle Liu, Jacob Sacks, Matthew D. Golub

    Abstract: Neural recording technologies now enable simultaneous recording of population activity across many brain regions, motivating the development of data-driven models of communication between brain regions. However, existing models can struggle to disentangle the sources that influence recorded neural populations, leading to inaccurate portraits of inter-regional communication. Here, we introduce Mult…

    Submitted 23 June, 2025; originally announced June 2025.

    Journal ref: Forty-second International Conference on Machine Learning (2025)

  14. arXiv:2506.19030  [pdf, ps, other]

    cs.NI

    WiLLM: An Open Wireless LLM Communication System

    Authors: Boyi Liu, Yongguang Lu, Jianguo Zhao, Qiang Yang, Wen Wu, Lin Chen, Jagmohan Chauhan, Jun Zhang

    Abstract: The rapid evolution of LLMs threatens to overwhelm existing wireless infrastructure, necessitating architectural innovations for burgeoning mobile LLM services. This paper introduces WiLLM, the first open-source wireless system specifically designed for these services. First, we establish a new paradigm by deploying LLMs in core networks (CNs) with abundant GPUs. This enables distributed inference…

    Submitted 23 June, 2025; originally announced June 2025.

  15. arXiv:2506.17939  [pdf, ps, other]

    cs.CV cs.AI

    GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning

    Authors: Bo Liu, Xiangyu Zhao, Along He, Yidi Chen, Huazhu Fu, Xiao-Ming Wu

    Abstract: Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. While recent advances in multi-modal learning have significantly improved performance, current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and tr…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Work in Progress

  16. arXiv:2506.17600  [pdf, ps, other]

    cs.IR

    A novel fast short-time root music method for vibration monitoring of high-speed spindles

    Authors: Huiguang Zhang, Baoguo Liu, Wei Feng, Zongtang Li

    Abstract: Ultra-high-speed spindle bearings challenge traditional vibration monitoring due to broadband noise, non-stationarity, and limited time-frequency resolution. We present a fast Short-Time Root-MUSIC (fSTrM) algorithm that exploits FFT-accelerated Lanczos bidiagonalization to reduce computational complexity from $\mathcal{O}(N^3)$ to $SN\log_2N+S^2(N+S)+M^2(N+M)$ while preserving parametric supe…

    Submitted 21 June, 2025; originally announced June 2025.

  17. arXiv:2506.15864  [pdf, ps, other]

    cs.LG

    Improving Rectified Flow with Boundary Conditions

    Authors: Xixi Hu, Runlong Liao, Keyang Xu, Bo Liu, Yeqing Li, Eugene Ie, Hongliang Fei, Qiang Liu

    Abstract: Rectified Flow offers a simple and effective approach to high-quality generative modeling by learning a velocity field. However, we identify a limitation in directly modeling the velocity with an unconstrained neural network: the learned velocity often fails to satisfy certain boundary conditions, leading to inaccurate velocity field estimations that deviate from the desired ODE. This issue is par…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 14 pages

  18. arXiv:2506.15181  [pdf, ps, other]

    cs.LG

    ImprovDML: Improved Trade-off in Private Byzantine-Resilient Distributed Machine Learning

    Authors: Bing Liu, Chengcheng Zhao, Li Chai, Peng Cheng, Yaonan Wang

    Abstract: Jointly addressing Byzantine attacks and privacy leakage in distributed machine learning (DML) has become an important issue. A common strategy involves integrating Byzantine-resilient aggregation rules with differential privacy mechanisms. However, the incorporation of these techniques often results in a significant degradation in model accuracy. To address this issue, we propose a decentralized…

    Submitted 18 June, 2025; originally announced June 2025.

  19. arXiv:2506.14813  [pdf, ps, other]

    cs.LG cs.AI

    Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks

    Authors: Yuxuan Jiang, Ziming Zhou, Boyu Xu, Beijie Liu, Runhui Xu, Peng Huang

    Abstract: Training deep learning (DL) models is a complex process, making it prone to silent errors that are challenging to detect and diagnose. This paper presents TRAINCHECK, a framework that takes a proactive checking approach to address silent training errors. TRAINCHECK automatically infers invariants tailored for DL training. It uses these invariants to proactively detect silent errors during the trai…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 19 pages, to appear in 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI '25)

  20. arXiv:2506.14436  [pdf, ps, other]

    cs.LG

    MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

    Authors: Shen Yuan, Yin Zheng, Taifeng Wang, Binbin Liu, Hongteng Xu

    Abstract: Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel "model MoE-ization" strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular valu…

    Submitted 30 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: 24 pages, 6 figures

  21. arXiv:2506.14418  [pdf, ps, other]

    cs.CV cs.AI

    Compositional Attribute Imbalance in Vision Datasets

    Authors: Jiayi Chen, Yanbiao Ma, Andi Zhang, Weidong Tang, Wei Dai, Bowei Liu

    Abstract: Visual attribute imbalance is a common yet underexplored issue in image classification, significantly impacting model performance and generalization. In this work, we first define the first-level and second-level attributes of images and then introduce a CLIP-based framework to construct a visual attribute dictionary, enabling automatic evaluation of image attributes. By systematically analyzing b…

    Submitted 17 June, 2025; originally announced June 2025.

  22. arXiv:2506.13317  [pdf, ps, other]

    cs.IT eess.SP

    A Contemporary Survey on Fluid Antenna Systems: Fundamentals and Networking Perspectives

    Authors: Hanjiang Hong, Kai-Kit Wong, Hao Xu, Xinghao Guo, Farshad Rostami Ghadi, Yu Chen, Yin Xu, Chan-Byoung Chae, Baiyang Liu, Kin-Fai Tong, Yangyang Zhang

    Abstract: The explosive growth of teletraffic, fueled by the convergence of cyber-physical systems and data-intensive applications, such as the Internet of Things (IoT), autonomous systems, and immersive communications, demands a multidisciplinary suite of innovative solutions across the physical and network layers. Fluid antenna systems (FAS) represent a transformative advancement in antenna design, offeri…

    Submitted 16 June, 2025; originally announced June 2025.

  23. arXiv:2506.13183  [pdf, ps, other]

    cs.CV

    MT-PCR: A Hybrid Mamba-Transformer with Spatial Serialization for Hierarchical Point Cloud Registration

    Authors: Bingxi Liu, An Liu, Hao Chen, Jinqiang Cui, Yiqun Wang, Hong Zhang

    Abstract: Point cloud registration (PCR) is a fundamental task in 3D computer vision and robotics. Most existing learning-based PCR methods rely on Transformers, which suffer from quadratic computational complexity. This limitation restricts the resolution of point clouds that can be processed, inevitably leading to information loss. In contrast, Mamba-a recently proposed model based on state space models (…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 11 Pages

  24. arXiv:2506.13133  [pdf, ps, other]

    cs.CV

    EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition

    Authors: Bingxi Liu, Hao Chen, Shiyi Guo, Yihong Wu, Jinqiang Cui, Hong Zhang

    Abstract: Visual Place Recognition (VPR) is a scene-oriented image retrieval problem in computer vision in which re-ranking based on local features is commonly employed to improve performance. In robotics, VPR is also referred to as Loop Closure Detection, which emphasizes spatial-temporal verification within a sequence. However, designing local features specifically for VPR is impractical, and relying on m…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 17 Pages

  25. arXiv:2506.13073  [pdf, ps, other]

    cs.CV

    SuperPlace: The Renaissance of Classical Feature Aggregation for Visual Place Recognition in the Era of Foundation Models

    Authors: Bingxi Liu, Pengju Zhang, Li He, Hao Chen, Shiyi Guo, Yihong Wu, Jinqiang Cui, Hong Zhang

    Abstract: Recent visual place recognition (VPR) approaches have leveraged foundation models (FM) and introduced novel aggregation techniques. However, these methods have failed to fully exploit key concepts of FM, such as the effective utilization of extensive training sets, and they have overlooked the potential of classical aggregation methods, such as GeM and NetVLAD. Building on these insights, we reviv…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 11 pages

  26. arXiv:2506.13036  [pdf]

    cs.LG

    Forecast-Then-Optimize Deep Learning Methods

    Authors: Jinhang Jiang, Nan Wu, Ben Liu, Mei Feng, Xin Ji, Karthik Srinivasan

    Abstract: Time series forecasting underpins vital decision-making across various sectors, yet raw predictions from sophisticated models often harbor systematic errors and biases. We examine the Forecast-Then-Optimize (FTO) framework, pioneering its systematic synopsis. Unlike conventional Predict-Then-Optimize (PTO) methods, FTO explicitly refines forecasts through optimization techniques such as ensemble m…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 44 pages, 2 figures

  27. arXiv:2506.12530  [pdf, ps, other]

    cs.CV

    Towards Seamless Borders: A Method for Mitigating Inconsistencies in Image Inpainting and Outpainting

    Authors: Xingzhong Hou, Jie Wu, Boxiao Liu, Yi Zhang, Guanglu Song, Yunpeng Liu, Yu Liu, Haihang You

    Abstract: Image inpainting is the task of reconstructing missing or damaged parts of an image in a way that seamlessly blends with the surrounding content. With the advent of advanced generative models, especially diffusion models and generative adversarial networks, inpainting has achieved remarkable improvements in visual quality and coherence. However, achieving seamless continuity remains a significant…

    Submitted 14 June, 2025; originally announced June 2025.

  28. arXiv:2506.12331  [pdf, ps, other]

    cs.MA cs.AI

    IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment

    Authors: Dekun Wu, Frederik Brudy, Bang Liu, Yi Wang

    Abstract: Virtual environments are essential to AI agent research. Existing environments for LLM agent research typically focus on either physical task solving or social simulation, with the former oversimplifying agent individuality and social dynamics, and the latter lacking physical grounding of social behaviors. We introduce IndoorWorld, a heterogeneous multi-agent environment that tightly integrates ph…

    Submitted 13 June, 2025; originally announced June 2025.

  29. arXiv:2506.12040  [pdf, other]

    cs.LG cs.AI cs.CV

    BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook

    Authors: Hao Gu, Lujun Li, Zheyu Wang, Bei Liu, Qiyuan Zhu, Sirui Han, Yike Guo

    Abstract: Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to $\pm$1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask…

    Submitted 23 May, 2025; originally announced June 2025.

  30. arXiv:2506.08403  [pdf, ps, other]

    cs.CL cs.AI

    TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration

    Authors: Weiya Li, Junjie Chen, Bei Li, Boyang Liu, Zichen Wen, Nuanqiao Shan, Xiaoqian Liu, Anping Liu, Huajie Liu, Hu Song, Linfeng Zhang

    Abstract: Machine translation has long been a central task in natural language processing. With the rapid advancement of large language models (LLMs), there has been remarkable progress in translation quality. However, fully realizing the translation potential of LLMs remains an open challenge. Recent studies have explored multi-agent systems to decompose complex translation tasks into collaborative subtask…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: 20 pages, 4 figures, Under review. Code: https://github.com/weiyali126/TACTIC

  31. arXiv:2506.08359  [pdf, ps, other]

    cs.CL

    DEAL: Disentangling Transformer Head Activations for LLM Steering

    Authors: Li-Ming Zhan, Bo Liu, Zexin Lu, Chengqiang Xie, Jiannong Cao, Xiao-Ming Wu

    Abstract: Inference-time steering aims to alter the response characteristics of large language models (LLMs) without modifying their underlying parameters. A critical step in this process is the identification of internal modules within LLMs that are associated with the target behavior. However, current approaches to module selection often depend on superficial cues or ad-hoc heuristics, which can result in…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Preprint

  32. arXiv:2506.08030  [pdf, ps, other]

    math.OC cs.LG stat.ML

    MOSS: Multi-Objective Optimization for Stable Rule Sets

    Authors: Brian Liu, Rahul Mazumder

    Abstract: We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective optimization framework. Importantly, MOSS allows a practitioner to rapidly evaluate the trade-off between accuracy and stability in sparse rule sets in order to sel…

    Submitted 1 June, 2025; originally announced June 2025.

  33. arXiv:2506.07542  [pdf]

    cs.CV cs.AI

    APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

    Authors: Bowen Liu, Weiyi Zhang, Peranut Chotcomwongse, Xiaolan Chen, Ruoyu Chen, Pawin Pakaymaskul, Niracha Arjkongharn, Nattaporn Vongsa, Xuelian Cheng, Zongyuan Ge, Kun Huang, Xiaohui Li, Yiru Duan, Zhenbang Wang, BaoYe Xie, Qiang Chen, Huazhu Fu, Michael A. Mahr, Jiaqi Qu, Wangyiyang Chen, Shiye Wang, Yubo Tan, Yongjie Li, Mingguang He, Danli Shi, et al. (1 additional author not shown)

    Abstract: Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with…

    Submitted 9 June, 2025; originally announced June 2025.

  34. arXiv:2506.05569  [pdf, ps, other]

    cs.IT eess.SP

    Fluid Antenna System-Assisted Self-Interference Cancellation for In-Band Full Duplex Communications

    Authors: Hanjiang Hong, Kai-Kit Wong, Hao Xu, Yiyan Wu, Sai Xu, Chan-Byoung Chae, Baiyang Liu, Kin-Fai Tong

    Abstract: In-band full-duplex (IBFD) systems are expected to double the spectral efficiency compared to half-duplex systems, provided that loopback self-interference (SI) can be effectively suppressed. The inherent interference mitigation capabilities of the emerging fluid antenna system (FAS) technology make it a promising candidate for addressing the SI challenge in IBFD systems. This paper thus proposes…

    Submitted 5 June, 2025; originally announced June 2025.

  35. arXiv:2506.05343  [pdf, ps, other]

    cs.CV

    ContentV: Efficient Training of Video Generation Models with Limited Compute

    Authors: Wenfeng Lin, Renjie Chen, Boyuan Liu, Shiyue Yan, Ruoyu Feng, Jiangchuan Wei, Yichen Zhang, Yimeng Zhou, Chao Feng, Jiao Ran, Qi Wu, Zuotao Liu, Mingyu Guo

    Abstract: Recent advances in video generation demand increasingly efficient training recipes to mitigate escalating computational costs. In this report, we present ContentV, an 8B-parameter text-to-video model that achieves state-of-the-art performance (85.14 on VBench) after training on 256 x 64GB Neural Processing Units (NPUs) for merely four weeks. ContentV generates diverse, high-quality videos across m…

    Submitted 11 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: Project Page: https://contentv.github.io

  36. arXiv:2506.03750  [pdf, ps, other]

    cs.SI cs.AI

    A Retrieval-Augmented Multi-Agent Framework for Psychiatry Diagnosis

    Authors: Mengxi Xiao, Mang Ye, Ben Liu, Xiaofen Zong, He Li, Jimin Huang, Qianqian Xie, Min Peng

    Abstract: The application of AI in psychiatric diagnosis faces significant challenges, including the subjective nature of mental health assessments, symptom overlap across disorders, and privacy constraints limiting data availability. To address these issues, we present MoodAngels, the first specialized multi-agent framework for mood disorder diagnosis. Our approach combines granular-scale analysis of clini…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 40 pages, 11 figures

    MSC Class: 68T42; ACM Class: J.4

  37. arXiv:2506.03107  [pdf, ps, other]

    cs.CV

    ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions

    Authors: Di Chang, Mingdeng Cao, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang

    Abstract: Editing images with instructions to reflect non-rigid motions, camera viewpoint shifts, object deformations, human articulations, and complex interactions, poses a challenging yet underexplored problem in computer vision. Existing approaches and datasets predominantly focus on static scenes or rigid transformations, limiting their capacity to handle expressive edits involving dynamic motion. To ad…

    Submitted 11 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: Website: https://boese0601.github.io/bytemorph Dataset: https://huggingface.co/datasets/ByteDance-Seed/BM-6M Benchmark: https://huggingface.co/datasets/ByteDance-Seed/BM-Bench Code: https://github.com/ByteDance-Seed/BM-code Demo: https://huggingface.co/spaces/Boese0601/ByteMorph-Demo

  38. arXiv:2506.02824  [pdf, ps, other]

    cs.RO

    Efficient Tactile Perception with Soft Electrical Impedance Tomography and Pre-trained Transformer

    Authors: Huazhi Dong, Ronald B. Liu, Sihao Teng, Delin Hu, Peisan, E, Francesco Giorgio-Serchi, Yunjie Yang

    Abstract: Tactile sensing is fundamental to robotic systems, enabling interactions through physical contact in multiple tasks. Despite its importance, achieving high-resolution, large-area tactile sensing remains challenging. Electrical Impedance Tomography (EIT) has emerged as a promising approach for large-area, distributed tactile sensing with minimal electrode requirements which can lend itself to addre…

    Submitted 3 June, 2025; originally announced June 2025.

  39. arXiv:2506.00235  [pdf, ps, other]

    cs.CL

    MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility

    Authors: Yexiao He, Ang Li, Boyi Liu, Zhewei Yao, Yuxiong He

    Abstract: Healthcare decision-making represents one of the most challenging domains for Artificial Intelligence (AI), requiring the integration of diverse knowledge sources, complex reasoning, and various external analytical tools. Current AI systems often rely on either task-specific models, which offer limited adaptability, or general language models without grounding with specialized external knowledge a…

    Submitted 30 May, 2025; originally announced June 2025.

  40. arXiv:2505.23742  [pdf, ps, other]

    cs.CV cs.AI

    MAGREF: Masked Guidance for Any-Reference Video Generation

    Authors: Yufan Deng, Xun Guo, Yuanyang Yin, Jacob Zhiyuan Fang, Yiding Yang, Yizhi Wang, Shenghai Yuan, Angtian Wang, Bo Liu, Haibin Huang, Chongyang Ma

    Abstract: Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches. However, video generation based on multiple reference subjects still faces significant challenges in maintaining multi-subject consistency and ensuring high generation quality. In this paper, we propose MAGREF, a unified framework for any-reference video generation tha…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project website: https://magref-video.github.io/magref.github.io/

  41. arXiv:2505.23049  [pdf, ps, other]

    cs.LG cs.CL

    DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration

    Authors: Tianteng Gu, Bei Liu, Bo Xiao, Ke Zeng, Jiacheng Liu, Yanmin Qian

    Abstract: Pruning is a widely used technique to compress large language models (LLMs) by removing unimportant weights, but it often suffers from significant performance degradation - especially under semi-structured sparsity constraints. Existing pruning methods primarily focus on estimating the importance of individual weights, which limits their ability to preserve critical capabilities of the model. In t…

    Submitted 28 May, 2025; originally announced May 2025.

  42. arXiv:2505.21153  [pdf]

    cs.MM cs.HC

    THE WASTIVE: An Interactive Ebb and Flow of Digital Fabrication Waste

    Authors: Yifan Shan, Bo Liu, Sebastian Bidegain, Thijs Roumen

    Abstract: What if digital fabrication waste could observe the world? What would they see? What would they say? "THE WASTIVE" reimagines digital fabrication waste as sentient observers, giving them a poetic voice through interactive art. As viewers approach, the installation awakens, mimicking the rhythmic ebb and flow of ocean waves - a silent dialogue where discarded materials "observe" and respond to huma…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: video demo: https://youtu.be/Yh3dmKYNP-8

  43. arXiv:2505.20857  [pdf, ps, other]

    cs.RO

    G-DReaM: Graph-conditioned Diffusion Retargeting across Multiple Embodiments

    Authors: Zhefeng Cao, Ben Liu, Sen Li, Wei Zhang, Hua Chen

    Abstract: Motion retargeting for specific robot from existing motion datasets is one critical step in transferring motion patterns from human behaviors to and across various robots. However, inconsistencies in topological structure, geometrical parameters as well as joint correspondence make it difficult to handle diverse embodiments with a unified retargeting architecture. In this work, we propose a novel…

    Submitted 27 May, 2025; originally announced May 2025.

  44. arXiv:2505.20740  [pdf, ps, other]

    cs.AI

    MSEarth: A Benchmark for Multimodal Scientific Comprehension of Earth Science

    Authors: Xiangyu Zhao, Wanghan Xu, Bo Liu, Yuhao Zhou, Fenghua Ling, Ben Fei, Xiaoyu Yue, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

    Abstract: The rapid advancement of multimodal large language models (MLLMs) has unlocked new opportunities to tackle complex scientific challenges. Despite this progress, their application in addressing earth science problems, especially at the graduate level, remains underexplored. A significant barrier is the absence of benchmarks that capture the depth and contextual complexity of geoscientific reasoning…

    Submitted 27 May, 2025; originally announced May 2025.

  45. arXiv:2505.18962  [pdf, ps, other]

    cs.CL

    System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

    Authors: Xiaoqiang Wang, Suyuchen Wang, Yun Zhu, Bang Liu

    Abstract: Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps unif…

    Submitted 31 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: Work in progress

  46. arXiv:2505.18319  [pdf, ps, other]

    cs.CE

    Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science

    Authors: Sifan Wu, Huan Zhang, Yizhan Li, Farshid Effaty, Amirreza Ataei, Bang Liu

    Abstract: The emergence of Multimodal Large Language Models (MLLMs) that integrate vision and language modalities has unlocked new potentials for scientific reasoning, outperforming prior benchmarks in both natural language and coding domains. Current materials science evaluation datasets such as MaScQA and SciQA remain largely text-based and fail to capture the visual and research-level analytic complexity…

    Submitted 23 May, 2025; originally announced May 2025.

  47. arXiv:2505.18286  [pdf, ps, other]

    cs.MA cs.AI cs.LG

    Single-agent or Multi-agent Systems? Why Not Both?

    Authors: Mingyan Gao, Yanzi Li, Banruo Liu, Yifan Yu, Phillip Wang, Ching-Yu Lin, Fan Lai

    Abstract: Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different large language model (LLM) agents and tools. Prior studies have reported the superior accuracy performance of MAS across diverse domains, enabled by long-horizon context tracking and error correction through role-specific agents. However, the design and deployment of MAS incur higher complexity and runtime cost co…

    Submitted 23 May, 2025; originally announced May 2025.

  48. arXiv:2505.17071  [pdf, other]

    cs.CL

    What's in a prompt? Language models encode literary style in prompt embeddings

    Authors: Raphaël Sarfati, Haley Moller, Toni J. B. Liu, Nicolas Boullé, Christopher Earls

    Abstract: Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies analyze how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers.…

    Submitted 19 May, 2025; originally announced May 2025.

  49. arXiv:2505.16901  [pdf, ps, other]

    cs.SE cs.LG

    Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

    Authors: Hongyuan Tao, Ying Zhang, Zhenhao Tang, Hongen Peng, Xukun Zhu, Bingchang Liu, Yingguang Yang, Ziyin Zhang, Zhaogui Xu, Haipeng Zhang, Linchao Zhu, Rui Wang, Hang Yu, Jianguo Li, Peng Di

    Abstract: Recent advances in Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. Current solutions predominantly rely on proprietary LLM agents, which introduce unpredictability and limit accessibility, raising concerns about data privacy and model customization. This paper investigates whether open-source LLM…

    Submitted 23 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 35 pages, 10 figures

  50. arXiv:2505.16821  [pdf, ps, other]

    cs.NI cs.LG eess.SP

    LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols

    Authors: Ziming Liu, Bryan Liu, Alvaro Valcarce, Xiaoli Chu

    Abstract: Integrating large AI models (LAMs) into 6G mobile networks promises to redefine protocol design and control-plane intelligence by enabling autonomous, cognitive network operations. While industry concepts, such as ETSI's Experiential Networked Intelligence (ENI), envision LAM-driven agents for adaptive network slicing and intent-based management, practical implementations still face challenges in…

    Submitted 25 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: This work has been submitted to the IEEE for possible publication. Focuses on applying LLMs to 5G RRC protocol generation; primary: cs.NI; cross-list: eess.SP, cs.LG