Showing 1–50 of 288 results for author: Qin, H

Searching in archive cs.
  1. arXiv:2507.04634  [pdf, ps, other]

    cs.CV cs.AI

    LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction

    Authors: Yixin Yan, Yang Li, Yuanfan Wang, Xiaozhou Zhou, Beihao Xia, Manjiang Hu, Hongmao Qin

    Abstract: It has been challenging to model the complex temporal-spatial dependencies between agents for trajectory prediction. As each state of an agent is closely related to the states of adjacent time steps, capturing the local temporal dependency is beneficial for prediction, while most studies often overlook it. Besides, learning the high-order motion state attributes is expected to enhance spatial inte…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2507.04456  [pdf, ps, other]

    cs.CV

    BiVM: Accurate Binarized Neural Network for Efficient Video Matting

    Authors: Haotong Qin, Xianglong Liu, Xudong Ma, Lei Ke, Yulun Zhang, Jie Luo, Michele Magno

    Abstract: Deep neural networks for real-time video matting suffer significant computational limitations on edge devices, hindering their adoption in widespread applications such as online conferences and short-form video production. Binarization emerges as one of the most common compression approaches with compact 1-bit parameters and efficient bitwise operations. However, accuracy and efficiency limitation…

    Submitted 6 July, 2025; originally announced July 2025.

  3. arXiv:2507.04290  [pdf, ps, other]

    cs.CV

    MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation

    Authors: Weilun Feng, Chuanguang Yang, Haotong Qin, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Boyu Diao, Fuzhen Zhuang, Michele Magno, Yongjun Xu, Yingli Tian, Tingwen Huang

    Abstract: Diffusion models have demonstrated remarkable performance on vision generation tasks. However, their high computational complexity hinders their wide application on edge devices. Quantization has emerged as a promising technique for inference acceleration and memory reduction. However, existing quantization methods do not generalize well under extremely low-bit (2-4 bit) quantization. Directly applyin…

    Submitted 6 July, 2025; originally announced July 2025.

  4. arXiv:2507.00849  [pdf, ps, other]

    cs.CV

    UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection

    Authors: Wei Li, Jiaman Tang, Yang Li, Beihao Xia, Ligang Tan, Hongmao Qin

    Abstract: Unmanned Aerial Vehicle (UAV) object detection has been widely used in traffic management, agriculture, emergency rescue, etc. However, it faces significant challenges, including occlusions, small object sizes, and irregular shapes. These challenges highlight the necessity for a robust and efficient multimodal UAV object detection method. Mamba has demonstrated considerable potential in multimodal…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: The paper was accepted by the 36th IEEE Intelligent Vehicles Symposium (IEEE IV 2025)

  5. arXiv:2506.20548  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.MM

    Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks

    Authors: Manyi Li, Renshuai Tao, Yufan Liu, Chuangchuang Tan, Haotong Qin, Bing Li, Yunchao Wei, Yao Zhao

    Abstract: With the rapid advancement of deep learning, particularly through generative adversarial networks (GANs) and diffusion models (DMs), AI-generated images, or "deepfakes", have become nearly indistinguishable from real ones. These images are widely shared across Online Social Networks (OSNs), raising concerns about their misuse. Existing deepfake detection methods overlook the "block effects" intr…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 20 pages, 10 figures

  6. arXiv:2506.18807  [pdf, ps, other]

    cs.CV

    PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications

    Authors: Pietro Bonazzi, Nicola Farronato, Stefan Zihlmann, Haotong Qin, Michele Magno

    Abstract: Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications like smart glasses and IoT devices. We introduce PicoSAM2, a lightweight (1.3M parameters, 336M MACs) promptable segmentation model optimized for edge and in-sensor execution, including the Sony IMX500. It builds on a depthwise separable U-Net, with knowledge distillation and fixed-point prompt encod…

    Submitted 24 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  7. arXiv:2506.16961  [pdf, ps, other]

    cs.CV eess.IV

    Reversing Flow for Image Restoration

    Authors: Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu

    Abstract: Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation. Existing generative models for image restoration, including diffusion and score-based models, often treat the degradation process as a stochastic transformation, which introduces inefficiency and complexity. In this work, we propose ResFlow, a novel image restorat…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: CVPR2025 Final Version; Corresponding Author: Bing Li

    MSC Class: 68U10 ACM Class: I.4.4

  8. arXiv:2506.16960  [pdf, ps, other]

    cs.CV

    Visual-Instructed Degradation Diffusion for All-in-One Image Restoration

    Authors: Wenyang Luo, Haina Qin, Zewen Chen, Libin Wang, Dandan Zheng, Yuming Li, Yufan Liu, Bing Li, Weiming Hu

    Abstract: Image restoration tasks like deblurring, denoising, and dehazing usually need distinct models for each degradation type, restricting their generalization in real-world scenarios with mixed or unknown degradations. In this work, we propose Defusion, a novel all-in-one image restoration framework that utilizes visual instruction-guided degradation diffusion. Unlike existing methods that rel…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: CVPR2025 Final Version; Corresponding Author: Bing Li

    MSC Class: 68U10 ACM Class: I.4.4

  9. arXiv:2506.16121  [pdf, ps, other]

    cs.DS

    On the Efficient Discovery of Maximum $k$-Defective Biclique

    Authors: Donghang Cui, Ronghua Li, Qiangqiang Dai, Hongchao Qin, Guoren Wang

    Abstract: The problem of identifying the maximum edge biclique in bipartite graphs has attracted considerable attention in bipartite graph analysis, with numerous real-world applications such as fraud detection, community detection, and online recommendation systems. However, real-world graphs may contain noise or incomplete information, leading to overly restrictive conditions when employing the biclique m…

    Submitted 19 June, 2025; originally announced June 2025.

  10. arXiv:2506.15976  [pdf, ps, other]

    cs.CV

    LBMamba: Locally Bi-directional Mamba

    Authors: Jingwei Zhang, Xi Han, Hong Qin, Mahdi S. Hosseini, Dimitris Samaras

    Abstract: Mamba, a State Space Model (SSM) that accelerates training by recasting recurrence as a parallel selective scan, has recently emerged as a linearly-scaling, efficient alternative to self-attention. Because of its unidirectional nature, each state in Mamba only has information of its previous states and is blind to states after. Current Mamba-based computer-vision methods typically overcome this li…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Submitted to TMLR

  11. arXiv:2506.12430  [pdf, ps, other]

    cs.CR cs.CV

    Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

    Authors: Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, et al. (22 additional authors not shown)

    Abstract: Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This technical report presents finding…

    Submitted 14 June, 2025; originally announced June 2025.

  12. arXiv:2506.10840  [pdf, ps, other]

    cs.CV cs.AI

    Post-Training Quantization for Video Matting

    Authors: Tianrui Zhu, Houyuan Chen, Ruihao Gong, Michele Magno, Haotong Qin, Kai Zhang

    Abstract: Video matting is crucial for applications such as film production and virtual reality, yet deploying its computationally intensive models on resource-constrained devices presents challenges. Quantization is a key technique for model compression and acceleration. As an efficient approach, Post-Training Quantization (PTQ) is still in its nascent stages for video matting, facing significant hurdles i…

    Submitted 12 June, 2025; originally announced June 2025.

  13. arXiv:2506.10404  [pdf, ps, other]

    cs.LG

    Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height

    Authors: Bryan Shaddy, Brianna Binder, Agnimitra Dasgupta, Haitong Qin, James Haley, Angel Farguell, Kyle Hilburn, Derek V. Mallia, Adam Kochanski, Jan Mandel, Assad Oberai

    Abstract: Increasing wildfire occurrence has spurred growing interest in wildfire spread prediction. However, even the most complex wildfire models diverge from observed progression during multi-day simulations, motivating the need for data assimilation. A useful approach to assimilating measurement data into complex coupled atmosphere-wildfire models is to estimate wildfire progression from measurements and us…

    Submitted 12 June, 2025; originally announced June 2025.

  14. arXiv:2506.10311  [pdf, ps, other]

    cs.DM

    The Freight Multimodal Transport Problem with Buses and Drones: An Integrated Approach for Last-Mile Delivery

    Authors: E Su, Hu Qin, Jiliu Li, Rui Zhang

    Abstract: This paper proposes a novel freight multimodal transport problem with buses and drones, where buses are responsible for transporting parcels to lockers at bus stops for storage, while drones are used to deliver each parcel from the locker to the corresponding customer. The integrated bus-drone system synergistically expands drone service coverage using the bus network to ensure efficient final del…

    Submitted 11 June, 2025; originally announced June 2025.

  15. arXiv:2506.09782  [pdf, ps, other]

    cs.CV cs.AI

    Q-SAM2: Accurate Quantization for Segment Anything Model 2

    Authors: Nicola Farronato, Florian Scheidegger, Mattia Rigotti, Cristiano Malossi, Michele Magno, Haotong Qin

    Abstract: The Segment Anything Model 2 (SAM2) has gained significant attention as a foundational approach for promptable image and video segmentation. However, its expensive computational and memory consumption poses a severe challenge for its application in resource-constrained scenarios. In this paper, we propose an accurate low-bit quantization method for efficient SAM2, termed Q-SAM2. To address the per…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 20 pages

  16. arXiv:2506.07627  [pdf, ps, other]

    cs.CV

    Event-Priori-Based Vision-Language Model for Efficient Visual Understanding

    Authors: Haotong Qin, Cheng Hu, Michele Magno

    Abstract: Large Language Model (LLM)-based Vision-Language Models (VLMs) have substantially extended the boundaries of visual understanding capabilities. However, their high computational demands hinder deployment on resource-constrained edge devices. A key source of inefficiency stems from the VLM's need to process dense and redundant visual information. Visual inputs contain significant regions irrelevant…

    Submitted 9 June, 2025; originally announced June 2025.

  17. arXiv:2506.03543  [pdf, ps, other]

    cs.AI cs.CY cs.MA

    CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications

    Authors: Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li

    Abstract: Current large language model (LLM) agents lack authentic human psychological processes necessary for genuine digital twins and social AI applications. To address this limitation, we present a computational implementation of Global Workspace Theory (GNWT) that integrates human cognitive architecture principles into LLM agents, creating specialized sub-agents for emotion, memory, social norms, plann…

    Submitted 3 June, 2025; originally announced June 2025.

  18. arXiv:2506.02938  [pdf, ps, other]

    cs.CV

    MIND: Material Interface Generation from UDFs for Non-Manifold Surface Reconstruction

    Authors: Xuhui Chen, Fei Hou, Wencheng Wang, Hong Qin, Ying He

    Abstract: Unsigned distance fields (UDFs) are widely used in 3D deep learning due to their ability to represent shapes with arbitrary topology. While prior work has largely focused on learning UDFs from point clouds or multi-view images, extracting meshes from UDFs remains challenging, as the learned fields rarely attain exact zero distances. A common workaround is to reconstruct signed distance fields (SDF…

    Submitted 3 June, 2025; originally announced June 2025.

  19. arXiv:2506.00820  [pdf, ps, other]

    cs.CV

    QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration

    Authors: Jiatong Li, Libo Zhu, Haotong Qin, Jingkai Wang, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

    Abstract: Diffusion models have been achieving remarkable performance in face restoration. However, the heavy computations of diffusion models make it difficult to deploy them on devices like smartphones. In this work, we propose QuantFace, a novel low-bit quantization for one-step diffusion face restoration models, where the full-precision (i.e., 32-bit) weights and activations are quantized to 4–6-bit…

    Submitted 31 May, 2025; originally announced June 2025.

  20. arXiv:2505.22167  [pdf, other]

    cs.CV

    Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers

    Authors: Weilun Feng, Chuanguang Yang, Haotong Qin, Xiangqi Li, Yu Wang, Zhulin An, Libo Huang, Boyu Diao, Zixiang Zhao, Yongjun Xu, Michele Magno

    Abstract: Diffusion transformers (DiT) have demonstrated exceptional performance in video generation. However, their large number of parameters and high computational complexity limit their deployment on edge devices. Quantization can reduce storage requirements and accelerate inference by lowering the bit-width of model parameters. Yet, existing quantization methods for image generation models do not gener…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML2025

  21. arXiv:2505.18637  [pdf, ps, other]

    cs.IT

    Neural Coding Is Not Always Semantic: Towards The Standardized Coding Workflow in Semantic Communications

    Authors: Hai-Long Qin, Jincheng Dai, Sixian Wang, Xiaoqi Qin, Shuo Shao, Kai Niu, Wenjun Xu, Ping Zhang

    Abstract: Semantic communication, leveraging advanced deep learning techniques, emerges as a new paradigm that meets the requirements of next-generation wireless networks. However, current semantic communication systems, which employ neural coding for feature extraction from raw data, have not adequately addressed the fundamental question: Is general feature extraction through deep neural networks sufficien…

    Submitted 24 May, 2025; originally announced May 2025.

  22. arXiv:2505.15217  [pdf, ps, other]

    cs.CV

    Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection

    Authors: Haotian Qin, Dongliang Chang, Yueying Gao, Bingyao Yu, Lei Chen, Zhanyu Ma

    Abstract: Although existing CLIP-based methods for detecting AI-generated images have achieved promising results, they are still limited by severe feature redundancy, which hinders their generalization ability. To address this issue, incorporating an information bottleneck network into the task presents a straightforward solution. However, relying solely on image-corresponding prompts results in suboptimal…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 24 pages, 16 figures

  23. arXiv:2505.12020  [pdf, ps, other]

    cs.LG cs.AI

    GeoMaNO: Geometric Mamba Neural Operator for Partial Differential Equations

    Authors: Xi Han, Jingwei Zhang, Dimitris Samaras, Fei Hou, Hong Qin

    Abstract: The neural operator (NO) framework has emerged as a powerful tool for solving partial differential equations (PDEs). Recent NOs are dominated by the Transformer architecture, which offers NOs the capability to capture long-range dependencies in PDE dynamics. However, existing Transformer-based NOs suffer from quadratic complexity, lack geometric rigor, and thus suffer from sub-optimal performance…

    Submitted 17 May, 2025; originally announced May 2025.

  24. arXiv:2505.11678  [pdf, other]

    cs.CY

    Fairness-Utility Trade-off via Wasserstein Projection

    Authors: Yan Chen, Zheng Tan, Jose Blanchet, Hanzhang Qin

    Abstract: Ensuring fairness in data-driven decision-making is a critical concern, but existing fairness constraints often involve trade-offs with overall utility. We propose a fairness framework that enforces strong demographic parity-related fairness criteria (with $ε$-tolerance) in propensity score allocation while guaranteeing a minimum total utility. This approach balances equity and utility by calibrat…

    Submitted 16 May, 2025; originally announced May 2025.

  25. arXiv:2505.11497  [pdf, other]

    cs.CV

    QVGen: Pushing the Limit of Quantized Video Generative Models

    Authors: Yushi Huang, Ruihao Gong, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang

    Abstract: Video diffusion models (DMs) have enabled high-quality video synthesis. Yet, their substantial computational and memory demands pose serious challenges to real-world deployment, even on high-end GPUs. As a commonly adopted solution, quantization has shown notable success in reducing cost for image DMs, while its direct application to video DMs remains ineffective. In this paper, we present QVGen,…

    Submitted 23 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: Our code will be released upon acceptance

  26. arXiv:2505.07219  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection

    Authors: Hongda Qin, Xiao Lu, Zhiyong Wei, Yihong Cao, Kailun Yang, Ningjiang Chen

    Abstract: Generalizing an object detector trained on a single domain to multiple unseen domains is a challenging task. Existing methods typically introduce image or feature augmentation to diversify the source domain to raise the robustness of the detector. Vision-Language Model (VLM)-based augmentation techniques have been proven to be effective, but they require that the detector's backbone has the same s…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: The source code and pre-trained models will be publicly available at https://github.com/qinhongda8/LDDS

  27. arXiv:2505.05530  [pdf, other]

    cs.LG cs.AI

    Low-bit Model Quantization for Deep Neural Networks: A Survey

    Authors: Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

    Abstract: With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an effective weight-lighting technique, has become an indispensable procedure in the whole deployment pipeline. The essence of quantization acceleration is the conver…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: We have systematically collected and reviewed the state-of-the-art quantization methods from the past five years, categorizing them into eight distinct groups. A curated list of model quantization is provided at https://github.com/Kai-Liu001/Awesome-Model-Quantization

  28. arXiv:2505.04258  [pdf, other]

    cs.RO cs.CV

    RGB-Event Fusion with Self-Attention for Collision Prediction

    Authors: Pietro Bonazzi, Christian Vogt, Michael Jost, Haotong Qin, Lyes Khacef, Federico Paredes-Valles, Michele Magno

    Abstract: Ensuring robust and real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments. This paper proposes a neural network framework for predicting the time and collision position of an unmanned aerial vehicle with a dynamic object, using RGB and event-based vision sensors. The proposed architecture consists of two separate encoder branches,…

    Submitted 16 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2504.10400

  29. arXiv:2505.03849  [pdf, ps, other]

    cs.LG astro-ph.IM nucl-th physics.comp-ph

    Improved Dimensionality Reduction for Inverse Problems in Nuclear Fusion and High-Energy Astrophysics

    Authors: Jonathan Gorard, Ammar Hakim, Hong Qin, Kyle Parfrey, Shantenu Jha

    Abstract: Many inverse problems in nuclear fusion and high-energy astrophysics research, such as the optimization of tokamak reactor geometries or the inference of black hole parameters from interferometric images, necessitate high-dimensional parameter scans and large ensembles of simulations to be performed. Such inverse problems typically involve large uncertainties, both in the measurement parameters be…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 2 pages. Position paper accepted to DOE-ASCR Inverse Methods for Complex Systems under Uncertainty Workshop (Rockville, MD, United States, June 10-12, 2025)

  30. arXiv:2505.02214  [pdf, other]

    cs.LG

    An Empirical Study of Qwen3 Quantization

    Authors: Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu

    Abstract: The Qwen series has emerged as a leading family of open-source Large Language Models (LLMs), demonstrating remarkable capabilities in natural language understanding tasks. With the recent release of Qwen3, which exhibits superior performance across diverse benchmarks, there is growing interest in deploying these models efficiently in resource-constrained environments. Low-bit quantization presents…

    Submitted 4 May, 2025; originally announced May 2025.

  31. arXiv:2505.00983  [pdf, other]

    cs.LG cs.AI cs.DB cs.SI

    Toward Data-centric Directed Graph Learning: An Entropy-driven Approach

    Authors: Xunkai Li, Zhengyu Wu, Kaichi Yu, Hongchao Qin, Guang Zeng, Rong-Hua Li, Guoren Wang

    Abstract: The directed graph (digraph), as a generalization of undirected graphs, exhibits superior representation capability in modeling complex topology systems and has garnered considerable attention in recent years. Despite the notable efforts made by existing DiGraph Neural Networks (DiGNNs) to leverage directed edges, they still fail to comprehensively delve into the abundant data knowledge concealed…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  32. arXiv:2504.14625  [pdf, other]

    cs.AR cs.AI

    Towards Optimal Circuit Generation: Multi-Agent Collaboration Meets Collective Intelligence

    Authors: Haiyan Qin, Jiahao Feng, Xiaotong Feng, Wei W. Xing, Wang Kang

    Abstract: Large language models (LLMs) have transformed code generation, yet their application in hardware design produces gate counts 38%–1075% higher than human designs. We present CircuitMind, a multi-agent framework that achieves human-competitive efficiency through three key innovations: syntax locking (constraining generation to basic logic gates), retrieval-augmented generation (enabling knowledge…

    Submitted 30 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: 9 pages, 6 figures

  33. arXiv:2504.14560  [pdf, other]

    cs.AR cs.AI

    ReasoningV: Efficient Verilog Code Generation with Adaptive Hybrid Reasoning Model

    Authors: Haiyan Qin, Zhiwei Xie, Jingjing Li, Liangchen Li, Xiaotong Feng, Junzhan Liu, Wang Kang

    Abstract: Large Language Models (LLMs) have advanced Verilog code generation significantly, yet face challenges in data quality, reasoning capabilities, and computational efficiency. This paper presents ReasoningV, a novel model employing a hybrid reasoning strategy that integrates trained intrinsic capabilities with dynamic inference adaptation for Verilog code generation. Our framework introduces three co…

    Submitted 30 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: 9 pages, 4 figures

  34. arXiv:2504.11514  [pdf, other]

    cs.AI cs.RO

    Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models

    Authors: Nicolas Baumann, Cheng Hu, Paviththiren Sivasothilingam, Haotong Qin, Lei Xie, Michele Magno, Luca Benini

    Abstract: Neural Networks (NNs) trained through supervised learning struggle with managing edge-case scenarios common in real-world driving due to the intractability of exhaustive datasets covering all edge-cases, making knowledge-driven approaches, akin to how humans intuitively detect unexpected driving behavior, a suitable complement to data-driven methods. This work proposes a hybrid architecture combin…

    Submitted 15 April, 2025; originally announced April 2025.

  35. arXiv:2504.00457  [pdf, other]

    cs.CV cs.AI

    Distilling Multi-view Diffusion Models into 3D Generators

    Authors: Hao Qin, Luyuan Chen, Ming Kong, Mengxu Lu, Qiang Zhu

    Abstract: We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous am…

    Submitted 2 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  36. arXiv:2503.21970  [pdf, other]

    cs.CV

    Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration

    Authors: Yujie Chen, Haotong Qin, Zhang Zhang, Michele Magno, Luca Benini, Yawei Li

    Abstract: State-Space Models (SSMs) have attracted considerable attention in Image Restoration (IR) due to their ability to scale linearly with sequence length while effectively capturing long-distance dependencies. However, deploying SSMs to edge devices is challenging due to the constraints in memory, computing capacity, and power consumption, underscoring the need for efficient compression strategies. While l…

    Submitted 2 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  37. arXiv:2503.21210  [pdf, other]

    cs.CV

    FakeReasoning: Towards Generalizable Forgery Detection and Reasoning

    Authors: Yueying Gao, Dongliang Chang, Bingyao Yu, Haotian Qin, Lei Chen, Kongming Liang, Zhanyu Ma

    Abstract: Accurate and interpretable detection of AI-generated images is essential for mitigating risks associated with AI misuse. However, the substantial domain gap among generative models makes it challenging to develop a generalizable forgery detection model. Moreover, since every pixel in an AI-generated image is synthesized, traditional saliency-based forgery explanation methods are not well suited fo…

    Submitted 27 March, 2025; originally announced March 2025.

  38. arXiv:2503.17876  [pdf, other]

    cs.CL cs.IR

    Satisfactory Medical Consultation based on Terminology-Enhanced Information Retrieval and Emotional In-Context Learning

    Authors: Kaiwen Zuo, Jing Tang, Hanbing Qin, Binli Luo, Ligang He, Shiyan Tang

    Abstract: Recent advancements in Large Language Models (LLMs) have marked significant progress in understanding and responding to medical inquiries. However, their performance still falls short of the standards set by professional consultations. This paper introduces a novel framework for medical consultation, comprising two main modules: Terminology-Enhanced Information Retrieval (TEIR) and Emotional In-Co…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: The 46th European Conference on Information Retrieval Workshop

  39. arXiv:2503.17699  [pdf, other]

    cs.CV

    MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking

    Authors: Haolin Qin, Tingfa Xu, Tianhao Li, Zhenxiang Chen, Tao Feng, Jianan Li

    Abstract: UAV tracking faces significant challenges in real-world scenarios, such as small-size targets and occlusions, which limit the performance of RGB-based trackers. Multispectral images (MSI), which capture additional spectral information, offer a promising solution to these challenges. However, progress in this field has been hindered by the lack of relevant datasets. To address this gap, we introduc…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: CVPR2025

  40. arXiv:2503.17398   

    eess.SY cs.RO

    Reachable Sets-based Trajectory Planning Combining Reinforcement Learning and iLQR

    Authors: Wenjie Huang, Yang Li, Shijie Yuan, Jingjia Teng, Hongmao Qin, Yougang Bian

    Abstract: The driving risk field is applicable to more complex driving scenarios, providing new approaches for safety decision-making and active vehicle control in intricate environments. However, existing research often overlooks the driving risk field and fails to consider the impact of risk distribution within drivable areas on trajectory planning, which poses challenges for enhancing safety. This paper…

    Submitted 20 May, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: We sincerely request the withdrawal of this paper. After further research and review, we have found that certain parts of the content contain uncertainties and are not sufficient to support the conclusions previously drawn. To avoid any potential misunderstanding or misguidance to the research community, we have decided to voluntarily withdraw the manuscript

  41. arXiv:2503.16983  [pdf, other]

    cs.CV cs.AI

    Enabling Versatile Controls for Video Diffusion Models

    Authors: Xu Zhang, Hao Zhou, Haoming Qin, Xiaobin Lu, Jiaxing Yan, Guanzhong Wang, Zeyu Chen, Yi Liu

    Abstract: Despite substantial progress in text-to-video generation, achieving precise and flexible control over fine-grained spatiotemporal attributes remains a significant unresolved challenge in video generation research. To address these limitations, we introduce VCtrl (also termed PP-VCtrl), a novel framework designed to enable fine-grained control over pre-trained video diffusion models in a unified ma…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: Codes and Supplementary Material: http://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl

  42. arXiv:2503.15029  [pdf, other]

    cs.RO cs.CV

    DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling

    Authors: Jianbo Zhao, Taiyu Ban, Zhihao Liu, Hangning Zhou, Xiyang Wang, Qibin Zhou, Hailong Qin, Mu Yang, Lei Liu, Bin Li

    Abstract: Accurate and efficient modeling of agent interactions is essential for trajectory generation, the core of autonomous driving systems. Existing methods, scene-centric, agent-centric, and query-centric frameworks, each present distinct advantages and drawbacks, creating an impossible triangle among accuracy, computational time, and memory efficiency. To break this limitation, we propose Directional…

    Submitted 19 March, 2025; originally announced March 2025.

  43. arXiv:2503.13969  [pdf, other]

    cs.CV

    SoccerSynth Field: enhancing field detection with synthetic data from virtual soccer simulator

    Authors: HaoBin Qin, Jiale Fang, Keisuke Fujii

    Abstract: Field detection in team sports is an essential task in sports video analysis. However, collecting large-scale and diverse real-world datasets for training detection models is often costly and time-consuming. Synthetic datasets, which allow controlled variability in lighting, textures, and camera angles, are a promising alternative for addressing these problems. This study addresses the challenge…

    Submitted 18 March, 2025; originally announced March 2025.

  44. arXiv:2503.13906  [pdf, other]

    cs.CV cs.AI

    HSOD-BIT-V2: A New Challenging Benchmark for Hyperspectral Salient Object Detection

    Authors: Yuhao Qiu, Shuyan Bai, Tingfa Xu, Peifu Liu, Haolin Qin, Jianan Li

    Abstract: Salient Object Detection (SOD) is crucial in computer vision, yet RGB-based methods face limitations in challenging scenes, such as small objects and similar color features. Hyperspectral images provide a promising solution for more accurate Hyperspectral Salient Object Detection (HSOD) by abundant spectral information, while HSOD methods are hindered by the lack of extensive and available dataset…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: AAAI 2025

  45. arXiv:2503.09366  [pdf, other]

    cs.CV

    Post-interactive Multimodal Trajectory Prediction for Autonomous Driving

    Authors: Ziyi Huang, Yang Li, Dushuai Li, Yao Mu, Hongmao Qin, Nan Zheng

    Abstract: Modeling the interactions among agents for trajectory prediction of autonomous driving has been challenging due to the inherent uncertainty in agents' behavior. The interactions involved in the predicted trajectories of agents, also called post-interactions, have rarely been considered in trajectory prediction models. To this end, we propose a coarse-to-fine Transformer for multimodal trajectory p…

    Submitted 12 March, 2025; originally announced March 2025.

  46. arXiv:2503.06564  [pdf, other]

    cs.CV

    TR-DQ: Time-Rotation Diffusion Quantization

    Authors: Yihua Shao, Deyang Lin, Fanhu Zeng, Minxi Yan, Muyang Zhang, Siyu Chen, Yuxuan Fan, Ziyang Yan, Haozhe Wang, Jingcai Guo, Yan Wang, Haotong Qin, Hao Tang

    Abstract: Diffusion models have been widely adopted in image and video generation. However, their complex network architecture leads to high inference overhead for its generation process. Existing diffusion quantization methods primarily focus on the quantization of the model structure while ignoring the impact of time-steps variation during sampling. At the same time, most current approaches fail to accoun…

    Submitted 9 March, 2025; originally announced March 2025.

  47. arXiv:2503.06075  [pdf, other]

    cs.RO

    FSDP: Fast and Safe Data-Driven Overtaking Trajectory Planning for Head-to-Head Autonomous Racing Competitions

    Authors: Cheng Hu, Jihao Huang, Wule Mao, Yonghao Fu, Xuemin Chi, Haotong Qin, Nicolas Baumann, Zhitao Liu, Michele Magno, Lei Xie

    Abstract: Generating overtaking trajectories in autonomous racing is a challenging task, as the trajectory must satisfy the vehicle's dynamics and ensure safety and real-time performance running on resource-constrained hardware. This work proposes the Fast and Safe Data-Driven Planner to address this challenge. Sparse Gaussian predictions are introduced to improve both the computational efficiency and accur…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: submitted to IROS 2025

  48. arXiv:2503.05584  [pdf, other]

    cs.CV

    QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution

    Authors: Libo Zhu, Haotong Qin, Kaicheng Yang, Wenbo Li, Yong Guo, Yulun Zhang, Susanto Rahardja, Xiaokang Yang

    Abstract: One-step diffusion-based image super-resolution (OSDSR) models are showing increasingly superior performance nowadays. However, although their denoising steps are reduced to one and they can be quantized to 8-bit to reduce the costs further, there is still significant potential for OSDSR to quantize to lower bits. To explore more possibilities of quantized OSDSR, we propose an efficient method, Qu…

    Submitted 7 March, 2025; originally announced March 2025.

  49. arXiv:2503.02624  [pdf, other]

    cs.RO

    Human-aligned Safe Reinforcement Learning for Highway On-Ramp Merging in Dense Traffic

    Authors: Yang Li, Shijie Yuan, Yuan Chang, Xiaolong Chen, Qisong Yang, Zhiyuan Yang, Hongmao Qin

    Abstract: Most reinforcement learning (RL) approaches for the decision-making of autonomous driving consider safety as a reward instead of a cost, which makes it hard to balance the tradeoff between safety and other objectives. Human risk preference has also rarely been incorporated, and the trained policy might be either conservative or aggressive for users. To this end, this study proposes a human-aligned…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 20 pages, 16 figures

  50. arXiv:2503.02508  [pdf, other]

    cs.CV eess.IV

    Q&C: When Quantization Meets Cache in Efficient Image Generation

    Authors: Xin Ding, Xin Li, Haotong Qin, Zhibo Chen

    Abstract: Quantization and cache mechanisms are typically applied individually for efficient Diffusion Transformers (DiTs), each demonstrating notable potential for acceleration. However, the promoting effect of combining the two mechanisms on efficient generation remains under-explored. Through empirical investigation, we find that the combination of quantization and cache mechanisms for DiT is not straigh…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 11 pages