Showing 1–50 of 137 results for author: Qiu, Q

  1. arXiv:2507.04723  [pdf, ps, other]

    cs.CL

    LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework

    Authors: Zecheng Tang, Haitian Wang, Quantong Qiu, Baibei Ji, Ruoxi Sun, Keyan Zhou, Juntao Li, Min Zhang

    Abstract: Long-context processing has become a fundamental capability for large language models (LLMs). To assess a model's long-context performance, numerous long-context evaluation benchmarks have been proposed. However, variations in evaluation settings across these benchmarks lead to inconsistent results, making it difficult to draw reliable comparisons. Besides, the high computational cost of long-contex…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2506.19303  [pdf, ps, other]

    cs.RO

    Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference

    Authors: Zexiang Guo, Hengxiang Chen, Xinheng Mai, Qiusang Qiu, Gan Ma, Zhanat Kappassov, Qiang Li, Nutan Chen

    Abstract: Inferring physical properties can significantly enhance robotic manipulation by enabling robots to handle objects safely and efficiently through adaptive grasping strategies. Previous approaches have typically relied on either tactile or visual data, limiting their ability to fully capture properties. We introduce a novel cross-modal perception framework that integrates visual observations with ta…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted by the 2025 International Conference on Climbing and Walking Robots (CLAWAR). These authors contributed equally to this work: Zexiang Guo, Hengxiang Chen, Xinheng Mai

  3. arXiv:2506.15793  [pdf, ps, other]

    cs.DS cs.AI

    Linearithmic Clean-up for Vector-Symbolic Key-Value Memory with Kroneker Rotation Products

    Authors: Ruipeng Liu, Qinru Qiu, Simon Khan, Garrett E. Katz

    Abstract: A computational bottleneck in current Vector-Symbolic Architectures (VSAs) is the "clean-up" step, which decodes the noisy vectors retrieved from the architecture. Clean-up typically compares noisy vectors against a "codebook" of prototype vectors, incurring computational complexity that is quadratic or similar. We present a new codebook representation that supports efficient clean-up, based o…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 10 pages, 10 figures, conference paper

  4. arXiv:2506.07672  [pdf, ps, other]

    cs.AI

    MCPWorld: A Unified Benchmarking Testbed for API, GUI, and Hybrid Computer Use Agents

    Authors: Yunhe Yan, Shihe Wang, Jiajun Du, Yexuan Yang, Yuxuan Shan, Qichen Qiu, Xianqing Jia, Xinge Wang, Xin Yuan, Xu Han, Mao Qin, Yinxiao Chen, Chen Peng, Shangguang Wang, Mengwei Xu

    Abstract: (M)LLM-powered computer use agents (CUA) are emerging as a transformative technique to automate human-computer interaction. However, existing CUA benchmarks predominantly target GUI agents, whose evaluation methods are susceptible to UI changes and ignore function interactions exposed by application APIs, e.g., Model Context Protocol (MCP). To this end, we propose MCPWorld, the first automatic CUA…

    Submitted 9 June, 2025; originally announced June 2025.

  5. arXiv:2505.18472  [pdf, other]

    cs.RO

    ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning

    Authors: Quan Khanh Luu, Pokuang Zhou, Zhengtong Xu, Zhiyuan Zhang, Qiang Qiu, Yu She

    Abstract: Supervised visuomotor policies have shown strong performance in robotic manipulation but often struggle in tasks with limited visual input, such as operations in confined spaces, dimly lit environments, or scenarios where perceiving the object's properties and state is critical for task success. In such cases, tactile feedback becomes essential for manipulation. While the rapid progress of supervi…

    Submitted 23 May, 2025; originally announced May 2025.

  6. arXiv:2505.10938  [pdf, other]

    cs.CL

    Accurate KV Cache Quantization with Outlier Tokens Tracing

    Authors: Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

    Abstract: The impressive capabilities of Large Language Models (LLMs) come at the cost of substantial computational resources during deployment. While KV Cache can significantly reduce recomputation during inference, it also introduces additional memory overhead. KV Cache quantization presents a promising solution, striking a good balance between memory usage and accuracy. Previous research has shown that t…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: ACL2025 Main

  7. arXiv:2505.05834  [pdf, other]

    cs.CV

    Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression

    Authors: Chunlai Dong, Haochao Ying, Qibo Qiu, Jinhong Wang, Danny Chen, Jian Wu

    Abstract: Ordinal regression bridges regression and classification by assigning objects to ordered classes. While human experts rely on discriminative patch-level features for decisions, current approaches are limited by the availability of only image-level ordinal labels, overlooking fine-grained patch-level characteristics. In this paper, we propose a Dual-level Fuzzy Learning with Patch Guidance framewor…

    Submitted 17 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  8. arXiv:2504.13807  [pdf, other]

    cs.RO

    DiffOG: Differentiable Policy Trajectory Optimization with Generalizability

    Authors: Zhengtong Xu, Zichen Miao, Qiang Qiu, Zhe Zhang, Yu She

    Abstract: Imitation learning-based visuomotor policies excel at manipulation tasks but often produce suboptimal action trajectories compared to model-based methods. Directly mapping camera data to actions via neural networks can result in jerky motions and difficulties in meeting critical constraints, compromising safety and robustness in real-world deployment. For tasks that require high robustness or stri…

    Submitted 13 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  9. arXiv:2504.01234  [pdf]

    cs.MA physics.optics

    First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution

    Authors: Yihao Zhang, Qizhi Qiu, Xiaomin Liu, Dianxuan Fu, Xingyu Liu, Leyan Fei, Yuming Cheng, Lilin Yi, Weisheng Hu, Qunbi Zhuge

    Abstract: We demonstrate the first cross-domain cross-layer level-4 autonomous optical network via a multi-AI-agent system. Field trials show a 98 percent task completion rate across the distributed AI training lifecycle, 3.2x higher than single agents using state-of-the-art LLMs.

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Submitted to the PDP session of the Optical Fiber Communications Conference (OFC) 2025

  10. arXiv:2503.18337  [pdf, other]

    cs.CV

    Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models

    Authors: Zichen Miao, Wei Chen, Qiang Qiu

    Abstract: Transformer-based large pre-trained models have shown remarkable generalization ability, and various parameter-efficient fine-tuning (PEFT) methods have been proposed to customize these models on downstream tasks with minimal computational and memory budgets. Previous PEFT methods are primarily designed from a tensor-decomposition perspective that tries to effectively tune the linear transformatio…

    Submitted 24 March, 2025; originally announced March 2025.

  11. arXiv:2503.13542  [pdf, other]

    cs.LG cs.AI

    HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets

    Authors: Lulu Ban, Tao Zhu, Xiangqing Lu, Qi Qiu, Wenyong Han, Shuangjian Li, Liming Chen, Kevin I-Kai Wang, Mingxing Nie, Yaping Wan

    Abstract: Cross-dataset Human Activity Recognition (HAR) suffers from limited model generalization, hindering its practical deployment. To address this critical challenge, inspired by the success of DoReMi in Large Language Models (LLMs), we introduce a data mixture optimization strategy for pre-training HAR models, aiming to improve the recognition performance across heterogeneous datasets. However, direct…

    Submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2503.08652  [pdf, other]

    cs.LG

    Extra Clients at No Extra Cost: Overcome Data Heterogeneity in Federated Learning with Filter Decomposition

    Authors: Wei Chen, Qiang Qiu

    Abstract: Data heterogeneity is one of the major challenges in federated learning (FL), which results in substantial client variance and slow convergence. In this study, we propose a novel solution: decomposing a convolutional filter in FL into a linear combination of filter subspace elements, i.e., filter atoms. This simple technique transforms global filter aggregation in FL into aggregating filter atoms…

    Submitted 11 March, 2025; originally announced March 2025.

  13. arXiv:2503.06339  [pdf, other]

    cs.LG cs.CV

    Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

    Authors: Gaurav Patel, Qiang Qiu

    Abstract: Machine Unlearning has recently garnered significant attention, aiming to selectively remove knowledge associated with specific data while preserving the model's performance on the remaining data. A fundamental challenge in this process is balancing effective unlearning with knowledge retention, as naive optimization of these competing objectives can lead to conflicting gradients, hindering conver…

    Submitted 8 March, 2025; originally announced March 2025.

  14. arXiv:2502.01784  [pdf, other]

    cs.RO cs.CV

    VILP: Imitation Learning with Latent Video Planning

    Authors: Zhengtong Xu, Qiang Qiu, Yu She

    Abstract: In the era of generative AI, integrating video generation models into robotics opens new possibilities for the general-purpose robot agent. This paper introduces imitation learning with latent video planning (VILP). We propose a latent video diffusion model to generate predictive robot videos that adhere to temporal consistency to a good degree. Our method is able to generate highly time-aligned v…

    Submitted 3 February, 2025; originally announced February 2025.

  15. arXiv:2412.15491  [pdf, other]

    cs.CV

    GCA-3D: Towards Generalized and Consistent Domain Adaptation of 3D Generators

    Authors: Hengjia Li, Yang Liu, Yibo Zhao, Haoran Cheng, Yang Yang, Linxuan Xia, Zekai Luo, Qibo Qiu, Boxi Wu, Tu Zheng, Zheng Yang, Deng Cai

    Abstract: Recently, 3D generative domain adaptation has emerged to adapt the pre-trained generator to other domains without collecting massive datasets and camera pose distributions. Typically, they leverage large-scale pre-trained text-to-image diffusion models to synthesize images for the target domain and then fine-tune the 3D model. However, they suffer from the tedious pipeline of data generation, whic…

    Submitted 19 December, 2024; originally announced December 2024.

  16. arXiv:2412.05840  [pdf, other]

    cs.CV

    LVP-CLIP: Revisiting CLIP for Continual Learning with Label Vector Pool

    Authors: Yue Ma, Huantao Ren, Boyu Wang, Jingang Jin, Senem Velipasalar, Qinru Qiu

    Abstract: Continual learning aims to update a model so that it can sequentially learn new tasks without forgetting previously acquired knowledge. Recent continual learning approaches often leverage the vision-language model CLIP for its high-dimensional feature space and cross-modality feature matching. Traditional CLIP-based classification methods identify the most similar text label for a test image by co…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: submitted to CVPR2025

    MSC Class: 68T45

    ACM Class: I.2.10; I.4; I.5

  17. arXiv:2411.16713  [pdf, other]

    cs.CV

    Conditional Text-to-Image Generation with Reference Guidance

    Authors: Taewook Kim, Ze Wang, Zhengyuan Yang, Jiang Wang, Lijuan Wang, Zicheng Liu, Qiang Qiu

    Abstract: Text-to-image diffusion models have demonstrated tremendous success in synthesizing visually stunning images given textual instructions. Despite remarkable progress in creating high-fidelity visuals, text-to-image models can still struggle with precisely rendering subjects, such as text spelling. To address this challenge, this paper explores using additional conditions of an image that provides v…

    Submitted 22 November, 2024; originally announced November 2024.

  18. arXiv:2411.16120  [pdf, other]

    cs.AI cs.LG

    Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks

    Authors: Rui Zuo, Zifan Wang, Simon Khan, Garrett Ethan Katz, Qinru Qiu

    Abstract: Due to the inherent lack of transparency in deep neural networks, it is challenging for deep reinforcement learning (DRL) agents to gain trust and acceptance from users, especially in safety-critical applications such as medical diagnosis and military operations. Existing methods for explaining an agent's decision either require retraining the agent using models that support explanation generation…

    Submitted 25 November, 2024; originally announced November 2024.

  19. arXiv:2411.15236  [pdf, other]

    cs.CV cs.LG

    Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps

    Authors: Jeeyung Kim, Erfan Esmaeili, Qiang Qiu

    Abstract: In text-to-image diffusion models, the cross-attention map of each text token indicates the specific image regions attended. Comparing these maps of syntactically related tokens provides insights into how well the generated image reflects the text prompt. For example, in the prompt, "a black car and a white clock", the cross-attention maps for "black" and "car" should focus on overlapping regions…

    Submitted 21 November, 2024; originally announced November 2024.

  20. arXiv:2410.17277  [pdf, other]

    quant-ph cs.NE

    A practical applicable quantum-classical hybrid ant colony algorithm for the NISQ era

    Authors: Qian Qiu, Liang Zhang, Mohan Wu, Qichun Sun, Xiaogang Li, Da-Chuang Li, Hua Xu

    Abstract: Quantum ant colony optimization (QACO) has drawn much attention since it combines the advantages of quantum computing and the ant colony optimization (ACO) algorithm, overcoming some limitations of the traditional ACO algorithm. However, due to the hardware resource limitations of currently available quantum computers, the practical application of QACO is still not realized. In this paper, we develop…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.00367

  21. arXiv:2410.15730  [pdf, other]

    cs.RO

    MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation

    Authors: Yu Sheng, Runfeng Lin, Lidian Wang, Quecheng Qiu, YanYong Zhang, Yu Zhang, Bei Hua, Jianmin Ji

    Abstract: Combining accurate geometry with rich semantics has been proven to be highly effective for language-guided robotic manipulation. Existing methods for dynamic scenes either fail to update in real time or rely on additional depth sensors for simple scene editing, limiting their applicability in the real world. In this paper, we introduce MSGField, a representation that uses a collection of 2D Gaussians…

    Submitted 21 October, 2024; originally announced October 2024.

  22. arXiv:2410.10058  [pdf, other]

    cs.CV

    Learning to Customize Text-to-Image Diffusion In Diverse Context

    Authors: Taewook Kim, Wei Chen, Qiang Qiu

    Abstract: Most text-to-image customization techniques fine-tune models on a small set of personal concept images captured in minimal contexts. This often results in the model becoming overfitted to these training images and unable to generalize to new contexts in future text prompts. Existing customization methods are built on the success of effectively representing personal concepts as textual embed…

    Submitted 13 October, 2024; originally announced October 2024.

  23. arXiv:2410.03190  [pdf, other]

    cs.CV

    Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

    Authors: Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu

    Abstract: Recent advancements in timestep-distilled diffusion models have enabled high-quality image generation that rivals non-distilled multi-step models, but with significantly fewer inference steps. While such models are attractive for applications due to the low inference cost and latency, fine-tuning them with a naive diffusion objective would result in degraded and blurry outputs. An intuitive altern…

    Submitted 2 March, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  24. arXiv:2410.02078  [pdf, other]

    stat.ML cs.CV cs.LG

    Posterior sampling via Langevin dynamics based on generative priors

    Authors: Vishal Purohit, Matthew Repasky, Jianfeng Lu, Qiang Qiu, Yao Xie, Xiuyuan Cheng

    Abstract: Posterior sampling in high-dimensional spaces using generative models holds significant promise for various applications, including but not limited to inverse problems and guided generation tasks. Despite many recent developments, generating diverse posterior samples remains a challenge, as existing methods require restarting the entire generative process for each new sample, making the procedure…

    Submitted 2 October, 2024; originally announced October 2024.

  25. arXiv:2409.14723  [pdf, other]

    cs.RO

    ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Using Lightweight Polygon Maps

    Authors: Haiming Gao, Qibo Qiu, Hongyan Liu, Dingkun Liang, Chaoqun Wang, Xuebo Zhang

    Abstract: This paper presents an effective and reliable pose tracking solution, termed ERPoT, for mobile robots operating in large-scale outdoor and challenging indoor environments, underpinned by an innovative prior polygon map. In particular, to overcome the challenge that arises as the map size grows with the expansion of the environment, the novel form of a prior map composed of multiple polygons is propos…

    Submitted 23 May, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: 20 pages, 21 figures

    ACM Class: I.2.9

  26. arXiv:2409.01075  [pdf, other]

    cs.DC

    Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization

    Authors: Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Wenxi Zhu, Minwen Deng

    Abstract: Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to effi…

    Submitted 2 September, 2024; originally announced September 2024.

  27. arXiv:2407.14872  [pdf, other]

    cs.CV cs.RO

    Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

    Authors: Yanting Yang, Minghao Chen, Qibo Qiu, Jiahao Wu, Wenxiao Wang, Binbin Lin, Ziyu Guan, Xiaofei He

    Abstract: For a general-purpose robot to operate in reality, executing a broad range of instructions across various environments is imperative. Central to the reinforcement learning and planning for such robotic agents is a generalizable reward function. Recent advances in vision-language models, such as CLIP, have shown remarkable performance in the domain of deep learning, paving the way for open-domain v…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 camera-ready

  28. arXiv:2407.08947  [pdf, other]

    cs.LG cs.CV

    Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort

    Authors: Jeeyung Kim, Ze Wang, Qiang Qiu

    Abstract: Enhancing model interpretability can address spurious correlations by revealing how models draw their predictions. Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts, albeit at a high cost of human efforts in data annotation. In this paper, we leverage a synergy of multiple foundation models to construct CBM…

    Submitted 11 July, 2024; originally announced July 2024.

  29. arXiv:2407.03668  [pdf, other]

    cs.LG eess.SY

    Reliable Projection Based Unsupervised Learning for Semi-Definite QCQP with Application of Beamforming Optimization

    Authors: Xiucheng Wang, Qi Qiu, Nan Cheng

    Abstract: In this paper, we investigate a special class of quadratic-constrained quadratic programming (QCQP) with semi-definite constraints. Traditionally, since such a problem is non-convex and NP-hard, the neural network (NN) is regarded as a promising method to obtain a high-performing solution. However, due to the inherent prediction error, it is challenging to ensure every solution output by the NN is fe…

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  30. Multi-agent Cooperative Games Using Belief Map Assisted Training

    Authors: Qinwei Huang, Chen Luo, Alex B. Wu, Simon Khan, Hai Li, Qinru Qiu

    Abstract: In a multi-agent system, agents share their local observations to gain global situational awareness for decision making and collaboration using a message passing system. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learn…

    Submitted 27 June, 2024; originally announced June 2024.

    Journal ref: ECAI 2023. IOS Press, 2023: 1617-1624

  31. arXiv:2406.18569  [pdf, other]

    cs.CV cs.AI

    FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs

    Authors: Qi Qiu, Tao Zhu, Furong Duan, Kevin I-Kai Wang, Liming Chen, Mingxing Nie

    Abstract: Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability, energy efficiency, and growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary…

    Submitted 3 June, 2024; originally announced June 2024.

  32. arXiv:2406.13897  [pdf, other]

    cs.CV

    CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

    Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu

    Abstract: In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and efforts. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic…

    Submitted 30 May, 2024; originally announced June 2024.

    Comments: Project page: https://sites.google.com/view/clay-3dlm Video: https://youtu.be/YcKFp4U2Voo

  33. arXiv:2406.13568  [pdf, other]

    cs.AI

    Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks

    Authors: Yuhao Pan, Xiucheng Wang, Nan Cheng, Qi Qiu

    Abstract: With the rapid development of artificial intelligence technology, the field of reinforcement learning has continuously achieved breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often entail high energy consumption during interactions with the environment. Spiking Neural Networks (SNNs), with their low energy consumption characteristics and performance…

    Submitted 19 June, 2024; originally announced June 2024.

  34. arXiv:2406.02075  [pdf, other]

    cs.LG cs.NE

    ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU

    Authors: Qi Qiu, Tao Zhu, Helin Gong, Liming Chen, Huansheng Ning

    Abstract: Limited by the complexity of basis function (B-spline) calculations, Kolmogorov-Arnold Networks (KAN) suffer from restricted parallel computing capability on GPUs. This paper proposes a novel ReLU-KAN implementation that inherits the core idea of KAN. By adopting ReLU (Rectified Linear Unit) and point-wise multiplication, we simplify the design of KAN's basis function and optimize the computation…

    Submitted 12 August, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  35. arXiv:2404.00879  [pdf, other]

    cs.CV

    Model-Agnostic Human Preference Inversion in Diffusion Models

    Authors: Jeeyung Kim, Ze Wang, Qiang Qiu

    Abstract: Efficient text-to-image generation remains a challenging task due to the high computational costs associated with the multi-step sampling in diffusion models. Although distillation of pre-trained diffusion models has been successful in reducing sampling steps, low-step image generation often falls short in terms of quality. In this study, we propose a novel sampling design to achieve high-quality…

    Submitted 31 March, 2024; originally announced April 2024.

  36. arXiv:2403.19066  [pdf, other]

    cs.CV cs.AI

    Generative Quanta Color Imaging

    Authors: Vishal Purohit, Junjie Luo, Yiheng Chi, Qi Guo, Stanley H. Chan, Qiang Qiu

    Abstract: The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We evidently fi…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  37. arXiv:2403.00269  [pdf, other]

    cs.CV cs.LG

    Large Convolutional Model Tuning via Filter Subspace

    Authors: Wei Chen, Zichen Miao, Qiang Qiu

    Abstract: Efficient fine-tuning methods are critical to address the high computational and parameter complexity while adapting large pre-trained models to downstream tasks. Our study is inspired by prior research that represents each convolution filter as a linear combination of a small set of filter subspace elements, referred to as filter atoms. In this paper, we propose to fine-tune pre-trained models by…

    Submitted 25 February, 2025; v1 submitted 29 February, 2024; originally announced March 2024.

  38. arXiv:2402.11025  [pdf, other]

    cs.LG stat.ML

    Training Bayesian Neural Networks with Sparse Subspace Variational Inference

    Authors: Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang

    Abstract: Bayesian neural networks (BNNs) offer uncertainty quantification but come with the downside of substantially increased training and inference costs. Sparse BNNs have been investigated for efficient inference, typically by either slowly introducing sparsity throughout the training or by post-training compression of dense BNNs. The dilemma of how to cut down massive training costs remains, particula…

    Submitted 16 February, 2024; originally announced February 2024.

    Journal ref: Published at International Conference on Learning Representations (ICLR) 2024

  39. arXiv:2402.08936  [pdf, other]

    cs.CV

    Predictive Temporal Attention on Event-based Video Stream for Energy-efficient Situation Awareness

    Authors: Yiming Bu, Jiayang Liu, Qinru Qiu

    Abstract: The Dynamic Vision Sensor (DVS) is an innovative technology that efficiently captures and encodes visual information in an event-driven manner. By combining it with event-driven neuromorphic processing, the sparsity in DVS camera output can result in high energy efficiency. However, similar to many embedded systems, the off-chip communication between the camera and processor presents a bottleneck…

    Submitted 13 February, 2024; originally announced February 2024.

  40. arXiv:2401.13076  [pdf, other]

    cs.RO cs.CV

    SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization

    Authors: Mingyang Li, Yue Ma, Qinru Qiu

    Abstract: Current techniques in Visual Simultaneous Localization and Mapping (VSLAM) estimate camera displacement by comparing image features of consecutive scenes. These algorithms depend on scene continuity and hence require frequent camera inputs. However, processing images frequently can lead to significant memory usage and computation overhead. In this study, we introduce SemanticSLAM, an end-to-end visu…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 2023 IEEE Symposium Series on Computational Intelligence (SSCI) 6 pages

    Journal ref: 2023 IEEE Symposium Series on Computational Intelligence (SSCI)

  41. arXiv:2312.12721  [pdf, other]

    cs.CV

    Cross-Modal Reasoning with Event Correlation for Video Question Answering

    Authors: Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Qinru Qiu, Jian Tang

    Abstract: Video Question Answering (VideoQA) is a very attractive and challenging research direction aiming to understand complex semantics of heterogeneous data from two domains, i.e., the spatio-temporal video content and the word sequence in question. Although various attention mechanisms have been utilized to manage contextualized representations by modeling intra- and inter-modal relationships of the t…

    Submitted 19 December, 2023; originally announced December 2023.

  42. arXiv:2312.11933  [pdf, other]

    cs.LG cs.AI

    Dynamic Frequency Domain Graph Convolutional Network for Traffic Forecasting

    Authors: Yujie Li, Zezhi Shao, Yongjun Xu, Qiang Qiu, Zhaogang Cao, Fei Wang

    Abstract: Complex spatial dependencies in transportation networks make traffic prediction extremely challenging. Much existing work is devoted to learning dynamic graph structures among sensors, and the strategy of mining spatial dependencies from traffic data, known as data-driven, tends to be an intuitive and effective approach. However, Time-Shift of traffic patterns and noise induced by random factors h…

    Submitted 19 December, 2023; originally announced December 2023.

  43. arXiv:2311.00240  [pdf, other]

    cs.CR

    Intell-dragonfly: A Cybersecurity Attack Surface Generation Engine Based On Artificial Intelligence-generated Content Technology

    Authors: Xingchen Wu, Qin Qiu, Jiaqi Li, Yang Zhao

    Abstract: With the rapid development of the Internet, cyber security issues have become increasingly prominent. Traditional cyber security defense methods are limited in the face of ever-changing threats, so it is critical to seek innovative attack surface generation methods. This study proposes Intell-dragonfly, a cyber security attack surface generation engine based on artificial intelligence generation t…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: 17 pages, 7 figures, 2 tables

  44. arXiv:2310.13295  [pdf, other]

    cs.RO cs.AI

    PathRL: An End-to-End Path Generation Method for Collision Avoidance via Deep Reinforcement Learning

    Authors: Wenhao Yu, Jie Peng, Quecheng Qiu, Hanyu Wang, Lu Zhang, Jianmin Ji

    Abstract: Robot navigation using deep reinforcement learning (DRL) has shown great potential in improving the performance of mobile robots. Nevertheless, most existing DRL-based navigation methods primarily focus on training a policy that directly commands the robot with low-level controls, like linear and angular velocities, which leads to unstable speeds and unsmooth trajectories of the robot during the l…

    Submitted 20 October, 2023; originally announced October 2023.

  45. arXiv:2310.08370  [pdf, other]

    cs.CV

    UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

    Authors: Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang

    Abstract: In the context of autonomous driving, the significance of effective feature learning is widely acknowledged. While conventional 3D self-supervised pre-training methods have shown widespread success, most methods follow the ideas originally designed for 2D images. In this paper, we present UniPAD, a novel self-supervised learning paradigm applying 3D volumetric differentiable rendering. UniPAD impl…

    Submitted 7 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: CVPR2024

  46. arXiv:2309.13235  [pdf, other]

    cs.CV

    M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

    Authors: Qibo Qiu, Honghui Yang, Wenxiao Wang, Shun Zhang, Haiming Gao, Haochao Ying, Wei Hua, Xiaofei He

    Abstract: Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds. Existing methods reconstruct either the original points or related features as the objective of pre-training. However, considering the diversity of downstream tasks, it is necessary for the model to have both low- and high-level representation modeling capabilities to capture geometric details and…

    Submitted 22 September, 2023; originally announced September 2023.

  47. arXiv:2308.04118  [pdf, other]

    cs.CV cs.MM

    Multimodal Color Recommendation in Vector Graphic Documents

    Authors: Qianru Qiu, Xueting Wang, Mayu Otani

    Abstract: Color selection plays a critical role in graphic document design and requires sufficient consideration of various contexts. However, recommending appropriate colors which harmonize with the other colors and textual contexts in documents is a challenging task, even for experienced designers. In this study, we propose a multimodal masked color model that integrates both color and textual contexts to…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023

  48. arXiv:2308.00287  [pdf, other]

    cs.CV cs.LG

    A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation

    Authors: Minghao Chen, Zepeng Gao, Shuai Zhao, Qibo Qiu, Wenxiao Wang, Binbin Lin, Xiaofei He

    Abstract: Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods necessitate a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with the met…

    Submitted 18 September, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

  49. arXiv:2307.11314  [pdf]

    cs.NE cs.LG

    Neuromorphic Online Learning for Spatiotemporal Patterns with a Forward-only Timeline

    Authors: Zhenhang Zhang, Jingang Jin, Haowen Fang, Qinru Qiu

    Abstract: Spiking neural networks (SNNs) are bio-plausible computing models with high energy efficiency. The temporal dynamics of neurons and synapses enable them to detect temporal patterns and generate sequences. While Backpropagation Through Time (BPTT) is traditionally used to train SNNs, it is not suitable for online learning of embedded applications due to its high computation and memory cost as well…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 9 pages, 8 figures

  50. arXiv:2306.01205  [pdf, other]

    cs.CV

    SelFLoc: Selective Feature Fusion for Large-scale Point Cloud-based Place Recognition

    Authors: Qibo Qiu, Wenxiao Wang, Haochao Ying, Dingkun Liang, Haiming Gao, Xiaofei He

    Abstract: Point cloud-based place recognition is crucial for mobile robots and autonomous vehicles, especially when the global positioning sensor is not accessible. LiDAR points are scattered on the surface of objects and buildings, which have strong shape priors along different axes. To enhance message passing along particular axes, Stacked Asymmetric Convolution Block (SACB) is designed, which is one of t…

    Submitted 23 September, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted by Knowledge-Based Systems