Showing 1–50 of 302 results for author: Zheng, Q

  1. arXiv:2505.10191  [pdf]

    physics.ao-ph cs.AI cs.LG nlin.CD

    LanTu: Dynamics-Enhanced Deep Learning for Eddy-Resolving Ocean Forecasting

    Authors: Qingyu Zheng, Qi Shao, Guijun Han, Wei Li, Hong Li, Xuan Wang

    Abstract: Mesoscale eddies dominate the spatiotemporal multiscale variability of the ocean, and their impact on the energy cascade of the global ocean cannot be ignored. Eddy-resolving ocean forecasting is providing more reliable protection for fisheries and navigational safety, but also presents significant scientific challenges and high computational costs for traditional numerical models. Artificial inte…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 22 pages, 6 figures

  2. arXiv:2505.06536  [pdf, other]

    cs.CV cs.AI

    TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition

    Authors: Feng Liu, Ziwang Fu, Yunlong Wang, Qijian Zheng

    Abstract: The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the othe…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2111.02172

  3. arXiv:2505.05530  [pdf, other]

    cs.LG cs.AI

    Low-bit Model Quantization for Deep Neural Networks: A Survey

    Authors: Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

    Abstract: With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an effective weight-lighting technique, has become an indispensable procedure in the whole deployment pipeline. The essence of quantization acceleration is the conver…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: We have systematically collected and reviewed the state-of-the-art quantization methods from the past five years, categorizing them into eight distinct groups. A curated list of model quantization is provided at https://github.com/Kai-Liu001/Awesome-Model-Quantization

  4. arXiv:2504.12216  [pdf, other]

    cs.CL cs.LG

    d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

    Authors: Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover

    Abstract: Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR) generation paradigm. In contrast, non-autoregressive paradigms based on diffusion generate text in a coarse-to-fine manner. Although recent diffusion-based large la…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 25 pages, project page at https://dllm-reasoning.github.io/

  5. arXiv:2504.03101  [pdf, other]

    cs.CL

    Single-Pass Document Scanning for Question Answering

    Authors: Weili Cao, Jianyou Wang, Youze Zheng, Longtian Bao, Qirui Zheng, Taylor Berg-Kirkpatrick, Ramamohan Paturi, Leon Bergen

    Abstract: Handling extremely large documents for question answering is challenging: chunk-based embedding methods often lose track of important global context, while full-context transformers can be prohibitively expensive for hundreds of thousands of tokens. We propose a single-pass document scanning approach that processes the entire text in linear time, preserving global coherence while deciding which se…

    Submitted 3 April, 2025; originally announced April 2025.

  6. arXiv:2503.24008  [pdf, other]

    cs.CV cs.AI

    H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding

    Authors: Qi Wu, Quanlong Zheng, Yanhao Zhang, Junlin Xie, Jinguo Luo, Kuo Wang, Peng Liu, Qingsong Xie, Ru Zhen, Haonan Lu, Zhenyu Yang

    Abstract: With the rapid development of multimodal models, the demand for assessing video understanding capabilities has been steadily increasing. However, existing benchmarks for evaluating video understanding exhibit significant limitations in coverage, task diversity, and scene adaptability. These shortcomings hinder the accurate assessment of models' comprehensive video understanding capabilities. To ta…

    Submitted 31 March, 2025; originally announced March 2025.

  7. arXiv:2503.21246  [pdf, other]

    cs.CV

    DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation

    Authors: Haoyu Zhao, Zhongang Qi, Cong Wang, Qingping Zheng, Guansong Lu, Fei Chen, Hang Xu, Zuxuan Wu

    Abstract: Human image animation has recently gained significant attention due to advancements in generative models. However, existing methods still face two major challenges: (1) architectural limitations, as most models rely on U-Net, which underperforms compared to MM-DiT; and (2) the neglect of textual information, which can enhance controllability. In this work, we introduce DynamiCtrl, a novel framewo…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 11 pages, 10 figures

  8. arXiv:2503.18434  [pdf, other]

    cs.CV

    A Simple yet Effective Layout Token in Large Language Models for Document Understanding

    Authors: Zhaoqing Zhu, Chuwei Luo, Zirui Shao, Feiyu Gao, Hangdi Xing, Qi Zheng, Ji Zhang

    Abstract: Recent methods that integrate spatial layouts with text for document understanding in large language models (LLMs) have shown promising results. A commonly used method is to represent layout information as text tokens and interleave them with text content as inputs to the LLMs. However, such a method still demonstrates limitations, as it requires additional position IDs for tokens that are used to…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  9. arXiv:2503.18130  [pdf, other]

    cs.LG cs.AI

    Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization

    Authors: Juntao Dai, Taiye Chen, Yaodong Yang, Qian Zheng, Gang Pan

    Abstract: Reinforcement learning from human feedback (RLHF) is an effective method for aligning large language models (LLMs) with human values. However, reward over-optimization remains an open challenge leading to discrepancies between the performance of LLMs under the reward model and the true human objectives. A primary contributor to reward over-optimization is the extrapolation error that arises when t…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Published as a conference paper at ICLR 2025

  10. arXiv:2503.14512  [pdf]

    q-bio.QM cs.LG stat.AP stat.ML

    Machine learning algorithms to predict stroke in China based on causal inference of time series analysis

    Authors: Qizhi Zheng, Ayang Zhao, Xinzhu Wang, Yanhong Bai, Zikun Wang, Xiuying Wang, Xianzhang Zeng, Guanghui Dong

    Abstract: Participants: This study employed a combination of Vector Autoregression (VAR) model and Graph Neural Networks (GNN) to systematically construct dynamic causal inference. Multiple classic classification algorithms were compared, including Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting, and Multi Layer Perceptron (MLP). The SMO…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 17 pages

  11. arXiv:2503.11946  [pdf, other]

    cs.DC

    CCRSat: A Collaborative Computation Reuse Framework for Satellite Edge Computing Networks

    Authors: Ye Zhang, Zhishu Shen, Dawen Jiang, Xiangrui Liu, Qiushi Zheng, Jiong Jin

    Abstract: In satellite computing applications, such as remote sensing, tasks often involve similar or identical input data, leading to the same processing results. Computation reuse is an emerging paradigm that leverages the execution results of previous tasks to enhance the utilization of computational resources. While this paradigm has been extensively studied in terrestrial networks with abundant computi…

    Submitted 14 March, 2025; originally announced March 2025.

  12. arXiv:2503.07371  [pdf]

    cs.CV

    HGO-YOLO: Advancing Anomaly Behavior Detection with Hierarchical Features and Lightweight Optimized Detection

    Authors: Qizhi Zheng, Zhongze Luo, Meiyan Guo, Xinzhu Wang, Renqimuge Wu, Qiu Meng, Guanghui Dong

    Abstract: Accurate and real-time object detection is crucial for anomaly behavior detection, especially in scenarios constrained by hardware limitations, where balancing accuracy and speed is essential for enhancing detection performance. This study proposes a model called HGO-YOLO, which integrates the HGNetv2 architecture into YOLOv8. This combination expands the receptive field and captures a wider range…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 10 pages

  13. arXiv:2503.07077  [pdf, other]

    cs.AI

    Rule-Based Conflict-Free Decision Framework in Swarm Confrontation

    Authors: Zhaoqi Dong, Zhinan Wang, Quanqi Zheng, Bin Xu, Lei Chen, Jinhu Lv

    Abstract: Traditional rule-based decision-making methods, such as finite state machines, offer interpretability but suffer from jitter or deadlock (JoD) problems in extremely dynamic scenarios. To realize agent swarm confrontation, decision conflicts causing many JoD problems are a key issue to be solved. Here, we propose a novel decision-making framework that integrates probabilistic finite state machi…

    Submitted 10 March, 2025; originally announced March 2025.

  14. arXiv:2503.06163  [pdf, other]

    cs.AI cs.CV stat.AP

    VACT: A Video Automatic Causal Testing System and a Benchmark

    Authors: Haotong Yang, Qingyuan Zheng, Yunjian Gao, Yongkun Yang, Yangbo He, Zhouchen Lin, Muhan Zhang

    Abstract: With the rapid advancement of text-conditioned Video Generation Models (VGMs), the quality of generated videos has significantly improved, bringing these models closer to functioning as "world simulators" and making real-world-level video generation more accessible and cost-effective. However, the generated videos often contain factual inaccuracies and lack understanding of fundamental physica…

    Submitted 19 April, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: A preliminary version of this paper has been accepted by workshop SCSL@ICLR 2025

  15. arXiv:2503.06012  [pdf, other]

    cs.CV

    End-to-End HOI Reconstruction Transformer with Graph-based Encoding

    Authors: Zhenrong Wang, Qi Zheng, Sihan Ma, Maosheng Ye, Yibing Zhan, Dongjiang Li

    Abstract: With the diversification of human-object interaction (HOI) applications and the success of capturing human meshes, HOI reconstruction has gained widespread attention. Existing mainstream HOI reconstruction methods often rely on explicitly modeling interactions between humans and objects. However, this approach leads to a natural conflict between 3D mesh reconstruction, which emphasizes global structu…

    Submitted 7 March, 2025; originally announced March 2025.

  16. arXiv:2503.01710  [pdf, other]

    cs.SD cs.AI eess.AS

    Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

    Authors: Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue

    Abstract: Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a sin…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Submitted to ACL 2025

  17. arXiv:2502.20668  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

    Authors: Xiang Xiang, Zhuo Xu, Yao Deng, Qinhao Zhou, Yifan Liang, Ke Chen, Qingfang Zheng, Yaowei Wang, Xilin Chen, Wen Gao

    Abstract: In open-world remote sensing, deployed models must continuously adapt to a steady influx of new data, which often exhibits various shifts compared to what the model encountered during the training phase. To effectively handle the new data, models are required to detect semantic shifts, adapt to covariate shifts, and continuously update themselves. These challenges give rise to a variety of open-wo…

    Submitted 27 February, 2025; originally announced February 2025.

  18. arXiv:2502.20301  [pdf, other]

    cs.CV cs.AI cs.CL

    M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging

    Authors: Jinghao Feng, Qiaoyu Zheng, Chaoyi Wu, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Agentic AI systems have gained significant attention for their ability to autonomously perform complex tasks. However, their reliance on well-prepared tools limits their applicability in the medical domain, which requires training specialized models. In this paper, we make three contributions: (i) We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medica…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 38 pages, 7 figures

  19. arXiv:2502.13190  [pdf]

    cs.LG physics.flu-dyn

    Application of machine learning algorithm in temperature field reconstruction

    Authors: Qianyu He, Huaiwei Sun, Yubo Li, Zhiwen You, Qiming Zheng, Yinghan Huang, Sipeng Zhu, Fengyu Wang

    Abstract: This study focuses on the stratification patterns and dynamic evolution of reservoir water temperatures, aiming to estimate and reconstruct the temperature field using limited and noisy local measurement data. Due to complex measurement environments and technical limitations, obtaining complete temperature information for reservoirs is highly challenging. Therefore, accurately reconstructing the t…

    Submitted 18 February, 2025; originally announced February 2025.

  20. arXiv:2502.12783  [pdf, other]

    cs.DC

    FedHC: A Hierarchical Clustered Federated Learning Framework for Satellite Networks

    Authors: Zhuocheng Liu, Zhishu Shen, Pan Zhou, Qiushi Zheng, Jiong Jin

    Abstract: With the proliferation of data-driven services, the volume of data that needs to be processed by satellite networks has significantly increased. Federated learning (FL) is well-suited for big data processing in distributed, resource-constrained satellite environments. However, ensuring its convergence performance while minimizing processing time and energy consumption remains a challenge. To this…

    Submitted 18 February, 2025; originally announced February 2025.

  21. arXiv:2502.10119  [pdf, other]

    cs.LG

    SeWA: Selective Weight Average via Probabilistic Masking

    Authors: Peng Wang, Shengchao Hu, Zerui Tao, Guoxia Wang, Dianhai Yu, Li Shen, Quan Zheng, Dacheng Tao

    Abstract: Weight averaging has become a standard technique for enhancing model performance. However, methods such as Stochastic Weight Averaging (SWA) and Latest Weight Averaging (LAWA) often require manually designed procedures to sample from the training trajectory, and the results depend heavily on hyperparameter tuning. To minimize human effort, this paper proposes a simple yet efficient algorithm calle…

    Submitted 14 February, 2025; originally announced February 2025.

  22. arXiv:2502.05034  [pdf, other]

    cs.CV

    MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data

    Authors: Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, Jiamin Wu

    Abstract: Brain decoding aims to reconstruct the visual perception of human subjects from fMRI signals, which is crucial for understanding the brain's perception mechanisms. Existing methods are confined to the single-subject paradigm due to substantial brain variability, which leads to weak generalization across individuals and incurs high training costs, exacerbated by limited availability of fMRI data. To address…

    Submitted 7 February, 2025; originally announced February 2025.

  23. arXiv:2502.03275  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

    Authors: DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng

    Abstract: Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words support textual coherence rather than core reasoning information, and processing these inputs consumes substantial computation resources. In this work, we propo…

    Submitted 5 February, 2025; originally announced February 2025.

  24. arXiv:2502.00318  [pdf, other]

    cs.LG math.NA

    Sub-Sequential Physics-Informed Learning with State Space Model

    Authors: Chenhui Xu, Dancheng Liu, Yuting Hu, Jiajie Li, Ruiyang Qin, Qingxiao Zheng, Jinjun Xiong

    Abstract: Physics-Informed Neural Networks (PINNs) are a kind of deep-learning-based numerical solver for partial differential equations (PDEs). Existing PINNs often suffer from failure modes of being unable to propagate patterns of initial conditions. We discover that these failure modes are caused by the simplicity bias of neural networks and the mismatch between PDE's continuity and PINN's discrete samp…

    Submitted 31 January, 2025; originally announced February 2025.

  25. arXiv:2501.15068  [pdf, other]

    cs.RO

    An Atomic Skill Library Construction Method for Data-Efficient Embodied Manipulation

    Authors: Dongjiang Li, Bo Peng, Chang Li, Ning Qiao, Qi Zheng, Lei Sun, Yusen Qin, Bangguo Li, Yifeng Luan, Bo Wu, Yibing Zhan, Mingang Sun, Tong Xu, Lusong Li, Hui Shen, Xiaodong He

    Abstract: Embodied manipulation is a fundamental ability in the realm of embodied artificial intelligence. Although current embodied manipulation models show certain generalizations in specific settings, they struggle in new environments and tasks due to the complexity and diversity of real-world scenarios. The traditional end-to-end data collection and training manner leads to significant data demands. Dec…

    Submitted 5 February, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  26. arXiv:2501.14894  [pdf, other]

    cs.CV

    Enhancing accuracy of uncertainty estimation in appearance-based gaze tracking with probabilistic evaluation and calibration

    Authors: Qiaojie Zheng, Jiucai Zhang, Xiaoli Zhang

    Abstract: Accurately knowing uncertainties in appearance-based gaze tracking is critical for ensuring reliable downstream applications. Due to the lack of individual uncertainty labels, current uncertainty-aware approaches adopt probabilistic models to acquire uncertainties by following distributions in the training dataset. Without regulations, this approach lets the uncertainty model build biases and over…

    Submitted 17 March, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 9 pages, 7 figures, 2 tables

  27. arXiv:2501.01595  [pdf]

    cs.CV

    Adaptive Homophily Clustering: Structure Homophily Graph Learning with Adaptive Filter for Hyperspectral Image

    Authors: Yao Ding, Weijie Kang, Aitao Yang, Zhili Zhang, Junyang Zhao, Jie Feng, Danfeng Hong, Qinhe Zheng

    Abstract: Hyperspectral image (HSI) clustering has been a fundamental but challenging task with zero training labels. Currently, some deep graph clustering methods have been successfully explored for HSI due to their outstanding performance in effective spatial structural information encoding. Nevertheless, insufficient structural information utilization, poor feature presentation ability, and weak graph up…

    Submitted 7 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: 14 pages, 8 figures

  28. arXiv:2412.19991  [pdf, other]

    cs.LG cs.DC

    A Robust Federated Learning Framework for Undependable Devices at Scale

    Authors: Shilong Wang, Jianchun Liu, Hongli Xu, Chunming Qiao, Huarong Deng, Qiuye Zheng, Jiantao Gong

    Abstract: In a federated learning (FL) system, many devices, such as smartphones, are often undependable (e.g., frequently disconnected from WiFi) during training. Existing FL frameworks always assume a dependable environment and exclude undependable devices from training, leading to poor model performance and resource wastage. In this paper, we propose FLUDE to effectively deal with undependable environmen…

    Submitted 27 December, 2024; originally announced December 2024.

  29. arXiv:2412.15634  [pdf, other]

    cs.SE

    Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model

    Authors: Xin Du, Shifan Ye, Qian Zheng, Yangfan Hu, Rui Yan, Shunyu Qi, Shuyang Chen, Huajin Tang, Gang Pan, Shuiguang Deng

    Abstract: Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters, with inference processes requiring substantial energy and computational resources. In contrast, the human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption, even with a similar number of…

    Submitted 20 December, 2024; originally announced December 2024.

  30. arXiv:2412.14537  [pdf, other]

    cs.LG

    ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting

    Authors: Qi Zheng, Zihao Yao, Yaying Zhang

    Abstract: Spatial-temporal forecasting is crucial and widely applicable in various domains such as traffic, energy, and climate. Benefiting from the abundance of unlabeled spatial-temporal data, self-supervised methods are increasingly adapted to learn spatial-temporal representations. However, it encounters three key challenges: 1) the difficulty in selecting reliable negative pairs due to the homogeneity…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 13 pages, 7 pages. Accepted by AAAI2025

  31. CLDG: Contrastive Learning on Dynamic Graphs

    Authors: Yiming Xu, Bin Shi, Teng Ma, Bo Dong, Haoyi Zhou, Qinghua Zheng

    Abstract: The graph with complex annotations is the most potent data type, whose constant evolution motivates further exploration of the unsupervised dynamic graph representation. One of the representative paradigms is graph contrastive learning. It constructs self-supervised signals by maximizing the mutual information between the statistic graph's augmentation views. However, the semantics and labels may…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by ICDE2023

  32. arXiv:2412.13477  [pdf]

    physics.ao-ph cs.AI cs.CV cs.LG physics.geo-ph

    Generating Unseen Nonlinear Evolution in Sea Surface Temperature Using a Deep Learning-Based Latent Space Data Assimilation Framework

    Authors: Qingyu Zheng, Guijun Han, Wei Li, Lige Cao, Gongfu Zhou, Haowen Wu, Qi Shao, Ru Wang, Xiaobo Wu, Xudong Cui, Hong Li, Xuan Wang

    Abstract: Advances in data assimilation (DA) methods have greatly improved the accuracy of Earth system predictions. To fuse multi-source data and reconstruct the nonlinear evolution missing from observations, geoscientists are developing future-oriented DA methods. In this paper, we redesign a purely data-driven latent space DA framework (DeepDA) that employs a generative artificial intelligence model to c…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 31 pages, 14 figures

  33. arXiv:2412.11138  [pdf, other]

    cs.LG cs.AI

    Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation

    Authors: Juntao Dai, Yaodong Yang, Qian Zheng, Gang Pan

    Abstract: A key aspect of Safe Reinforcement Learning (Safe RL) involves estimating the constraint condition for the next policy, which is crucial for guiding the optimization of safe policy updates. However, the existing Advantage-based Estimation (ABE) method relies on the infinite-horizon discounted advantage function. This dependence leads to catastrophic errors in finite-horizon scenarios with non-disc…

    Submitted 15 December, 2024; originally announced December 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9872-9903, 2024

  34. arXiv:2412.09529  [pdf, other]

    cs.CV

    How Well Can Modern LLMs Act as Agent Cores in Radiology Environments?

    Authors: Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: We introduce RadA-BenchPlat, an evaluation platform that benchmarks the performance of large language models (LLMs) acting as agent cores in radiology environments using 2,200 radiologist-verified synthetic patient records covering six anatomical regions, five imaging modalities, and 2,200 disease scenarios, resulting in 24,200 question-answer pairs that simulate diverse clinical situations. The plat…

    Submitted 7 April, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

  35. arXiv:2412.08210  [pdf, other]

    cs.CV eess.IV

    Unicorn: Unified Neural Image Compression with One Number Reconstruction

    Authors: Qi Zheng, Haozhi Wang, Zihao Liu, Jiaming Liu, Peiye Liu, Zhijian Hao, Yanheng Lu, Dimin Niu, Jinjia Zhou, Minge Jing, Yibo Fan

    Abstract: Prevalent lossy image compression schemes can be divided into: 1) explicit image compression (EIC), including traditional standards and neural end-to-end algorithms; 2) implicit image compression (IIC) based on implicit neural representations (INR). The former is encountering impasses of either leveling off bitrate reduction at a cost of tremendous complexity while the latter suffers from excessiv…

    Submitted 11 December, 2024; originally announced December 2024.

  36. arXiv:2412.04508  [pdf, other]

    eess.IV cs.CV

    Video Quality Assessment: A Comprehensive Survey

    Authors: Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C. Bovik, Zhengzhong Tu

    Abstract: Video quality assessment (VQA) is an important processing task, aiming at predicting the quality of videos in a manner highly consistent with human judgments of perceived quality. Traditional VQA models based on natural image and/or video statistics, which are inspired both by models of projected images of the real world and by dual models of the human visual system, deliver only limited predictio…

    Submitted 11 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

  37. arXiv:2411.15798  [pdf, other]

    eess.IV cs.CV

    M3-CVC: Controllable Video Compression with Multimodal Generative Models

    Authors: Rui Wan, Qi Zheng, Yibo Fan

    Abstract: Traditional and neural video codecs commonly encounter limitations in controllability and generality under ultra-low-bitrate coding scenarios. To overcome these challenges, we propose M3-CVC, a controllable video compression framework incorporating multimodal generative models. The framework utilizes a semantic-motion composite strategy for keyframe selection to retain critical information. For ea…

    Submitted 25 December, 2024; v1 submitted 24 November, 2024; originally announced November 2024.

    Comments: Accepted to ICASSP 2025

  38. arXiv:2411.12248  [pdf, other]

    cs.CV

    Neuro-3D: Towards 3D Visual Decoding from EEG Signals

    Authors: Zhanqiang Guo, Jiamin Wu, Yonghao Song, Jiahui Bu, Weijian Mai, Qihao Zheng, Wanli Ouyang, Chunfeng Song

    Abstract: Human perception of the visual world is shaped by the stereo processing of 3D information. Understanding how the brain perceives and processes 3D visual stimuli in the real world has been a longstanding endeavor in neuroscience. Towards this goal, we introduce a new neuroscience task: decoding 3D visual perception from EEG signals, a neuroimaging technique that enables real-time monitoring of ne…

    Submitted 21 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

  39. arXiv:2411.10815  [pdf, other]

    cs.DC

    Collaborative UAVs Multi-task Video Processing Optimization Based on Enhanced Distributed Actor-Critic Networks

    Authors: Ziqi Rong, Qiushi Zheng, Zhishu Shen, Xiaolong Li, Tiehua Zhang, Zheng Lei, Jiong Jin

    Abstract: With the rapid advancement of the Internet of Things (IoT) and Artificial Intelligence (AI), intelligent information services are being increasingly integrated across various sectors, including healthcare, industry, and transportation. Traditional solutions rely on centralized cloud processing, which encounters considerable challenges in fulfilling the Quality of Service (QoS) requirements of Comp…

    Submitted 16 November, 2024; originally announced November 2024.

  40. arXiv:2411.07722  [pdf, other]

    cs.AI

    Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding

    Authors: Zirui Shao, Chuwei Luo, Zhaoqing Zhu, Hangdi Xing, Zhi Yu, Qi Zheng, Jiajun Bu

    Abstract: Multimodal large language models (MLLMs) have shown impressive capabilities in document understanding, a rapidly growing research area with significant industrial demand in recent years. As a multimodal task, document understanding requires models to possess both perceptual and cognitive abilities. However, current MLLMs often face conflicts between perception and cognition. Taking a document VQA…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Preprint

  41. arXiv:2411.06137  [pdf, other]

    cs.CR cs.DC

    A Sharded Blockchain-Based Secure Federated Learning Framework for LEO Satellite Networks

    Authors: Wenbo Wu, Cheng Tan, Kangcheng Yang, Zhishu Shen, Qiushi Zheng, Jiong Jin

    Abstract: Low Earth Orbit (LEO) satellite networks are increasingly essential for space-based artificial intelligence (AI) applications. However, as commercial use expands, LEO satellite networks face heightened cyberattack risks, especially through satellite-to-satellite communication links, which are more vulnerable than ground-based connections. As the number of operational satellites continues to grow,…

    Submitted 9 November, 2024; originally announced November 2024.

  42. arXiv:2410.23841  [pdf, other]

    cs.IR

    Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

    Authors: Jianqun Zhou, Yuanlei Zheng, Wei Chen, Qianqian Zheng, Hui Su, Wei Zhang, Rui Meng, Xiaoyu Shen

    Abstract: Instruction-following capabilities in LLMs have progressed significantly, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances; most of them still rely on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these p…

    Submitted 5 March, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

  43. arXiv:2410.23022  [pdf, other]

    cs.LG cs.AI cs.CL cs.RO

    Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

    Authors: Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos

    Abstract: Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended exploration, and hierarchical skill design. Recent works have made promising steps by exploiting the prior knowledge of large language models (LLMs). However, these approaches suffer from important limitations: t…

    Submitted 17 December, 2024; v1 submitted 30 October, 2024; originally announced October 2024.

  44. arXiv:2410.20253  [pdf, other]

    cs.CE

    Application of an ANN and LSTM-based Ensemble Model for Stock Market Prediction

    Authors: Fang Liu, Shaobo Guo, Qianwen Xing, Xinye Sha, Ying Chen, Yuhui Jin, Qi Zheng, Chang Yu

    Abstract: Stock trading has always been a key economic indicator in modern society and a primary source of profit for financial giants such as investment banks, quantitative trading firms, and hedge funds. Discovering the underlying patterns within the seemingly volatile yet intrinsically structured economic activities has become a central focus of research for many companies. Our study leverages widely-use…

    Submitted 13 November, 2024; v1 submitted 26 October, 2024; originally announced October 2024.

    Comments: This paper is accepted by ICISCAE 2024

    Report number: AE094

  45. arXiv:2410.20186  [pdf]

    cs.CE

    SeisGPT: A Physics-Informed Data-Driven Large Model for Real-Time Seismic Response Prediction

    Authors: Shiqiao Meng, Ying Zhou, Qinghua Zheng, Bingxu Liao, Mushi Chang, Tianshu Zhang, Abderrahim Djerrad

    Abstract: Accurately predicting the dynamic responses of building structures under seismic loads is essential for ensuring structural safety and minimizing potential damage. This critical aspect of structural analysis allows engineers to evaluate how structures perform under various loading conditions, facilitating informed design and safety decisions. Traditional methods, which rely on complex finite eleme…

    Submitted 26 October, 2024; originally announced October 2024.

    Comments: 23 pages, 6 figures

  46. arXiv:2410.19473  [pdf, other]

    cs.RO

    A Robust and Efficient Visual-Inertial Initialization with Probabilistic Normal Epipolar Constraint

    Authors: Changshi Mu, Daquan Feng, Qi Zheng, Yuan Zhuang

    Abstract: Accurate and robust initialization is essential for Visual-Inertial Odometry (VIO), as poor initialization can severely degrade pose accuracy. During initialization, it is crucial to estimate parameters such as accelerometer bias, gyroscope bias, initial velocity, gravity, etc. Most existing VIO initialization methods adopt Structure from Motion (SfM) to solve for gyroscope bias. However, SfM is n…

    Submitted 18 February, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: Accepted by RA-L

  47. arXiv:2410.09918  [pdf, other]

    cs.AI cs.LG cs.LO

    Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

    Authors: DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng

    Abstract: In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating System 2 process into Transformers, including large language models (LLMs), significantly enhances their reasoning capabilities. Nevertheless, models that purely resemble System 2 thinking require substantia…

    Submitted 10 April, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

  48. arXiv:2410.07266  [pdf, other]

    cs.CV

    Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting

    Authors: Weixing Zhang, Zongrui Li, De Ma, Huajin Tang, Xudong Jiang, Qian Zheng, Gang Pan

    Abstract: 3D Gaussian Splatting is capable of reconstructing 3D scenes in minutes. Despite recent advances in improving surface reconstruction accuracy, the reconstructed results still exhibit bias and suffer from inefficiency in storage and training. This paper provides a different observation on the cause of the inefficiency and the reconstruction bias, which is attributed to the integration of the low-op…

    Submitted 3 December, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

  49. arXiv:2410.07265  [pdf, other]

    cs.AR cs.AI cs.LG cs.SE

    A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

    Authors: Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen

    Abstract: The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language processing and moving towards multi-modal functionality. These models are increasingly integrated into diverse applications, impacting both research and industry. However, their development and deployment present substan…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted by IEEE Circuits and Systems Magazine

  50. arXiv:2410.05938  [pdf, other]

    cs.CV cs.AI

    EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

    Authors: Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, Yaowei Wang

    Abstract: Mamba-based architectures have been shown to be a promising new direction for deep learning models owing to their competitive performance and sub-quadratic deployment speed. However, current Mamba multi-modal large language models (MLLM) are insufficient in extracting visual features, leading to imbalanced cross-modal alignment between visual and textual latents, negatively impacting performance on mu…

    Submitted 8 October, 2024; originally announced October 2024.