
Showing 1–50 of 584 results for author: Zheng, J

  1. arXiv:2505.09107  [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.SR cs.DC

    Architecture of Tianyu Software: Relative Photometry as a Case Study

    Authors: Yicheng Rui, Yifan Xuan, Shuyue Zheng, Kexin Li, Kaiming Cui, Kai Xiao, Jie Zheng, Jun Kai Ng, Hongxuan Jiang, Fabo Feng, Qinghui Sun

    Abstract: Tianyu telescope, a one-meter robotic optical survey instrument to be constructed in Lenghu, Qinghai, China, is designed for detecting transiting exoplanets, variable stars and transients. It requires a highly automated, optimally distributed, easily extendable, and highly flexible software to enable the data processing for the raw data at rates exceeding 500MB/s. In this work, we introduce the a…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: 18 pages, 10 figures, 6 tables, accepted for publication in PASP

  2. arXiv:2505.08448  [pdf, ps, other]

    cs.MA

    Scalable UAV Multi-Hop Networking via Multi-Agent Reinforcement Learning with Large Language Models

    Authors: Yanggang Xu, Weijie Hong, Jirong Zha, Geng Chen, Jianfeng Zheng, Chen-Chun Hsia, Xinlei Chen

    Abstract: In disaster scenarios, establishing robust emergency communication networks is critical, and unmanned aerial vehicles (UAVs) offer a promising solution to rapidly restore connectivity. However, organizing UAVs to form multi-hop networks in large-scale dynamic environments presents significant challenges, including limitations in algorithmic scalability and the vast exploration space required for c…

    Submitted 13 May, 2025; originally announced May 2025.

  3. arXiv:2505.08343  [pdf, other]

    cs.AI

    An Identifiable Cost-Aware Causal Decision-Making Framework Using Counterfactual Reasoning

    Authors: Ruichu Cai, Xi Chen, Jie Qiao, Zijian Li, Yuequn Liu, Wei Chen, Keli Zhang, Jiale Zheng

    Abstract: Decision making under abnormal conditions is a critical process that involves evaluating the current state and determining the optimal action to restore the system to a normal state at an acceptable cost. However, in such scenarios, existing decision-making frameworks highly rely on reinforcement learning or root cause analysis, resulting in them frequently neglecting the cost of the actions or fa…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.07546  [pdf, ps, other]

    cs.IR cs.AI

    GRADA: Graph-based Reranker against Adversarial Documents Attack

    Authors: Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu

    Abstract: Retrieval Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved documents, thereby overcoming the limitations of models' static intrinsic knowledge. However, these systems are susceptible to adversarial attacks that manipulate the retrieval process by introducing documents that are adversarial yet semantically si…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.07347  [pdf, other]

    cs.CV

    AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography

    Authors: Jiewen Yang, Taoran Huang, Shangwei Ding, Xiaowei Xu, Qinhua Zhao, Yong Jiang, Jiarong Guo, Bin Pu, Jiexuan Zheng, Caojin Zhang, Hongwen Fei, Xiaomeng Li

    Abstract: Echocardiographers can detect pulmonary hypertension using Doppler echocardiography; however, accurately assessing its progression often proves challenging. Right heart catheterization (RHC), the gold standard for precise evaluation, is invasive and unsuitable for routine use, limiting its practicality for timely diagnosis and monitoring of pulmonary hypertension progression. Here, we propose MePH…

    Submitted 12 May, 2025; originally announced May 2025.

  6. arXiv:2505.06861  [pdf, other]

    cs.RO cs.AI cs.CV

    Efficient Robotic Policy Learning via Latent Space Backward Planning

    Authors: Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan

    Abstract: Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grai…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  7. arXiv:2505.06305  [pdf]

    cs.CR cs.AI

    User Behavior Analysis in Privacy Protection with Large Language Models: A Study on Privacy Preferences with Limited Data

    Authors: Haowei Yang, Qingyi Lu, Yang Wang, Sibei Liu, Jiayun Zheng, Ao Xiang

    Abstract: With the widespread application of large language models (LLMs), user privacy protection has become a significant research topic. Existing privacy preference modeling methods often rely on large-scale user data, making effective privacy preference analysis challenging in data-limited environments. This study explores how LLMs can analyze user behavior related to privacy protection in scenarios wit…

    Submitted 8 May, 2025; originally announced May 2025.

  8. arXiv:2505.04918  [pdf, other]

    cs.LG cs.AI

    Physics-Assisted and Topology-Informed Deep Learning for Weather Prediction

    Authors: Jiaqi Zheng, Qing Ling, Yerong Feng

    Abstract: Although deep learning models have demonstrated remarkable potential in weather prediction, most of them overlook either the physics of the underlying weather evolution or the topology of the Earth's surface. In light of these disadvantages, we develop PASSAT, a novel Physics-ASSisted And Topology-informed deep learning model for weather prediction. PASSAT attributes the weather…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: International Joint Conferences on Artificial Intelligence (IJCAI 2025)

  9. arXiv:2505.01450  [pdf, other]

    cs.LG

    Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

    Authors: Chaoyi Wang, Junjie Zheng, Zihao Chen, Shiyu Xia, Chaofan Ding, Xiaohao Zhang, Xi Tao, Xiaoming He, Xinhan Di

    Abstract: Movie dubbing has advanced significantly, yet assessing the real-world effectiveness of these models remains challenging. A comprehensive evaluation benchmark is crucial for two key reasons: 1) Existing metrics fail to fully capture the complexities of dialogue, narration, monologue, and actor adaptability in movie dubbing. 2) A practical evaluation system should offer valuable insights to improve…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures, accepted to the AI for Content Creation workshop at CVPR 2025 in Nashville, TN

  10. arXiv:2505.00257  [pdf, other]

    cs.LG cs.CR

    Graph Privacy: A Heterogeneous Federated GNN for Trans-Border Financial Data Circulation

    Authors: Zhizhong Tan, Jiexin Zheng, Kevin Qi Zhang, Wenyong Wang

    Abstract: The sharing of external data has become a strong demand of financial institutions, but the privacy issue has led to the difficulty of interconnecting different platforms and the low degree of data openness. To effectively solve the privacy problem of financial data in trans-border flow and sharing, to ensure that the data is available but not visible, to realize the joint portrait of all kinds of…

    Submitted 30 April, 2025; originally announced May 2025.

  11. arXiv:2505.00049  [pdf, other]

    cs.CY cs.CL cs.HC cs.LG

    Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications

    Authors: Wenhan Dong, Yuemeng Zhao, Zhen Sun, Yule Liu, Zifan Peng, Jingyi Zheng, Zongmin Zhang, Ziyi Zhang, Jun Wu, Ruiming Wang, Shengmin Xu, Xinyi Huang, Xinlei He

    Abstract: As large language models (LLMs) are increasingly used in human-centered tasks, assessing their psychological traits is crucial for understanding their social impact and ensuring trustworthy AI alignment. While existing reviews have covered some aspects of related research, several important areas have not been systematically discussed, including detailed discussions of diverse psychological tests,…

    Submitted 30 April, 2025; originally announced May 2025.

    Comments: 26 pages, 7 figures

  12. arXiv:2504.19581  [pdf, other]

    cs.CV

    SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity

    Authors: Chengzhi Wu, Yuxin Wan, Hao Fu, Julius Pfrommer, Zeyun Zhong, Junwei Zheng, Jiaming Zhang, Jürgen Beyerer

    Abstract: Driven by the increasing demand for accurate and efficient representation of 3D data in various domains, point cloud sampling has emerged as a pivotal research topic in 3D computer vision. Recently, learning-to-sample methods have garnered growing interest from the community, particularly for their ability to be jointly trained with downstream tasks. However, previous learning-based sampling metho…

    Submitted 28 April, 2025; originally announced April 2025.

  13. arXiv:2504.18866  [pdf, other]

    cs.CV

    PiercingEye: Dual-Space Video Violence Detection with Hyperbolic Vision-Language Guidance

    Authors: Jiaxu Leng, Zhanjie Wu, Mingpi Tan, Mengjingcheng Mo, Jiankang Zheng, Qingqing Li, Ji Gan, Xinbo Gao

    Abstract: Existing weakly supervised video violence detection (VVD) methods primarily rely on Euclidean representation learning, which often struggles to distinguish visually similar yet semantically distinct events due to limited hierarchical modeling and insufficient ambiguous training samples. To address this challenge, we propose PiercingEye, a novel dual-space learning framework that synergizes Euclide…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  14. arXiv:2504.18600  [pdf, other]

    q-fin.CP cs.AI cs.CE

    QuantBench: Benchmarking AI Methods for Quantitative Investment

    Authors: Saizhuo Wang, Hao Kong, Jiadong Guo, Fengrui Hua, Yiyan Qi, Wanyun Zhou, Jiahao Zheng, Xinyu Wang, Lionel M. Ni, Jian Guo

    Abstract: The field of artificial intelligence (AI) in quantitative investment has seen significant advancements, yet it lacks a standardized benchmark aligned with industry practices. This gap hinders research progress and limits the practical application of academic innovations. We present QuantBench, an industrial-grade benchmark platform designed to address this critical need. QuantBench offers three ke…

    Submitted 24 April, 2025; originally announced April 2025.

  15. arXiv:2504.18178  [pdf, other]

    cs.CG

    Smallest Intersecting and Enclosing Balls

    Authors: Jiaqi Zheng, Tiow-Seng Tan

    Abstract: We study the smallest intersecting and enclosing ball problems in Euclidean spaces for input objects that are compact and convex. They link and unify many problems in computational geometry and machine learning. We show that both problems can be modeled as zero-sum games, and propose an approximation algorithm for the former. Specifically, the algorithm produces the first results in high-dimension…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Computational Geometry: Young Researchers Forum (CG:YRF), 2025

  16. arXiv:2504.13754  [pdf, other]

    cs.CV cs.AI

    Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis

    Authors: Zhu Zhu, Shuo Jiang, Jingyuan Zheng, Yawen Li, Yifei Chen, Manli Zhao, Weizhong Gu, Feiwei Qin, Jinhu Wang, Gang Yu

    Abstract: Neuroblastoma, adrenal-derived, is among the most common pediatric solid malignancies, characterized by significant clinical heterogeneity. Timely and accurate pathological diagnosis from hematoxylin and eosin-stained whole-slide images is critical for patient prognosis. However, current diagnostic practices primarily rely on subjective manual examination by pathologists, leading to inconsistent a…

    Submitted 6 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: 10 pages, 8 figures

  17. arXiv:2504.13626  [pdf, other]

    cs.CL cs.AI

    Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models

    Authors: Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He

    Abstract: Recent advancements in large reasoning models (LRMs) have demonstrated the effectiveness of scaling test-time computation to enhance reasoning capabilities in multiple tasks. However, LRMs typically suffer from "overthinking" problems, where models generate significantly redundant reasoning steps while bringing limited performance gains. Existing work relies on fine-tuning to mitigate overthinking…

    Submitted 18 April, 2025; originally announced April 2025.

  18. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  19. arXiv:2504.11966  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    Exploring Video-Based Driver Activity Recognition under Noisy Labels

    Authors: Linjuan Fan, Di Wen, Kunyu Peng, Kailun Yang, Jiaming Zhang, Ruiping Liu, Yufan Chen, Junwei Zheng, Jiamin Wu, Xudong Han, Rainer Stiefelhagen

    Abstract: As an open research topic in the field of deep learning, learning with noisy labels has attracted much attention and grown rapidly over the past ten years. Learning with label noise is crucial for driver distraction behavior recognition, as real-world video data often contains mislabeled samples, impacting model reliability and performance. However, label noise learning is barely explored in the d…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: The source code is available at https://github.com/ilonafan/DAR-noisy-labels

  20. arXiv:2504.10433  [pdf, other]

    cs.CV cs.RO

    MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model

    Authors: Jian Liu, Wei Sun, Hui Yang, Jin Zheng, Zichen Geng, Hossein Rahmani, Ajmal Mian

    Abstract: Object pose estimation is a core means for robots to understand and interact with their environment. For this task, monocular category-level methods are attractive as they require only a single RGB camera. However, current methods rely on shape priors or CAD models of the intra-class known objects. We propose a diffusion-based monocular category-level 9D object pose generation method, MonoDiff9D.…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by ICRA'25

  21. arXiv:2504.09940  [pdf, other]

    cs.LG

    TianQuan-Climate: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State

    Authors: Guowen Li, Xintong Liu, Shilei Cao, Haoyuan Liang, Mengxuan Chen, Lixian Zhang, Jinxiao Zhang, Jiuke Wang, Meng Jin, Juepeng Zheng, Haohuan Fu

    Abstract: Subseasonal forecasting serves as an important support for Sustainable Development Goals (SDGs), such as climate challenges, agricultural yield and sustainable energy production. However, subseasonal forecasting is a complex task in meteorology due to dissipating initial conditions and delayed external forces. Although AI models are increasingly pushing the boundaries of this forecasting limit, th…

    Submitted 21 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  22. arXiv:2504.09588  [pdf, other]

    cs.CV cs.AI

    TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting

    Authors: Zhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li, Liqiang Nie

    Abstract: Recent advancements in Generalizable Gaussian Splatting have enabled robust 3D reconstruction from sparse input views by utilizing feed-forward Gaussian Splatting models, achieving superior cross-scene generalization. However, while many methods focus on geometric consistency, they often neglect the potential of text-driven guidance to enhance semantic understanding, which is crucial for accuratel…

    Submitted 13 April, 2025; originally announced April 2025.

  23. arXiv:2504.09561  [pdf, other]

    cs.AR

    LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference

    Authors: Jianing Zheng, Gang Chen

    Abstract: In this paper, we propose LoopLynx, a scalable dataflow architecture for efficient LLM inference that optimizes FPGA usage through a hybrid spatial-temporal design. The design of LoopLynx incorporates a hybrid temporal-spatial architecture, where computationally intensive operators are implemented as large dataflow kernels. This achieves high throughput similar to spatial architecture, and organiz…

    Submitted 13 April, 2025; originally announced April 2025.

  24. arXiv:2504.09138  [pdf, other]

    cs.IT

    White-Box AI Model: Next Frontier of Wireless Communications

    Authors: Jiayao Yang, Jiayi Zhang, Bokai Xu, Jiakang Zheng, Zhilong Liu, Ziheng Liu, Dusit Niyato, Mérouane Debbah, Zhu Han, Bo Ai

    Abstract: White-box AI (WAI), or explainable AI (XAI) model, a novel tool to achieve the reasoning behind decisions and predictions made by the AI algorithms, makes it more understandable and transparent. It offers a new approach to address key challenges of interpretability and mathematical validation in traditional black-box models. In this paper, WAI-aided wireless communication systems are proposed and…

    Submitted 12 April, 2025; originally announced April 2025.

  25. arXiv:2504.08685  [pdf, other]

    cs.CV cs.AI

    Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

    Authors: Team Seawead, Ceyuan Yang, Zhijie Lin, Yang Zhao, Shanchuan Lin, Zhibei Ma, Haoyuan Guo, Hao Chen, Lu Qi, Sen Wang, Feng Cheng, Feilong Zuo, Xuejiao Zeng, Ziyan Yang, Fangyuan Kong, Meng Wei, Zhiwu Qing, Fei Xiao, Tuyen Hoang, Siyu Zhang, Peihao Zhu, Qi Zhao, Jiangqiao Yan, Liangke Gui, Sheng Bi, et al. (30 additional authors not shown)

    Abstract: This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B) called Seaweed-7B trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary…

    Submitted 4 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Technical report (some typos fixed)

  26. arXiv:2504.04321  [pdf, other]

    cs.SE

    Compiler Optimization Testing Based on Optimization-Guided Equivalence Transformations

    Authors: Jingwen Wu, Jiajing Zheng, Zhenyu Yang, Zhongxing Yu

    Abstract: Compiler optimization techniques are inherently complex, and rigorous testing of compiler optimization implementation is critical. Recent years have witnessed the emergence of testing approaches for uncovering incorrect optimization bugs, but these approaches rely heavily on the differential testing mechanism, which requires comparing outputs across multiple compilers. This dependency gives rise t…

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: Accepted by FSE-IVR 2025

  27. arXiv:2504.03886  [pdf, other]

    cs.CV cs.RO

    WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

    Authors: Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni

    Abstract: We present WildGS-SLAM, a robust and efficient monocular RGB SLAM system designed to handle dynamic environments by leveraging uncertainty-aware geometric mapping. Unlike traditional SLAM systems, which assume static scenes, our approach integrates depth and uncertainty information to enhance tracking, mapping, and rendering performance in the presence of moving objects. We introduce an uncertaint…

    Submitted 4 April, 2025; originally announced April 2025.

  28. arXiv:2504.00784  [pdf, other]

    cs.CV cs.LG

    CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification

    Authors: Yang Yang, Xijie Xu, Yixun Zhou, Jie Zheng

    Abstract: Cell instance segmentation is a fundamental task in digital pathology with broad clinical applications. Recently, vision foundation models, which are predominantly based on Vision Transformers (ViTs), have achieved remarkable success in pathology image analysis. However, their improvements in cell instance segmentation remain limited. A key challenge arises from the tokenization process in ViTs, w…

    Submitted 1 April, 2025; originally announced April 2025.

  29. arXiv:2504.00502  [pdf, other]

    cs.CV cs.CL

    ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

    Authors: Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun

    Abstract: Multimodal Large Language Models (MLLMs) suffer from high computational costs due to their massive size and the large number of visual tokens. In this paper, we investigate layer-wise redundancy in MLLMs by introducing a novel metric, Layer Contribution (LC), which quantifies the impact of a layer's transformations on visual and text tokens, respectively. The calculation of LC involves measuring t…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Project page: https://github.com/icip-cas/ShortV

  30. arXiv:2504.00458  [pdf, other]

    cs.CV

    Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection

    Authors: Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei

    Abstract: Facial recognition systems in real-world scenarios are susceptible to both digital and physical attacks. Previous methods have attempted to achieve classification by learning a comprehensive feature space. However, these methods have not adequately accounted for the inherent characteristics of physical and digital attack data, particularly the large intra class variation in attacks and the small i…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 9 pages, 5 figures, accepted by AAAI-2025 (Oral)

  31. arXiv:2503.23660  [pdf, other]

    cs.CV

    DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

    Authors: Junjie Zheng, Zihao Chen, Chaofan Ding, Xinhan Di

    Abstract: Current movie dubbing technology can generate the desired voice from a given speech prompt, ensuring good synchronization between speech and visuals while accurately conveying the intended emotions. However, in movie dubbing, key aspects such as adapting to different dubbing styles, handling dialogue, narration, and monologue effectively, and understanding subtle details like the age and gender of…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 11 pages, 5 figures

  32. arXiv:2503.22935  [pdf, other]

    cs.CR cs.SE

    Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes

    Authors: Xueqing Liu, Jiangrui Zheng, Guanqun Yang, Siyan Wen, Qiushi Liu

    Abstract: In recent years, the rapid increase of security vulnerabilities has caused major challenges in managing them. One critical task in vulnerability management is tracing the patches that fix a vulnerability. By accurately tracing the patching commits, security stakeholders can precisely identify affected software components, determine vulnerable and fixed versions, assess the severity etc., which fac…

    Submitted 28 March, 2025; originally announced March 2025.

  33. arXiv:2503.22265  [pdf, other]

    cs.CV cs.SD eess.AS

    DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

    Authors: Haomin Zhang, Chang Liu, Junjie Zheng, Zihao Chen, Chaofan Ding, Xinhan Di

    Abstract: Currently, high-quality, synchronized audio is synthesized using various multi-modal joint learning frameworks, leveraging video and optional text inputs. In the video-to-audio benchmarks, video-to-audio quality, semantic alignment, and audio-visual synchronization are effectively achieved. However, in real-world scenarios, speech and audio often coexist in videos simultaneously, and the end-to-en…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 11 pages, 5 figures

  34. arXiv:2503.19543  [pdf, other]

    cs.CV

    Scene-agnostic Pose Regression for Visual Localization

    Authors: Junwei Zheng, Ruiping Liu, Yufan Chen, Zhenfang Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Absolute Pose Regression (APR) predicts 6D camera poses but lacks the adaptability to unknown environments without retraining, while Relative Pose Regression (RPR) generalizes better yet requires a large image retrieval database. Visual Odometry (VO) generalizes well in unseen environments but suffers from accumulated error in open trajectories. To address this dilemma, we introduce a new task, Sc…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025. Project page: https://junweizheng93.github.io/publications/SPR/SPR.html

  35. arXiv:2503.18034  [pdf, other]

    cs.CV cs.CL

    Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models

    Authors: Qiao Liang, Yanjiang Liu, Ben He, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun, Yingfei Sun

    Abstract: Does the prior knowledge of the vision encoder constrain the capability boundary of Multi-modal Large Language Models (MLLMs)? While most existing research treats MLLMs as unified systems optimized through end-to-end training, the impact of vision encoder's prior knowledge is seldom investigated. In this work, we introduce a novel metric, $Rank_e$, to quantify the effect of the vision encoder's pr…

    Submitted 23 March, 2025; originally announced March 2025.

  36. arXiv:2503.16914  [pdf]

    cs.AI

    A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network

    Authors: Miao Ye, Jihao Zheng, Qiuxiang Jiang, Yuan Huang, Ziheng Wang, Yong Wang

    Abstract: The existing segment routing (SR) methods need to determine the routing first and then use path segmentation approaches to select swap nodes to form a segment routing path (SRP). They require re-segmentation of the path when the routing changes. Furthermore, they do not consider the flow table issuance time, which cannot maximize the speed of issuance flow table. To address these issues, this pape…

    Submitted 21 March, 2025; originally announced March 2025.

  37. arXiv:2503.12123  [pdf, other]

    cs.CL cs.AI

    MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling

    Authors: Zhaopeng Feng, Jiahan Ren, Jiayuan Su, Jiamei Zheng, Zhihang Tang, Hongwei Wang, Zuozhu Liu

    Abstract: Process reward models (PRMs) have shown success in complex reasoning tasks for large language models (LLMs). However, their application to machine translation (MT) remains underexplored due to the lack of systematic methodologies and evaluation benchmarks. To address this gap, we introduce MT-RewardTree, a comprehensive framework for constructing, evaluating, and deploying process reward…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: Under review. Project page: https://sabijun.github.io/MT_RewardTreePage

  38. arXiv:2503.09674  [pdf, other]

    cs.CL cs.LG

    Probabilistic Reasoning with LLMs for k-anonymity Estimation

    Authors: Jonathan Zheng, Sauvik Das, Alan Ritter, Wei Xu

    Abstract: Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a novel numerical reasoning task under uncertainty, focusing on estimating the k-anonymity of user-generated documents containing privacy-sensitive information. We propose BRANCH, which uses LLMs to factorize a joint pr…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 9 pages

  39. arXiv:2503.08708  [pdf, other]

    cs.CR cs.AI

    TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors

    Authors: Jingyi Zheng, Junfeng Wang, Zhen Sun, Wenhan Dong, Yule Liu, Xinlei He

    Abstract: As Large Language Models (LLMs) advance, Machine-Generated Texts (MGTs) have become increasingly fluent, high-quality, and informative. Existing wide-range MGT detectors are designed to identify MGTs to prevent the spread of plagiarism and misinformation. However, adversaries attempt to humanize MGTs to evade detection (named evading attacks), which requires only minor modifications to bypass MGT…

    Submitted 13 March, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  40. arXiv:2503.08153  [pdf, other]

    cs.CV

    WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation

    Authors: Jing Wang, Ao Ma, Ke Cao, Jun Zheng, Zhanjie Zhang, Jiasong Feng, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng, Yuhui Yin, Xiaodan Liang

    Abstract: Recent rapid advancements in text-to-video (T2V) generation, such as SoRA and Kling, have shown great potential for building world simulators. However, current T2V models struggle to grasp abstract physical principles and generate videos that adhere to physical laws. This challenge arises primarily from a lack of clear guidance on physical information due to a significant gap between abstract phys…

    Submitted 11 March, 2025; originally announced March 2025.

  41. arXiv:2503.07189  [pdf, ps, other]

    cs.IT eess.SP

    Beamforming Design for Beyond Diagonal RIS-Aided Cell-Free Massive MIMO Systems

    Authors: Yizhuo Li, Jiakang Zheng, Bokai Xu, Yiyang Zhu, Jiayi Zhang, Bo Ai

    Abstract: Reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising architecture for further improving spectral efficiency (SE) with low cost and power consumption. However, conventional RIS has inevitable limitations due to its capability of only reflecting signals. In contrast, beyond-diagonal RIS (BD-RIS), with its ability to both reflect…

    Submitted 10 March, 2025; originally announced March 2025.

  42. arXiv:2503.07003  [pdf, other]

    cs.CL

    Large Language Models Often Say One Thing and Do Another

    Authors: Ruoxi Xu, Hongyu Lin, Xianpei Han, Jia Zheng, Weixiang Zhou, Le Sun, Yingfei Sun

    Abstract: As large language models (LLMs) increasingly become central to various applications and interact with diverse user populations, ensuring their reliable and consistent performance is becoming more important. This paper explores a critical issue in assessing the reliability of LLMs: the consistency between their words and deeds. To quantitatively explore this consistency, we developed a novel evalua…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Published at ICLR 2025

  43. arXiv:2503.06052  [pdf, other]

    cs.LG q-bio.QM

    Interpretable High-order Knowledge Graph Neural Network for Predicting Synthetic Lethality in Human Cancers

    Authors: Xuexin Chen, Ruichu Cai, Zhengting Huang, Zijian Li, Jie Zheng, Min Wu

    Abstract: Synthetic lethality (SL) is a promising gene interaction for cancer therapy. Recent SL prediction methods integrate knowledge graphs (KGs) into graph neural networks (GNNs) and employ attention mechanisms to extract local subgraphs as explanations for target gene pairs. However, attention mechanisms often lack fidelity, typically generate a single explanation per gene pair, and fail to ensure trus…

    Submitted 19 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 15 pages. Accepted by Briefings in Bioinformatics

    Journal ref: Briefings in Bioinformatics 2025

  44. arXiv:2503.05578  [pdf, other]

    cs.CV cs.RO

    Novel Object 6D Pose Estimation with a Single Reference View

    Authors: Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Lin Wang, Hossein Rahmani, Ajmal Mian

    Abstract: Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 17 pages, 12 figures (including supplementary material)

  45. arXiv:2503.04021  [pdf, other]

    cs.CV cs.AI

    TextDoctor: Unified Document Image Inpainting via Patch Pyramid Diffusion Models

    Authors: Wanglong Lu, Lingming Su, Jingjing Zheng, Vinícius Veloso de Melo, Farzaneh Shoeleh, John Hawkin, Terrence Tricco, Hanli Zhao, Xianta Jiang

    Abstract: Digital versions of real-world text documents often suffer from issues like environmental corrosion of the original document, low-quality scanning, or human interference. Existing document restoration and inpainting methods typically struggle with generalizing to unseen document styles and handling high-resolution images. To address these challenges, we introduce TextDoctor, a novel unified docume…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 28 pages, 25 figures

    MSC Class: 68U10

    ACM Class: I.4.3; I.4.4; I.4.5; I.4.9

  46. arXiv:2503.03104  [pdf, other]

    cs.CV cs.AI

    RVAFM: Re-parameterizing Vertical Attention Fusion Module for Handwritten Paragraph Text Recognition

    Authors: Jinhui Zheng, Zhiquan Liu, Yain-Whar Si, Jianqing Li, Xinyuan Zhang, Xiaofan Li, Haozhi Huang, Xueyuan Gong

    Abstract: Handwritten Paragraph Text Recognition (HPTR) is a challenging task in Computer Vision, requiring the transformation of a paragraph text image, rich in handwritten text, into text encoding sequences. One of the most advanced models for this task is Vertical Attention Network (VAN), which utilizes a Vertical Attention Module (VAM) to implicitly segment paragraph text images into text lines, thereby…

    Submitted 4 March, 2025; originally announced March 2025.

  47. arXiv:2503.02662  [pdf, other]

    cs.CV

    10K is Enough: An Ultra-Lightweight Binarized Network for Infrared Small-Target Detection

    Authors: Biqiao Xin, Qianchen Mao, Bingshu Wang, Jiangbin Zheng, Yong Zhao, C. L. Philip Chen

    Abstract: The widespread deployment of Infrared Small-Target Detection (IRSTD) algorithms on edge devices necessitates the exploration of model compression techniques. Binarized neural networks (BNNs) are distinguished by their exceptional efficiency in model compression. However, the small size of infrared targets introduces stringent precision requirements for the IRSTD task, while the inherent precision…

    Submitted 10 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  48. XAIxArts Manifesto: Explainable AI for the Arts

    Authors: Nick Bryan-Kinns, Shuoyang Jasper Zheng, Francisco Castro, Makayla Lewis, Jia-Rey Chang, Gabriel Vigliensoni, Terence Broad, Michael Clemens, Elizabeth Wilson

    Abstract: Explainable AI (XAI) is concerned with how to make AI models more understandable to people. To date these explanations have predominantly been technocentric - mechanistic or productivity oriented. This paper introduces the Explainable AI for the Arts (XAIxArts) manifesto to provoke new ways of thinking about explainability and AI beyond technocentric discourses. Manifestos offer a means to communi…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: Author version of paper in: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, April 26-May 1, 2025, Yokohama, Japan. DOI: 10.1145/3706599.3716227. ISBN: 979-8-4007-1395-8/25/04

  49. arXiv:2502.20387  [pdf, other]

    cs.CV

    InsTaG: Learning Personalized 3D Talking Head from Few-Second Video

    Authors: Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Jun Zhou, Lin Gu

    Abstract: Despite exhibiting impressive performance in synthesizing lifelike personalized 3D talking heads, prevailing methods based on radiance fields suffer from high demands for training data and time for each new identity. This paper introduces InsTaG, a 3D talking head synthesis framework that allows a fast learning of realistic personalized 3D talking head from few training data. Built upon a lightwei…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted at CVPR 2025. Project page: https://fictionarry.github.io/InsTaG/

  50. arXiv:2502.18906  [pdf, other]

    cs.LG

    VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

    Authors: Jiani Zheng, Lu Wang, Fangkai Yang, Chaoyun Zhang, Lingrui Mei, Wenjie Yin, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Training Vision-Language Models (VLMs) for Graphical User Interfaces (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples value estimation from policy optimization by leveraging a…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 20 pages, 5 figures