Showing 1–50 of 10,126 results for author: Peng

Searching in archive cs.
  1. arXiv:2506.21513

    cs.CV

    GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

    Authors: Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian

    Abstract: Creating high-quality, generalizable speech-driven 3D talking heads remains a persistent challenge. Previous methods achieve satisfactory results for fixed viewpoints and small-scale audio variations, but they struggle with large head rotations and out-of-distribution (OOD) audio. Moreover, they are constrained by the need for time-consuming, identity-specific training. We believe the core issue l…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ICCV 2025, Project page: https://vincenthu19.github.io/GGTalker/

  2. arXiv:2506.21370

    cs.IT eess.SP

    Cluster-Aware Two-Stage Method for Fast Iterative MIMO Detection in LEO Satellite Communications

    Authors: Jiuyu Liu, Yi Ma, Qihao Peng, Rahim Tafazolli

    Abstract: In this paper, a cluster-aware two-stage multiple-input multiple-output (MIMO) detection method is proposed for direct-to-cell satellite communications. The method achieves computational efficiency by exploiting a distinctive property of satellite MIMO channels: users within the same geographical cluster exhibit highly correlated channel characteristics due to their physical proximity, which typic…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: This work has been accepted by IEEE/CIC ICCC 2025

  3. arXiv:2506.21198

    cs.CV cs.RO eess.IV

    Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Xu Zheng, Hao Shi, Kunyu Peng, Hang Liu, Kailun Yang, Hui Zhang

    Abstract: Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations. Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data. To address these, we introduce a more practical task, i.e., Source-Fre…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK

  4. arXiv:2506.21185

    cs.CV cs.RO eess.IV

    Out-of-Distribution Semantic Occupancy Prediction

    Authors: Yuheng Zhang, Mengfei Duan, Kunyu Peng, Yuhang Wang, Ruiping Liu, Fei Teng, Kai Luo, Zhiyong Li, Kailun Yang

    Abstract: 3D Semantic Occupancy Prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation. However, existing methods focus on in-distribution scenes, making them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions, which increases the risk of undetected anomalies and misinterpretations, posing safety hazards. To address these cha…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD

  5. arXiv:2506.21117

    cs.CV

    CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

    Authors: Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, Songyou Peng

    Abstract: In dynamic 3D environments, accurately updating scene representations over time is crucial for applications in robotics, mixed reality, and embodied AI. As scenes evolve, efficient methods to incorporate changes are needed to maintain up-to-date, high-quality reconstructions without the computational overhead of re-optimizing the entire scene. This paper introduces CL-Splats, which incrementally u…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ICCV 2025, Project Page: https://cl-splats.github.io

  6. arXiv:2506.21093

    cs.LG cs.IT eess.SP stat.ML

    Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection

    Authors: Li Fan, Peng Wang, Jing Yang, Cong Shen

    Abstract: Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work,…

    Submitted 26 June, 2025; originally announced June 2025.

  7. arXiv:2506.21085

    q-bio.BM cs.AI cs.LG

    CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions

    Authors: Yangzhe Peng, Kaiyuan Gao, Liang He, Yuheng Cong, Haiguang Liu, Kun He, Lijun Wu

    Abstract: Molecular docking plays a crucial role in predicting the binding mode of ligands to target proteins, and covalent interactions, which involve the formation of a covalent bond between the ligand and the target, are particularly valuable due to their strong, enduring binding nature. However, most existing docking methods and deep learning approaches hardly account for the formation of covalent bonds…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to KDD 2025 Research Track

  8. arXiv:2506.21063

    cs.RO

    Control of Marine Robots in the Era of Data-Driven Intelligence

    Authors: Lin Hong, Lu Liu, Zhouhua Peng, Fumin Zhang

    Abstract: The control of marine robots has long relied on model-based methods grounded in classical and modern control theory. However, the nonlinearity and uncertainties inherent in robot dynamics, coupled with the complexity of marine environments, have revealed the limitations of conventional control methods. The rapid evolution of machine learning has opened new avenues for incorporating data-driven int…

    Submitted 26 June, 2025; originally announced June 2025.

  9. arXiv:2506.21054

    cs.LG

    FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning

    Authors: Fu Peng, Ming Tang

    Abstract: In federated learning (FL), the data distribution of each client may change over time, introducing both temporal and spatial data heterogeneity, known as concept drift. Data heterogeneity arises from three drift sources: real drift (a shift in the conditional distribution P(y|x)), virtual drift (a shift in the input distribution P(x)), and label drift (a shift in the label distribution P(y)). Howe…

    Submitted 26 June, 2025; originally announced June 2025.

  10. arXiv:2506.21049

    cs.CL cs.AI cs.IR

    A Semi-supervised Scalable Unified Framework for E-commerce Query Classification

    Authors: Chunyuan Yuan, Chong Zhang, Zheng Fang, Ming Pang, Xue Jiang, Changping Peng, Zhangang Lin, Ching Law

    Abstract: Query classification, including multiple subtasks such as intent and category prediction, is vital to e-commerce applications. E-commerce queries are usually short and lack context, and the information between labels cannot be used, resulting in insufficient prior information for modeling. Most existing industrial query classification methods rely on users' posterior click behavior to construct tr…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025

  11. arXiv:2506.21036

    cs.LG cs.DC

    An Information-Theoretic Analysis for Federated Learning under Concept Drift

    Authors: Fu Peng, Meng Zhang, Ming Tang

    Abstract: Recent studies in federated learning (FL) commonly train models on static datasets. However, real-world data often arrives as streams with shifting distributions, causing performance degradation known as concept drift. This paper analyzes FL performance under concept drift using information theory and proposes an algorithm to mitigate the performance degradation. We model concept drift as a Markov…

    Submitted 26 June, 2025; originally announced June 2025.

  12. arXiv:2506.20918

    cs.DL cs.ET cs.IR

    Metadata Enrichment of Long Text Documents using Large Language Models

    Authors: Manika Lamba, You Peng, Sophie Nikolov, Glen Layne-Worthey, J. Stephen Downie

    Abstract: In this project, we semantically enriched and enhanced the metadata of long text documents, theses and dissertations, retrieved from the HathiTrust Digital Library in English published from 1920 to 2020 through a combination of manual efforts and large language models. This dataset provides a valuable resource for advancing research in areas such as computational social science, digital humanities…

    Submitted 25 June, 2025; originally announced June 2025.

  13. arXiv:2506.20697

    q-bio.CB cs.LG

    scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection

    Authors: Zhen Yuan, Shaoqing Jiao, Yihang Xiao, Jiajie Peng

    Abstract: The advent of single-cell multi-omics technologies has enabled the simultaneous profiling of diverse omics layers within individual cells. Integrating such multimodal data provides unprecedented insights into cellular identity, regulatory processes, and disease mechanisms. However, it remains challenging, as current methods often rely on selecting highly variable genes or peaks during preprocessin…

    Submitted 25 June, 2025; originally announced June 2025.

  14. arXiv:2506.20666

    cs.CL cs.AI

    Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

    Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman

    Abstract: Navigating everyday social situations often requires juggling conflicting goals, such as conveying a harsh truth and maintaining trust, all while still being mindful of another person's feelings. These value trade-offs are an integral part of human decision-making and language use; however, current tools for interpreting such dynamic and multi-faceted notions of values in LLMs are limited. In cogniti…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 11 pages, 3 figures

  15. arXiv:2506.20493

    eess.SY cs.GT

    Analyzing the Impact of Strategic Bidding on the Reserve Capacity via a Bi-Level Model

    Authors: Yun Xu, Yunxiao Bai, Yunyong Zhang, Peng Wang, Xuelin Wang, Jiqun Guo, Kaijun Xie, Rusheng Zhao

    Abstract: The growing integration of renewable energy sources necessitates adequate reserve capacity to maintain power balance. However, in market clearing, power companies with flexible resources may submit strategic bids to maximize profits, potentially compromising system reserves. This paper examines the effects of such strategic behavior by modeling the market as a bi-level problem. The upper level rep…

    Submitted 25 June, 2025; originally announced June 2025.

  16. arXiv:2506.20445

    cs.RO

    Learn to Position -- A Novel Meta Method for Robotic Positioning

    Authors: Dongkun Wang, Junkai Zhao, Yunfei Teng, Jieyang Peng, Wenjing Xue, Xiaoming Tao

    Abstract: Absolute positioning accuracy is a vital specification for robots. Achieving high position precision can be challenging due to the presence of various sources of errors. Meanwhile, accurately depicting these errors is difficult due to their stochastic nature. Vision-based methods are commonly integrated to guide robotic positioning, but their performance can be highly impacted by inevitable occlus…

    Submitted 25 June, 2025; originally announced June 2025.

  17. arXiv:2506.20344

    math.OC cs.LG

    A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization

    Authors: Po Chen, Rujun Jiang, Peng Wang

    Abstract: Despite its wide range of applications across various domains, the optimization foundations of deep matrix factorization (DMF) remain largely open. In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem. Toward this goal, we first provide a closed-form expression of all critical points. Building on this, we establish precise c…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 35 pages, 3 figures

  18. arXiv:2506.20235

    cs.LG cs.AI

    Directed Link Prediction using GNN with Local and Global Feature Fusion

    Authors: Yuyang Zhang, Xu Shen, Yu Xie, Ka-Chun Wong, Weidun Xie, Chengbin Peng

    Abstract: Link prediction is a classical problem in graph analysis with many practical applications. For directed graphs, recently developed deep learning approaches typically analyze node similarities through contrastive learning and aggregate neighborhood information through graph convolutions. In this work, we propose a novel graph neural network (GNN) framework to fuse feature embedding with community i…

    Submitted 25 June, 2025; originally announced June 2025.

  19. arXiv:2506.20148

    cs.CE

    Developing Artificial Mechanics Intuitions from Extremely Small Data

    Authors: Jingruo Peng, Shuze Zhu

    Abstract: Humans can possess good mechanics intuitions by learning from a few examples, which leads to the question of how to develop artificial mechanics intuitions that can be learned from small data, as we are eagerly entering the era of artificial intelligence. We propose in this Letter the sample-switchable training method, which successfully develops highly accurate artificial mechanics intuitions tha…

    Submitted 25 June, 2025; originally announced June 2025.

  20. arXiv:2506.20139

    cs.DB cs.LG

    Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis

    Authors: Jiayong Qin, Xianyu Zhu, Qiyu Liu, Guangyi Zhang, Zhigang Cai, Jianwei Liao, Sha Hu, Jingshu Peng, Yingxia Shao, Lei Chen

    Abstract: A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms re…

    Submitted 25 June, 2025; originally announced June 2025.

  21. arXiv:2506.20123

    cs.CE

    DiT-SGCR: Directed Temporal Structural Representation with Global-Cluster Awareness for Ethereum Malicious Account Detection

    Authors: Ye Tian, Liangliang Song, Peng Qian, Yanbin Wang, Jianguo Sun, Yifan Jia

    Abstract: The detection of malicious accounts on Ethereum - the preeminent DeFi platform - is critical for protecting digital assets and maintaining trust in decentralized finance. Recent advances highlight that temporal transaction evolution reveals more attack signatures than static graphs. However, current methods either fail to model continuous transaction dynamics or incur high computational costs that…

    Submitted 25 June, 2025; originally announced June 2025.

  22. arXiv:2506.20103

    cs.CV cs.AI

    BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos

    Authors: Jiahao Lin, Weixuan Peng, Bojia Zi, Yifeng Gao, Xianbiao Qi, Xingjun Ma, Yu-Gang Jiang

    Abstract: Recent advances in deep generative models have led to significant progress in video generation, yet the fidelity of AI-generated videos remains limited. Synthesized content often exhibits visual artifacts such as temporally inconsistent motion, physically implausible trajectories, unnatural object deformations, and local blurring that undermine realism and user trust. Accurate detection and spatia…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 7 pages, 4 figures, 2 tables

    ACM Class: I.4

  23. arXiv:2506.19889

    cs.CR cs.AI

    Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models

    Authors: Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen

    Abstract: Recent advances in large language models (LLMs) have made a profound impact on our society and also raised new security concerns. Particularly, due to the remarkable inference ability of LLMs, the privacy violation attack (PVA), revealed by Staab et al., introduces serious personal privacy issues. Existing defense methods mainly leverage LLMs to anonymize the input query, which requires costly inf…

    Submitted 24 June, 2025; originally announced June 2025.

  24. arXiv:2506.19852

    cs.CV cs.AI cs.LG

    Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

    Authors: Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

    Abstract: Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as spatial and temporal d…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Code: https://github.com/mit-han-lab/radial-attention

  25. arXiv:2506.19808

    cs.CV

    One Prototype Is Enough: Single-Prototype Activation for Interpretable Image Classification

    Authors: Yitao Peng, Lianghua He, Die Hu

    Abstract: In this paper, we propose ProtoSolo, a novel deep neural architecture for interpretable image classification inspired by prototypical networks such as ProtoPNet. Existing prototype networks usually rely on the collaborative decision-making of multiple prototypes to achieve the classification and interpretation of a single category. In contrast, ProtoSolo only requires the activation of a single pr…

    Submitted 25 June, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

  26. arXiv:2506.19806

    cs.CY cs.CL cs.MA

    LLM-Based Social Simulations Require a Boundary

    Authors: Zengqing Wu, Run Peng, Takayuki Ito, Chuan Xiao

    Abstract: This position paper argues that large language model (LLM)-based social simulations should establish clear boundaries to meaningfully contribute to social science research. While LLMs offer promising capabilities for modeling human-like agents compared to traditional agent-based modeling, they face fundamental limitations that constrain their reliability for social pattern discovery. The core issu…

    Submitted 24 June, 2025; originally announced June 2025.

  27. arXiv:2506.19399

    cs.CL cs.AI

    Automated Detection of Pre-training Text in Black-box LLMs

    Authors: Ruihan Hu, Yu-Ming Shang, Jiankun Peng, Wei Luo, Yazhe Wang, Xi Zhang

    Abstract: Detecting whether a given text is a member of the pre-training data of Large Language Models (LLMs) is crucial for ensuring data privacy and copyright protection. Most existing methods rely on the LLM's hidden information (e.g., model parameters or token probabilities), making them ineffective in the black-box setting, where only input and output texts are accessible. Although some methods have be…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 13 pages

  28. arXiv:2506.19269

    cs.RO cs.AI

    AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation

    Authors: Ziyan Zhao, Ke Fan, He-Yang Xu, Ning Qiao, Bo Peng, Wenlong Gao, Dongjiang Li, Hui Shen

    Abstract: We present AnchorDP3, a diffusion policy framework for dual-arm robotic manipulation that achieves state-of-the-art performance in highly randomized environments. AnchorDP3 integrates three key innovations: (1) Simulator-Supervised Semantic Segmentation, using rendered ground truth to explicitly segment task-critical objects within the point cloud, which provides strong affordance priors; (2) Task…

    Submitted 25 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  29. arXiv:2506.19019

    cs.DC cs.AI

    Survey of HPC in US Research Institutions

    Authors: Peng Shu, Junhao Chen, Zhengliang Liu, Huaqin Zhao, Xinliang Li, Tianming Liu

    Abstract: The rapid growth of AI, data-intensive science, and digital twin technologies has driven an unprecedented demand for high-performance computing (HPC) across the research ecosystem. While national laboratories and industrial hyperscalers have invested heavily in exascale and GPU-centric architectures, university-operated HPC systems remain comparatively under-resourced. This survey presents a compr…

    Submitted 23 June, 2025; originally announced June 2025.

  30. arXiv:2506.18959

    cs.IR cs.CL cs.LG

    From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

    Authors: Weizhi Zhang, Yangning Li, Yuanchen Bei, Junyu Luo, Guancheng Wan, Liangwei Yang, Chenxuan Xie, Yuyao Yang, Wei-Chieh Huang, Chunyu Miao, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Yankai Chen, Chunkit Chan, Peilin Zhou, Xinyang Zhang, Chenwei Zhang, Jingbo Shang, Ming Zhang, Yangqiu Song, Irwin King, Philip S. Yu

    Abstract: Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm terme…

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  31. arXiv:2506.18904

    cs.CV

    TC-Light: Temporally Consistent Relighting for Dynamic Long Videos

    Authors: Yang Liu, Chuanchen Luo, Zimo Tang, Yingyan Li, Yuran Yang, Yuanyong Ning, Lue Fan, Junran Peng, Zhaoxiang Zhang

    Abstract: Editing illumination in long videos with complex dynamics has significant value in various downstream tasks, including visual content creation and manipulation, as well as data scaling up for embodied AI through sim2real and real2real transfer. Nevertheless, existing video relighting techniques are predominantly limited to portrait videos or fall into the bottleneck of temporal consistency and com…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project Page: https://dekuliutesla.github.io/tclight/ Code: https://github.com/Linketic/TC-Light

  32. arXiv:2506.18862

    cs.CV cs.AI

    TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting

    Authors: Zhongbin Guo, Yuhao Wang, Ping Jian, Xinyue Chen, Wei Peng, Ertai E

    Abstract: Satellite image time-series analysis demands fine-grained spatial-temporal reasoning, which remains a challenge for existing multimodal large language models (MLLMs). In this work, we study the capabilities of MLLMs on a novel task that jointly targets temporal change understanding and future scene generation, aiming to assess their potential for modeling complex multimodal dynamics over time. We…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Submitted to the 33rd ACM International Conference on Multimedia. Our dataset can be found at https://huggingface.co/datasets/IceInPot/TAMMs

  33. arXiv:2506.18701

    cs.CV cs.AI

    Matrix-Game: Interactive World Foundation Model

    Authors: Yifan Zhang, Chunli Peng, Boyang Wang, Puyi Wang, Qingcheng Zhu, Fei Kang, Biao Jiang, Zedong Gao, Eric Li, Yang Liu, Yahui Zhou

    Abstract: We introduce Matrix-Game, an interactive world foundation model for controllable game world generation. Matrix-Game is trained using a two-stage pipeline that first performs large-scale unlabeled pretraining for environment understanding, followed by action-labeled training for interactive video generation. To support this, we curate Matrix-Game-MC, a comprehensive Minecraft dataset comprising ove…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Technical Report

  34. arXiv:2506.18678

    cs.CV cs.RO

    MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation

    Authors: Tianchen Deng, Guole Shen, Xun Chen, Shenghai Yuan, Hongming Shen, Guohao Peng, Zhenyu Wu, Jingchuan Wang, Lihua Xie, Danwei Wang, Hesheng Wang, Weidong Chen

    Abstract: Neural implicit scene representations have recently shown promising results in dense visual SLAM. However, existing implicit SLAM algorithms are constrained to single-agent scenarios and face difficulties in large-scale scenes and long sequences. Existing NeRF-based multi-agent SLAM frameworks cannot meet the constraints of communication bandwidth. To this end, we propose the first distributed mu…

    Submitted 23 June, 2025; originally announced June 2025.

  35. arXiv:2506.18655

    cs.CV

    RDPO: Real Data Preference Optimization for Physics Consistency Video Generation

    Authors: Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, Anxiang Zeng

    Abstract: Video generation techniques have achieved remarkable advancements in visual quality, yet faithfully reproducing real-world physics remains elusive. Preference-based model post-training may improve physical consistency, but requires costly human-annotated datasets or reward models that are not yet feasible. To address these challenges, we present Real Data Preference Optimisation (RDPO), an annotat…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 16 pages, 10 figures

    ACM Class: I.2.6; I.2.10

  36. arXiv:2506.18575

    cs.CV

    2D Triangle Splatting for Direct Differentiable Mesh Training

    Authors: Kaifeng Sheng, Zheng Zhou, Yingliang Peng, Qianwei Wang

    Abstract: Differentiable rendering with 3D Gaussian primitives has emerged as a powerful method for reconstructing high-fidelity 3D scenes from multi-view images. While it offers improvements over NeRF-based methods, this representation still encounters challenges with rendering speed and advanced rendering effects, such as relighting and shadow rendering, compared to mesh-based models. In this paper, we pr…

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: 13 pages, 8 figures

  37. arXiv:2506.18522

    cs.LG

    DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling

    Authors: Yang Chang, Kuang-Da Wang, Ping-Chun Hsieh, Cheng-Kuan Lin, Wen-Chih Peng

    Abstract: Uncovering the underlying ordinary differential equations (ODEs) that govern dynamic systems is crucial for advancing our understanding of complex phenomena. Traditional symbolic regression methods often struggle to capture the temporal dynamics and intervariable correlations inherent in ODEs. ODEFormer, a state-of-the-art method for inferring multidimensional ODEs from single trajectories, has ma…

    Submitted 23 June, 2025; originally announced June 2025.

  38. arXiv:2506.18322

    cs.CV cs.LG

    Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?

    Authors: Yiwei Yang, Chung Peng Lee, Shangbin Feng, Dora Zhao, Bingbing Wen, Anthony Z. Liu, Yulia Tsvetkov, Bill Howe

    Abstract: Finetuning can cause spurious correlations to arise between non-essential features and the target labels, but benchmarks to study these effects involve contrived settings and narrow tasks. In contrast, we consider spurious correlations in multi-modal Large Vision Language Models (LVLMs) pretrained on extensive and diverse datasets without explicit task supervision. We develop a benchmark by sourci…

    Submitted 23 June, 2025; originally announced June 2025.

  39. arXiv:2506.18071

    cs.CV cs.AI

    MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering

    Authors: Jisheng Dang, Huilin Song, Junbin Xiao, Bimei Wang, Han Peng, Haoxuan Li, Xun Yang, Meng Wang, Tat-Seng Chua

    Abstract: Grounded Video Question Answering (Grounded VideoQA) requires aligning textual answers with explicit visual evidence. However, modern multimodal models often rely on linguistic priors and spurious correlations, resulting in poorly grounded predictions. In this work, we propose MUPA, a cooperative MUlti-Path Agentic approach that unifies video grounding, question answering, answer reflection and ag…

    Submitted 22 June, 2025; originally announced June 2025.

  40. arXiv:2506.18006

    cs.CV

    OSDMamba: Enhancing Oil Spill Detection from Remote Sensing Images Using Selective State Space Model

    Authors: Shuaiyu Chen, Fu Wang, Peng Ren, Chunbo Luo, Zeyu Fu

    Abstract: Semantic segmentation is commonly used for Oil Spill Detection (OSD) in remote sensing images. However, the limited availability of labelled oil spill samples and class imbalance present significant challenges that can reduce detection accuracy. Furthermore, most existing methods, which rely on convolutional neural networks (CNNs), struggle to detect small oil spill areas due to their limited rece…

    Submitted 22 June, 2025; originally announced June 2025.

  41. arXiv:2506.17958

    cs.CV

    ELMAR: Enhancing LiDAR Detection with 4D Radar Motion Awareness and Cross-modal Uncertainty

    Authors: Xiangyuan Peng, Miao Tang, Huawei Sun, Bierzynski Kay, Lorenzo Servadei, Robert Wille

    Abstract: LiDAR and 4D radar are widely used in autonomous driving and robotics. While LiDAR provides rich spatial information, 4D radar offers velocity measurement and remains robust under adverse conditions. As a result, a growing number of studies have focused on 4D radar-LiDAR fusion methods to enhance perception. However, the misalignment between different modalities is often overlooked. To address this…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 7 pages. Accepted by IROS 2025

  42. arXiv:2506.17667

    cs.AI

    PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models

    Authors: Lintao Wang, Encheng Su, Jiaqi Liu, Pengze Li, Peng Xia, Jiabei Xiao, Wenlong Zhang, Xinnan Dai, Xi Chen, Yuan Meng, Mingyu Ding, Lei Bai, Wanli Ouyang, Shixiang Tang, Aoran Wang, Xinzhu Ma

    Abstract: Physics problem-solving is a challenging domain for large AI models, requiring integration of conceptual understanding, mathematical reasoning, and interpretation of physical diagrams. Current evaluation methodologies show notable limitations in capturing the breadth and complexity of undergraduate-level physics, underscoring the need for more rigorous assessments. To this end, we present PhysUniB…

    Submitted 25 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

  43. arXiv:2506.17638  [pdf, ps, other

    cs.SE

    Deep Learning Framework Testing via Model Mutation: How Far Are We?

    Authors: Yanzhou Mu, Rong Wang, Juan Zhai, Chunrong Fang, Xiang Chen, Zhiyuan Peng, Peiran Yang, Ruixiang Qian, Shaoyu Yang, Zhenyu Chen

    Abstract: Deep Learning (DL) frameworks are a fundamental component of DL development. Therefore, the detection of DL framework defects is important and challenging. As one of the most widely adopted DL testing techniques, model mutation has recently gained significant attention. In this study, we revisit the defect detection ability of existing mutation-based testing methods and investigate the factors tha…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 27 pages, 9 figures

  44. arXiv:2506.17623  [pdf, ps, other

    cs.MM cs.CV

    Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning?

    Authors: Yuesheng Huang, Peng Zhang, Riliang Liu, Jiaqi Liang

    Abstract: A significant "modality gap" exists between the abundance of text-only data and the increasing power of multimodal models. This work systematically investigates whether images generated on-the-fly by Text-to-Image (T2I) models can serve as a valuable complementary modality for text-centric tasks. Through a comprehensive evaluation framework on text classification, we analyze the impact of critica…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 4 figures, 7 tables

  45. arXiv:2506.17611  [pdf, ps, other

    cs.CL cs.SD eess.AS

    OpusLM: A Family of Open Unified Speech Language Models

    Authors: Jinchuan Tian, William Chen, Yifan Peng, Jiatong Shi, Siddhant Arora, Shikhar Bharadwaj, Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue, Huck Yang, Shinji Watanabe

    Abstract: This paper presents Open Unified Speech Language Models (OpusLMs), a family of open foundational speech language models (SpeechLMs) up to 7B. Initialized from decoder-only text language models, the OpusLMs are continuously pre-trained on 213K hours of speech-text pairs and 292B text-only tokens. We demonstrate our OpusLMs achieve comparable (or even superior) performance with existing SpeechLMs in…

    Submitted 21 June, 2025; originally announced June 2025.

  46. arXiv:2506.17576  [pdf, ps, other

    cs.LG

    Towards Deeper GCNs: Alleviating Over-smoothing via Iterative Training and Fine-tuning

    Authors: Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang

    Abstract: Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at mo…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 16 pages, 18 figures

  47. arXiv:2506.17536  [pdf

    physics.med-ph cs.AI

    Exploring Strategies for Personalized Radiation Therapy Part I Unlocking Response-Related Tumor Subregions with Class Activation Mapping

    Authors: Hao Peng, Steve Jiang, Robert Timmerman

    Abstract: Personalized precision radiation therapy requires more than simple classification, it demands the identification of prognostic, spatially informative features and the ability to adapt treatment based on individual response. This study compares three approaches for predicting treatment response: standard radiomics, gradient based features, and convolutional neural networks enhanced with Class Activ…

    Submitted 20 June, 2025; originally announced June 2025.

  48. arXiv:2506.17491  [pdf

    physics.med-ph cs.AI

    Exploring Strategies for Personalized Radiation Therapy Part II Predicting Tumor Drift Patterns with Diffusion Models

    Authors: Hao Peng, Steve Jiang, Robert Timmerman

    Abstract: Radiation therapy outcomes are decided by two key parameters, dose and timing, whose best values vary substantially across patients. This variability is especially critical in the treatment of brain cancer, where fractionated or staged stereotactic radiosurgery improves safety compared to single fraction approaches, but complicates the ability to predict treatment response. To address this challen…

    Submitted 20 June, 2025; originally announced June 2025.

  49. arXiv:2506.17457  [pdf, ps, other

    cs.CV

    When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network

    Authors: Dong Xiao, Guangyao Chen, Peixi Peng, Yangru Huang, Yifan Zhao, Yongxing Dai, Yonghong Tian

    Abstract: Anomaly detection is essential for the safety and reliability of autonomous driving systems. Current methods often focus on detection accuracy but neglect response time, which is critical in time-sensitive driving scenarios. In this paper, we introduce real-time anomaly detection for autonomous driving, prioritizing both minimal response time and high accuracy. We propose a novel multimodal asynch…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: ICML 2025 Spotlight

  50. arXiv:2506.17310  [pdf, ps, other

    q-bio.NC cs.CL cs.NE

    PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding

    Authors: Kangcong Li, Peng Ye, Chongjun Tu, Lin Zhang, Chunfeng Song, Jiamin Wu, Tao Yang, Qihao Zheng, Tao Chen

    Abstract: While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent A…

    Submitted 18 June, 2025; originally announced June 2025.