Showing 1–50 of 7,858 results for author: Yang, J

  1. arXiv:2505.23692  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    Mobi-$π$: Mobilizing Your Robot Learning Policy

    Authors: Jingyun Yang, Isabella Huang, Brandon Vu, Max Bajracharya, Rika Antonova, Jeannette Bohg

    Abstract: Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets.…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project website: https://mobipi.github.io/

  2. arXiv:2505.23583  [pdf, ps, other]

    cs.LG

    Improving Time Series Forecasting via Instance-aware Post-hoc Revision

    Authors: Zhiding Liu, Mingyue Cheng, Guanhao Zhao, Jiqian Yang, Qi Liu, Enhong Chen

    Abstract: Time series forecasting plays a vital role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations--stemming from distribution shifts, missing data,…

    Submitted 29 May, 2025; originally announced May 2025.

  3. arXiv:2505.23471  [pdf, ps, other]

    cs.SE

    Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency

    Authors: Jun Yang, Cheng-Chi Wang, Bogdan Alexandru Stoica, Kexin Pei

    Abstract: Large Language Models (LLMs) have been increasingly used to optimize code efficiency. Evaluating their effectiveness and further suggesting optimization opportunities often rely on high-quality tests to demonstrate the performance bottlenecks presented in the program. However, existing approaches rely on a limited set of hand-curated inputs or LLM-generated uninteresting length-stressing tests, fa…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 30 pages, 3 figures

    ACM Class: D.2.5

  4. arXiv:2505.23304  [pdf, other]

    cs.CL

    Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs

    Authors: Yi Luo, Qiwen Wang, Junqi Yang, Luyao Tang, Zhenghao Lin, Zhenzhe Ying, Weiqiang Wang, Chen Lin

    Abstract: Generalized Category Discovery (GCD) aims to classify both known and novel categories using partially labeled data that contains only known classes. Despite achieving strong performance on existing benchmarks, current textual GCD methods lack sufficient validation in realistic settings. We introduce Event-Centric GCD (EC-GCD), characterized by long, complex narratives and highly imbalanced class d…

    Submitted 29 May, 2025; originally announced May 2025.

  5. Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition

    Authors: Shanaka Ramesh Gunasekara, Wanqing Li, Philip Ogunbona, Jack Yang

    Abstract: Traditional approaches in unsupervised or self supervised learning for skeleton-based action classification have concentrated predominantly on the dynamic aspects of skeletal sequences. Yet, the intricate interaction between the moving and static elements of the skeleton presents a rarely tapped discriminative potential for action classification. This paper introduces a novel measurement, referred…

    Submitted 28 May, 2025; originally announced May 2025.

    Journal ref: IEEE Transactions on Biometrics, Behavior, and Identity Science (2025)

  6. arXiv:2505.22678  [pdf, other]

    q-fin.TR cs.LG

    An Efficient deep learning model to Predict Stock Price Movement Based on Limit Order Book

    Authors: Jiahao Yang, Ran Fang, Ming Zhang, Jun Zhou

    Abstract: In high-frequency trading (HFT), leveraging limit order books (LOB) to model stock price movements is crucial for achieving profitable outcomes. However, this task is challenging due to the high-dimensional and volatile nature of the original data. Even recent deep learning models often struggle to capture price movement patterns effectively, particularly without well-designed features. We observe…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.22633  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Spatial Knowledge Graph-Guided Multimodal Synthesis

    Authors: Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Huajun Chen, Ningyu Zhang

    Abstract: Recent advances in multimodal large language models (MLLMs) have significantly enhanced their capabilities; however, their spatial perception abilities remain a notable limitation. To address this challenge, multimodal data synthesis offers a promising solution. Yet, ensuring that synthesized data adhere to spatial common sense is a non-trivial task. In this work, we introduce SKG2Data, a novel mu…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Ongoing work

  8. arXiv:2505.22467  [pdf, ps, other]

    cs.MA cs.AI cs.LG

    Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems

    Authors: Jiaxi Yang, Mengqi Zhang, Yiqiao Jin, Hao Chen, Qingsong Wen, Lu Lin, Yi He, Weijie Xu, James Evans, Jindong Wang

    Abstract: Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. Nevertheless, the question of how agents should be structurally organized for optimal cooperation remains largely unexplored. In this position paper, we aim to gently redirect the focus of the MAS research community toward this critical dimension:…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.22297  [pdf]

    physics.optics

    Revealing the terahertz-laser velocity effect during air filamentation via travelling-wave-antenna model

    Authors: Jiajun Yang, Xiaofeng Li, Linlin Yuan, Li Lao, Jiayu Zhao

    Abstract: During femtosecond laser filamentation in air, the velocity ratio (K) between the terahertz (THz) phase velocity and the laser group velocity plays a crucial role in THz wave generation. However, K is typically assumed to be unity and its impact has long been overlooked because more attention has been paid to the more easily controlled filament length. Here, we investigate the obscured contribution of…

    Submitted 28 May, 2025; originally announced May 2025.

  10. arXiv:2505.22141  [pdf, other]

    cs.CV cs.AI

    FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing

    Authors: Guanwen Feng, Zhiyuan Ma, Yunan Li, Junwei Jing, Jiahao Yang, Qiguang Miao

    Abstract: Recent advances in audio-driven talking head generation have achieved impressive results in lip synchronization and emotional expression. However, they largely overlook the crucial task of facial attribute editing. This capability is crucial for achieving deep personalization and expanding the range of practical applications, including user-tailored digital avatars, engaging online education conte…

    Submitted 28 May, 2025; originally announced May 2025.

  11. arXiv:2505.22140  [pdf, other]

    hep-ex

    Search for a dark baryon in the $Ξ^-\rightarrowπ^-+{\rm invisible}$ decay

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (697 additional authors not shown)

    Abstract: A search for a dark baryon is performed for the first time in the two-body decay $Ξ^-\rightarrowπ^-+{\rm invisible}$ using $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected at a center-of-mass energy of $\sqrt{s}=3.097\,\mbox{GeV}$ with the BESIII detector at the BEPCII collider. No significant signal is observed, and the 90% (95%) confidence level upper limits on the branching fraction…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 figures, 1 table

  12. arXiv:2505.22101  [pdf, other]

    cs.CL

    MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

    Authors: Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, Junpeng Ren, Zehao Lin, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhiqiang Yin, Qingchen Yu, Bo Tang, Hongkang Yang, Zhi-Qin John Xu, Feiyu Xiong

    Abstract: Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory. They primarily rely on parametric memory (knowledge encoded in model weights) and ephemeral activation…

    Submitted 28 May, 2025; originally announced May 2025.

  13. arXiv:2505.22013  [pdf, other]

    cs.SD eess.AS

    Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge

    Authors: Shangkun Huang, Yuxuan Du, Jingwen Yang, Dejun Zhang, Xupeng Jia, Jing Deng, Jintao Kang, Rong Zheng

    Abstract: This paper presents the system developed to address the MISP 2025 Challenge. For the diarization system, we proposed a hybrid approach combining a WavLM end-to-end segmentation method with a traditional multi-module clustering technique to adaptively select the appropriate model for handling varying degrees of overlapping speech. For the automatic speech recognition (ASR) system, we proposed an AS…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  14. arXiv:2505.21960  [pdf, ps, other]

    cs.CV

    One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

    Authors: Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang

    Abstract: Text-to-Image (T2I) diffusion models have made remarkable advancements in generative modeling; however, they face a trade-off between inference speed and image quality, posing challenges for efficient deployment. Existing distilled T2I models can generate high-fidelity images with fewer sampling steps, but often struggle with diversity and quality, especially in one-step models. From our analysis,…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted at CVPR2025, Code: https://github.com/sen-mao/Loopfree

  15. arXiv:2505.21598  [pdf, ps, other]

    cs.CL

    Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives

    Authors: Yajiao Liu, Congliang Chen, Junchi Yang, Ruoyu Sun

    Abstract: Training large language models with data collected from various domains can improve their performance on downstream tasks. However, given a fixed training budget, the sampling proportions of these different domains significantly impact the model's performance. How can we determine the domain weights across different data domains to train the best-performing model within constrained computational r…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: The first version of this paper was submitted to ACL ARR 2025 February Submission

  16. Stochastic Geometry-Based Performance Evaluation for LEO Satellite-Assisted Space Caching

    Authors: Chunyi Ma, Jiajie Xu, Jianhua Yang, Mustafa A. Kishk

    Abstract: To achieve the Internet of Things (IoT) vision, Mobile Edge Computing (MEC) is a promising technology aimed at providing low-latency computing services to user equipment (UE). However, terrestrial MEC networks struggle to provide service to UEs in remote and maritime regions. Low Earth Orbit (LEO) satellite networks have the potential to overcome geographical restrictions and provide seamless global…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 15 pages, 12 figures, accepted by IEEE IoTJ

  17. arXiv:2505.20834  [pdf, ps, other]

    cs.CV cs.NE

    Fully Spiking Neural Networks for Unified Frame-Event Object Tracking

    Authors: Jingjun Yang, Liangwei Fan, Jinpu Zhang, Xiangkai Lian, Hui Shen, Dewen Hu

    Abstract: The integration of image and event streams offers a promising approach for achieving robust visual object tracking in complex environments. However, current fusion methods achieve high performance at the cost of significant computational overhead and struggle to efficiently extract the sparse, asynchronous information from event streams, failing to leverage the energy-efficient advantages of event…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 13 pages, 6 figures, 4 tables

  18. arXiv:2505.20737  [pdf, ps, other]

    cs.AI

    RRO: LLM Agent Optimization Through Rising Reward Trajectories

    Authors: Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar

    Abstract: Large language models (LLMs) have exhibited extraordinary performance in a variety of tasks, while it remains challenging for them to solve complex multi-step tasks as agents. In practice, agents are sensitive to the outcome of certain key steps, which makes them likely to fail the task because of a subtle mistake in the planning trajectory. Recent approaches resort to calibrating the reasoning process…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: preprint

  19. arXiv:2505.20641  [pdf, other]

    cs.CV

    See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction

    Authors: Yuan Wu, Zhiqiang Yan, Yigong Zhang, Xiang Li, Jian Yang

    Abstract: Occupancy prediction aims to estimate the 3D spatial distribution of occupied regions along with their corresponding semantic labels. Existing vision-based methods perform well on daytime benchmarks but struggle in nighttime scenarios due to limited visibility and challenging lighting conditions. To address these challenges, we propose LIAR, a novel framework that learns illumination-affi…

    Submitted 28 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  20. arXiv:2505.20139  [pdf, ps, other]

    cs.SE cs.AI cs.CL

    StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

    Authors: Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen

    Abstract: As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs' capabilities in producing both non-renderable (JSON, YAML, CSV) and renderable (HTML, React, SVG) structured formats. Unlike prior benchmarks, StructEval systematicall…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 16 pages, 9 figures, 13 tables

  21. arXiv:2505.19907  [pdf, ps, other]

    hep-ex nucl-ex

    First measurement of $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ cross-sections via $Σ^+$-nucleus scattering at an electron-positron collider

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (680 additional authors not shown)

    Abstract: Using $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected with the BESIII detector at the BEPCII storage ring, the reactions $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ are studied, where the $Σ^{+}$ baryon is produced in the process $J/ψ\rightarrowΣ^{+}\barΣ^-$ and the neutron is a component of the $^9\rm{Be}$, $^{12}\rm{C}$ and $^{197}\rm{Au}$ nuclei in the beam pipe. Clear signals o…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 9 pages, 2 figures

  22. arXiv:2505.19690  [pdf, ps, other]

    cs.AI

    Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models

    Authors: Baihui Zheng, Boren Zheng, Kerui Cao, Yingshui Tan, Zhendong Liu, Weixun Wang, Jiaheng Liu, Jian Yang, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang

    Abstract: Despite the remarkable proficiency of Large Reasoning Models (LRMs) in handling complex reasoning tasks, their reliability in safety-critical scenarios remains uncertain. Existing evaluations primarily assess response-level safety, neglecting a critical issue we identify as Superficial Safety Alignment (SSA) -- a phenomenon where models produce superficially safe outputs…

    Submitted 26 May, 2025; originally announced May 2025.

  23. arXiv:2505.19637  [pdf, ps, other]

    cs.MA

    Adaptive Episode Length Adjustment for Multi-agent Reinforcement Learning

    Authors: Byunghyun Yoo, Younghwan Shin, Hyunwoo Kim, Euisok Chung, Jeongmin Yang

    Abstract: In standard reinforcement learning, an episode is defined as a sequence of interactions between agents and the environment, which terminates upon reaching a terminal state or a pre-defined episode length. Setting a shorter episode length enables the generation of multiple episodes with the same number of data samples, thereby facilitating an exploration of diverse states. While shorter episodes ma…

    Submitted 26 May, 2025; originally announced May 2025.

    ACM Class: I.2.11

  24. arXiv:2505.19624  [pdf]

    cs.CV cs.AI

    Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat

    Authors: Pusheng Xu, Xia Gong, Xiaolan Chen, Weiyi Zhang, Jiancheng Yang, Bingjie Yan, Meng Yuan, Yalin Zheng, Mingguang He, Danli Shi

    Abstract: Purpose: To develop a bilingual multimodal visual question answering (VQA) benchmark for evaluating VLMs in ophthalmology. Methods: Ophthalmic image posts and associated captions published between January 1, 2016, and December 31, 2024, were collected from WeChat Official Accounts. Based on these captions, bilingual question-answer (QA) pairs in Chinese and English were generated using GPT-4o-mini…

    Submitted 26 May, 2025; originally announced May 2025.

  25. arXiv:2505.19255  [pdf, ps, other]

    cs.LG cs.AI

    VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

    Authors: Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt

    Abstract: Reinforcement Learning Finetuning (RFT) has significantly advanced the reasoning capabilities of large language models (LLMs) by enabling long chains of thought, self-correction, and effective tool use. While recent works attempt to extend RFT to vision-language models (VLMs), these efforts largely produce text-only reasoning conditioned on static image inputs, falling short of true multimodal rea…

    Submitted 28 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  26. arXiv:2505.19148  [pdf, other]

    cs.CV

    DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing

    Authors: Shengdong Han, Shangdong Yang, Xin Zhang, Yuxuan Li, Xiang Li, Jian Yang, Ming-Ming Cheng, Yimian Dai

    Abstract: Resolving closely-spaced small targets in dense clusters presents a significant challenge in infrared imaging, as the overlapping signals hinder precise determination of their quantity, sub-pixel positions, and radiation intensities. While deep learning has advanced the field of infrared small target detection, its application to closely-spaced infrared small targets has not yet been explored. Thi…

    Submitted 25 May, 2025; originally announced May 2025.

  27. arXiv:2505.18954  [pdf, other]

    cs.AR

    Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity

    Authors: Cenlin Duan, Jianlei Yang, Yikun Wang, Yiou Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao

    Abstract: Processing-in-memory (PIM) is a transformative architectural paradigm designed to overcome the Von Neumann bottleneck. Among PIM architectures, digital SRAM-PIM emerges as a promising solution, offering significant advantages by directly integrating digital logic within the SRAM array. However, rigid crossbar architecture and full array activation pose challenges in efficiently utilizing tradition…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: This paper is accepted by the Journal of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

  28. arXiv:2505.18610  [pdf, ps, other]

    cs.CL

    PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs

    Authors: Tengxuan Liu, Shiyao Li, Jiayi Yang, Tianchen Zhao, Feng Zhou, Xiaohui Song, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang

    Abstract: Recently, significant progress has been made in developing reasoning-capable Large Language Models (LLMs) through long Chain-of-Thought (CoT) techniques. However, this long-CoT reasoning process imposes substantial memory overhead due to the large Key-Value (KV) Cache memory overhead. Post-training KV Cache quantization has emerged as a promising compression technique and has been extensively stud…

    Submitted 24 May, 2025; originally announced May 2025.

  29. arXiv:2505.18004  [pdf, ps, other]

    hep-ex

    Measurement of branching fractions of $Λ_{c}^{+}$ decays to $Σ^{+} η$ and $Σ^{+} η'$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

    Abstract: By analyzing $e^+e^-$ collision data taken at center-of-mass energies $\sqrt{s} = 4.600 \sim 4.699$ $\mbox{GeV}$ with the BESIII detector at the BEPCII collider, corresponding to an integrated luminosity of $\rm 4.5~fb^{-1}$, we study the hadronic decays $Λ_{c}^{+} \rightarrow Σ^{+} η$ and $Λ_{c}^{+} \rightarrow Σ^{+} η^{\prime}$ using the single-tag method. The branching fraction ratio of…

    Submitted 23 May, 2025; originally announced May 2025.

  30. arXiv:2505.17645  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning

    Authors: Chuhao Zhou, Jianfei Yang

    Abstract: Embodied agents operating in smart homes must understand human behavior through diverse sensory inputs and communicate via natural language. While Vision-Language Models (VLMs) have enabled impressive language-grounded perception, their reliance on visual data limits robustness in real-world scenarios with occlusions, poor lighting, or privacy constraints. In this paper, we introduce HoloLLM, a Mu…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 18 pages, 13 figures, 6 tables

  31. arXiv:2505.17389  [pdf, other]

    cs.RO cs.AI

    Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space

    Authors: Jinrong Yang, Kexun Chen, Zhuoling Li, Shengkai Wu, Yong Zhao, Liangliang Ren, Wenqiu Luo, Chaohui Shang, Meiyu Zhi, Linfeng Gao, Mingshan Sun, Hui Cheng

    Abstract: Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. While minimal demonstrations enable robotic action execution, achieving high success rates and generalization requires high cost, e.g., continuously adding data or incrementally conducting human-in-loop processes with complex hardware/software systems. In this paper, we rethink the state/action…

    Submitted 22 May, 2025; originally announced May 2025.

  32. arXiv:2505.17333  [pdf, ps, other]

    cs.CV

    Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis

    Authors: Xin You, Minghui Zhang, Hanxiao Zhang, Jie Yang, Nassir Navab

    Abstract: Temporal modeling on regular respiration-induced motions is crucial to image-guided clinical applications. Existing methods cannot simulate temporal motions unless high-dose imaging scans including starting and ending frames exist simultaneously. However, in the preoperative data acquisition stage, the slight movement of patients may result in dynamic backgrounds between the first and last frames…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: early accepted by MICCAI

  33. arXiv:2505.17104  [pdf, ps, other]

    cs.CL cs.MM

    P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

    Authors: Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Tongliang Li, Wenhao Huang, Ge Zhang, Jian Yang, Zhoujun Li

    Abstract: Academic posters are vital for scholarly communication, yet their manual creation is time-consuming. However, automated academic poster generation faces significant challenges in preserving intricate scientific details and achieving effective visual-textual integration. Existing approaches often struggle with semantic richness and structural nuances, and lack standardized benchmarks for evaluating…

    Submitted 21 May, 2025; originally announced May 2025.

  34. arXiv:2505.17098  [pdf, ps, other]

    cs.CL cs.CV

    TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration

    Authors: Yanshu Li, Tian Yun, Jianjiang Yang, Pinyuan Feng, Jinfa Huang, Ruixiang Tang

    Abstract: Multimodal in-context learning (ICL) has emerged as a key mechanism for harnessing the capabilities of large vision-language models (LVLMs). However, its effectiveness remains highly sensitive to the quality of input in-context sequences, particularly for tasks involving complex reasoning or open-ended generation. A major limitation is our limited understanding of how LVLMs actually exploit these…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 29 pages, 11 figures, 19 tables. arXiv admin note: substantial text overlap with arXiv:2503.04839

  35. arXiv:2505.17097  [pdf, ps, other]

    cs.CV cs.CL

    CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention

    Authors: Yanshu Li, JianJiang Yang, Bozheng Li, Ruixiang Tang

    Abstract: Multimodal in-context learning (ICL) enables large vision-language models (LVLMs) to efficiently adapt to novel tasks, supporting a wide array of real-world applications. However, multimodal ICL remains unstable, and current research largely focuses on optimizing sequence configuration while overlooking the internal mechanisms of LVLMs. In this work, we first provide a theoretical analysis of atte…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 10 pages, 2 figures, 6 tables

  36. arXiv:2505.17006  [pdf, other]

    cs.CV cs.RO

    CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

    Authors: Jiange Yang, Yansong Shi, Haoyi Zhu, Mingyu Liu, Kaijing Ma, Yating Wang, Gangshan Wu, Tong He, Limin Wang

    Abstract: Learning latent motion from Internet videos is crucial for building generalist robots. However, existing discrete latent action methods suffer from information loss and struggle with complex and fine-grained dynamics. We propose CoMo, which aims to learn more informative continuous motion representations from diverse, internet-scale videos. CoMo employs an early temporal feature difference mechanis…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 18 pages, 7 figures

  37. arXiv:2505.16399  [pdf, other]

    cs.CV

    Sketchy Bounding-box Supervision for 3D Instance Segmentation

    Authors: Qian Deng, Le Hui, Jin Xie, Jian Yang

    Abstract: Bounding box supervision has gained considerable attention in weakly supervised 3D instance segmentation. While this approach alleviates the need for extensive point-level annotations, obtaining accurate bounding boxes in practical applications remains challenging. To this end, we explore the inaccurate bounding box, named sketchy bounding box, which is imitated through perturbing ground truth bou…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  38. arXiv:2505.16384  [pdf, other]

    cs.CV cs.HC

    MAGE: A Multi-task Architecture for Gaze Estimation with an Efficient Calibration Module

    Authors: Haoming Huang, Musen Zhang, Jianxin Yang, Zhen Li, Jinkai Li, Yao Guo

    Abstract: Eye gaze can provide rich information on human psychological activities, and has garnered significant attention in the field of Human-Robot Interaction (HRI). However, existing gaze estimation methods merely predict either the gaze direction or the Point-of-Gaze (PoG) on the screen, failing to provide sufficient information for a comprehensive six Degree-of-Freedom (DoF) gaze analysis in 3D space.…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Under review

  39. arXiv:2505.16314  [pdf, ps, other]

    cs.CV cs.AI

    NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

    Authors: Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, et al. (90 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The aim of this challenge is to address the fine-grained quality assessment of text-to-image generation models. This challenge evaluates text-to-image models from two aspe…

    Submitted 22 May, 2025; originally announced May 2025.

  40. arXiv:2505.15923  [pdf, other]

    astro-ph.GA

    Discovery and characterization of 25 new quasars at 4.6 < z < 6.9 from wide-field multi-band surveys

    Authors: Silvia Belladitta, Eduardo Bañados, Zhang-Liang Xie, Roberto Decarli, Silvia Onorato, Jinyi Yang, Manuela Bischetti, Masafusa Onoue, Federica Loiacono, Laura N. Martínez-Ramírez, Chiara Mazzucchelli, Frederick B. Davies, Julien Wolf, Jan-Torge Schindler, Xiaohui Fan, Feige Wang, Fabian Walter, Tatevik Mkrtchyan, Daniel Stern, Emanuele P. Farina, Bram P. Venemans

    Abstract: Luminous quasars at $z>4$ provide key insights into the early Universe. Their rarity necessitates wide-field multi-band surveys to efficiently separate them from the main astrophysical contaminants (i.e., ultracool dwarfs). To expand the sample of high-$z$ quasars, we conducted targeted selections using optical, infrared, and radio surveys, complemented by literature-based quasar candidate catalog…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 25 pages, 8 figures, 12 tables, Accepted for publication in A&A

  41. arXiv:2505.15775  [pdf, ps, other]

    math.OC

    New Understandings and Computation on Augmented Lagrangian Methods for Low-Rank Semidefinite Programming

    Authors: Lijun Ding, Haihao Lu, Jinwen Yang

    Abstract: Augmented Lagrangian Method (ALM) combined with Burer-Monteiro (BM) factorization, dubbed ALM-BM, offers a powerful approach for solving large-scale low-rank semidefinite programs (SDPs). Despite its empirical success, the theoretical understandings of the resulting non-convex ALM-BM subproblems, particularly concerning their structural properties and efficient subproblem solvability by first-orde…

    Submitted 21 May, 2025; originally announced May 2025.

  42. arXiv:2505.15765  [pdf, ps, other]

    cs.CV cs.AI

    Constructing a 3D Town from a Single Image

    Authors: Kaizhi Zheng, Ruijian Zhang, Jing Gu, Jie Yang, Xin Eric Wang

    Abstract: Acquiring detailed 3D scenes typically demands costly equipment, multi-view data, or labor-intensive modeling. Therefore, a lightweight alternative, generating complex 3D scenes from a single top-down image, plays an essential role in real-world applications. While recent 3D generative models have achieved remarkable results at the object level, their extension to full-scene generation often leads…

    Submitted 21 May, 2025; originally announced May 2025.

  43. arXiv:2505.15656  [pdf, ps, other]

    cs.CL

    Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

    Authors: Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang

    Abstract: Fine-tuning on open-source Large Language Models (LLMs) with proprietary data is now a standard practice for downstream developers to obtain task-specific LLMs. Surprisingly, we reveal a new and concerning risk along with the practice: the creator of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, only requiring black-box access to t…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 19 pages

  44. arXiv:2505.15620  [pdf, ps, other]

    hep-ex

    Observation of $χ_{cJ}\to 3K_S^0K^\pmπ^\mp$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (678 additional authors not shown)

    Abstract: By analyzing $(2712.4\pm14.3)\times10^6$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays $χ_{c0,1,2} \to 3K_S^0K^\pmπ^\mp$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\to 3K_S^0K^\pmπ^\mp )=(7.95\pm0.50\pm0.65)\times10^{-5},$…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 11 pages, 6 figures

  45. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  46. arXiv:2505.15404  [pdf, other]

    cs.CL

    How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

    Authors: Zhexin Zhang, Xian Qi Loye, Victor Shea-Jay Huang, Junxiao Yang, Qi Zhu, Shiyao Cui, Fei Mi, Lifeng Shang, Yingkang Wang, Hongning Wang, Minlie Huang

    Abstract: Large Reasoning Models (LRMs) have achieved remarkable success on reasoning-intensive tasks such as mathematics and programming. However, their enhanced reasoning capabilities do not necessarily translate to improved safety performance, and in some cases may even degrade it. This raises an important research question: how can we enhance the safety of LRMs? In this paper, we present a comprehensive…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 19 pages

  47. arXiv:2505.15304  [pdf, ps, other]

    cs.RO

    Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control

    Authors: Seongmin Park, Hyungmin Kim, Sangwoo kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, Jungwook Choi

    Abstract: Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation…

    Submitted 21 May, 2025; originally announced May 2025.

  48. arXiv:2505.15291  [pdf, ps, other]

    cs.CL

    Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization

    Authors: Joonho Yang, Seunghyun Yoon, Hwan Chang, Byeongjeong Kim, Hwanhee Lee

    Abstract: Large Language Models (LLMs) have significantly advanced text generation capabilities, including tasks like summarization, often producing coherent and fluent outputs. However, faithfulness to source material remains a significant challenge due to the generation of hallucinations. While extensive research focuses on detecting and reducing these inaccuracies, less attention has been paid to the pos…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 11 tables, 8 figures

  49. arXiv:2505.15284  [pdf, ps, other]

    cs.LG cs.CV

    Kernel PCA for Out-of-Distribution Detection: Non-Linear Kernel Selections and Approximations

    Authors: Kun Fang, Qinghua Tao, Mingzhen He, Kexin Lv, Runze Yang, Haibo Hu, Xiaolin Huang, Jie Yang, Longbin Cao

    Abstract: Out-of-Distribution (OoD) detection is vital for the reliability of deep neural networks, the key of which lies in effectively characterizing the disparities between OoD and In-Distribution (InD) data. In this work, such disparities are exploited through a fresh perspective of non-linear feature subspace. That is, a discriminative non-linear subspace is learned from InD features to capture represe…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: This study is an extension of its conference version published in NeurIPS'24, see https://proceedings.neurips.cc/paper_files/paper/2024/hash/f2543511e5f4d4764857f9ad833a977d-Abstract-Conference.html

  50. arXiv:2505.15184  [pdf, other]

    cs.CV

    AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection

    Authors: Yangting Shi, Renjie He, Le Hui, Xiang Li, Jian Yang, Ming-Ming Cheng, Yimian Dai

    Abstract: Omni-domain infrared small target detection (IRSTD) poses formidable challenges, as a single model must seamlessly adapt to diverse imaging systems, varying resolutions, and multiple spectral bands simultaneously. Current approaches predominantly rely on visual-only modeling paradigms that not only struggle with complex background interference and inherently scarce target features, but also exhibi…

    Submitted 21 May, 2025; originally announced May 2025.