Showing 1–50 of 4,263 results for author: Wang, C

Searching in archive cs.
  1. arXiv:2507.01923  [pdf, ps, other]

    cs.CL

    Decision-oriented Text Evaluation

    Authors: Yu-Shiang Huang, Chuan-Ju Wang, Chung-Chi Chen

    Abstract: Natural language generation (NLG) is increasingly deployed in high-stakes domains, yet common intrinsic evaluation methods, such as n-gram overlap or sentence plausibility, weakly correlate with actual decision-making efficacy. We propose a decision-oriented framework for evaluating generated text by directly measuring its influence on human and large language model (LLM) decision outcomes. Using…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2507.01908  [pdf, ps, other]

    cs.CV

    Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning

    Authors: Qingdong He, Xueqin Chen, Chaoyi Wang, Yanjie Pan, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Xiangtai Li, Jiangning Zhang

    Abstract: Instruction-based image editing (IIE) has advanced rapidly with the success of diffusion models. However, existing efforts primarily focus on simple and explicit instructions to execute editing operations such as adding, deleting, moving, or swapping objects. They struggle to handle more complex implicit hypothetical instructions that require deeper reasoning to infer plausible visual changes and…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.01801  [pdf, ps, other]

    cs.CV

    AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction

    Authors: Bin Rao, Haicheng Liao, Yanchen Guan, Chengyue Wang, Bonan Wang, Jiaxun Zhang, Zhenning Li

    Abstract: Accurately predicting the future trajectories of traffic agents is essential in autonomous driving. However, due to the inherent imbalance in trajectory distributions, tail data in natural datasets often represents more complex and hazardous scenarios. Existing studies typically rely solely on a base model's prediction error, without considering the diversity and uncertainty of long-tail trajector…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.01616  [pdf, ps, other]

    cs.IR cs.AI cs.DB

    Enhanced Influence-aware Group Recommendation for Online Media Propagation

    Authors: Chengkun He, Xiangmin Zhou, Chen Wang, Longbing Cao, Jie Shao, Xiaodong Li, Guang Xu, Carrie Jinqiu Hu, Zahir Tari

    Abstract: Group recommendation over social media streams has attracted significant attention due to its wide applications in domains such as e-commerce, entertainment, and online news broadcasting. By leveraging social connections and group behaviours, group recommendation (GR) aims to provide more accurate and engaging content to a set of users rather than individuals. Recently, influence-aware GR has emer…

    Submitted 2 July, 2025; originally announced July 2025.

  5. arXiv:2507.01436  [pdf, ps, other]

    cs.HC

    Challenges & Opportunities with LLM-Assisted Visualization Retargeting

    Authors: Luke S. Snyder, Chenglong Wang, Steven Drucker

    Abstract: Despite the ubiquity of visualization examples published on the web, retargeting existing custom chart implementations to new datasets remains difficult, time-intensive, and tedious. The adaptation process assumes author familiarity with both the implementation of the example as well as how the new dataset might need to be transformed to fit into the example code. With recent advances in Large Lan…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 5 pages, 3 figures, 1 table

  6. arXiv:2507.01352  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

    Authors: Chris Yuhao Liu, Liang Zeng, Yuzhen Xiao, Jujie He, Jiacai Liu, Chaojie Wang, Rui Yan, Wei Shen, Fuxiang Zhang, Jiacheng Xu, Yang Liu, Yahui Zhou

    Abstract: Despite the critical role of reward models (RMs) in reinforcement learning from human feedback (RLHF), current state-of-the-art open RMs perform poorly on most existing evaluation benchmarks, failing to capture the spectrum of nuanced and sophisticated human preferences. Even approaches that incorporate advanced training techniques have not yielded meaningful performance improvements. We hypothesi…

    Submitted 2 July, 2025; originally announced July 2025.

  7. arXiv:2507.00816  [pdf, ps, other]

    cs.RO cs.AI

    PI-WAN: A Physics-Informed Wind-Adaptive Network for Quadrotor Dynamics Prediction in Unknown Environments

    Authors: Mengyun Wang, Bo Wang, Yifeng Niu, Chang Wang

    Abstract: Accurate dynamics modeling is essential for quadrotors to achieve precise trajectory tracking in various applications. Traditional physical knowledge-driven modeling methods face substantial limitations in unknown environments characterized by variable payloads, wind disturbances, and external perturbations. On the other hand, data-driven modeling methods suffer from poor generalization when handl…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2507.00699  [pdf, ps, other]

    cs.SE

    A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback

    Authors: Guoliang Duan, Mingwei Liu, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

    Abstract: Large language models (LLMs) have advanced significantly in code generation, yet their ability to follow complex programming instructions with layered and diverse constraints remains underexplored. Existing benchmarks often prioritize functional correctness, overlooking the nuanced requirements found in real-world development. We introduce MultiCodeIF, a comprehensive benchmark designed to evaluat…

    Submitted 1 July, 2025; originally announced July 2025.

  9. arXiv:2507.00671  [pdf, ps, other]

    stat.CO cs.LG stat.ML

    Harnessing the Power of Reinforcement Learning for Adaptive MCMC

    Authors: Congye Wang, Matthew A. Fisher, Heishiro Kanagawa, Wilson Chen, Chris. J. Oates

    Abstract: Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et a…

    Submitted 1 July, 2025; originally announced July 2025.

  10. arXiv:2507.00045  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    CaughtCheating: Is Your MLLM a Good Cheating Detective? Exploring the Boundary of Visual Perception and Reasoning

    Authors: Ming Li, Chenguang Wang, Yijun Liang, Xiyao Wang, Yuhang Zhou, Xiyang Wu, Yuqing Zhang, Ruiyi Zhang, Tianyi Zhou

    Abstract: Recent agentic Multi-Modal Large Language Models (MLLMs) such as GPT-o3 have achieved near-ceiling scores on various existing benchmarks, motivating a demand for more challenging test tasks. These MLLMs have been reported to excel in a few expert-level tasks for humans, e.g., GeoGuesser, reflecting their potential as a detective who can notice minuscule cues in an image and weave them into coheren…

    Submitted 23 June, 2025; originally announced July 2025.

  11. arXiv:2507.00026  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CY

    ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models

    Authors: Jiale Ding, Xiang Zheng, Cong Wang, Wei-Bin Lee, Xingjun Ma, Yu-Gang Jiang

    Abstract: As Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications, evaluating their safety, especially under adversarial prompting, has become critical. Arguably, effective safety evaluations should be adaptive, evolving with LLM capabilities, and also cover a broad spectrum of harmful topics and real-world scenarios to fully expose potential vulnerabilitie…

    Submitted 17 June, 2025; originally announced July 2025.

  12. arXiv:2506.23801  [pdf, ps, other]

    cs.CV

    Controllable Reference-Based Real-World Remote Sensing Image Super-Resolution with Generative Diffusion Priors

    Authors: Ce Wang, Wanjie Sun

    Abstract: Super-resolution (SR) techniques can enhance the spatial resolution of remote sensing images by utilizing low-resolution (LR) images to reconstruct high-resolution (HR) images, enabling more efficient large-scale earth observation applications. While single-image super-resolution (SISR) methods have shown progress, reference-based super-resolution (RefSR) offers superior performance by incorporati…

    Submitted 30 June, 2025; originally announced June 2025.

  13. arXiv:2506.23692  [pdf, ps, other]

    cs.AI

    Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models

    Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye

    Abstract: While AI for Science (AI4S) serves as an analytical tool in the current research paradigm, it doesn't solve its core inefficiency. We propose "Agent for Science" (Agent4S), the use of LLM-driven agents to automate the entire research workflow, as the true Fifth Scientific Paradigm. This paper introduces a five-level classification for Agent4S, outlining a clear roadmap from simple task automation to…

    Submitted 30 June, 2025; originally announced June 2025.

  14. arXiv:2506.23121  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

    Authors: Xinlei Yu, Chanmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

    Abstract: Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts and loss of spatial information. Addressing these challenges, we introduc…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 19 pages, 9 figures, 10 tables

  15. arXiv:2506.23009  [pdf, ps, other]

    cs.CV

    MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models

    Authors: Jian Chen, Wenye Ma, Penghang Liu, Wei Wang, Tengwei Song, Ming Li, Chenguang Wang, Ruiyi Zhang, Changyou Chen

    Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable visual reasoning abilities in natural images, text-rich documents, and graphic designs. However, their ability to interpret music sheets remains underexplored. To bridge this gap, we introduce MusiXQA, the first comprehensive dataset for evaluating and advancing MLLMs in music sheet understanding. MusiXQA features high-quality synth…

    Submitted 28 June, 2025; originally announced June 2025.

  16. arXiv:2506.22716  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.DB

    BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute

    Authors: Dujian Ding, Ankur Mallick, Shaokun Zhang, Chi Wang, Daniel Madrigal, Mirian Del Carmen Hipolito Garcia, Menglin Xia, Laks V. S. Lakshmanan, Qingyun Wu, Victor Rühle

    Abstract: Large language models (LLMs) are powerful tools but are often expensive to deploy at scale. LLM query routing mitigates this by dynamically assigning queries to models of varying cost and quality to obtain a desired trade-off. Prior query routing approaches generate only one response from the selected model and a single response from a small (inexpensive) model was often not good enough to beat a…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ICML 2025 (main conference)

  17. arXiv:2506.22554  [pdf, ps, other]

    cs.CV cs.AI

    Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

    Authors: Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero, Morteza Behrooz, Julia Buffalini, Fabio Maria Carlucci, Joy Chen, Junming Chen, Zhang Chen, Shiyang Cheng, Praveen Chowdary, Joe Chuang, Antony D'Avirro, Jon Daly, Ning Dong, Mark Duppenthaler, Cynthia Gao, Jeff Girard, Martin Gleize, Sahir Gomez, Hongyu Gong, Srivathsan Govindarajan, Brandon Han, Sen He, Denise Hernandez, et al. (59 additional authors not shown)

    Abstract: Human communication involves a complex interplay of verbal and nonverbal signals, essential for conveying meaning and achieving interpersonal goals. To develop socially intelligent AI technologies, it is crucial to develop models that can both comprehend and generate dyadic behavioral dynamics. To this end, we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  18. arXiv:2506.22295  [pdf, ps, other]

    cs.LG

    Score-Based Model for Low-Rank Tensor Recovery

    Authors: Zhengyun Cheng, Changhao Wang, Guanwen Zhang, Yi Xu, Wei Zhou, Xiangyang Ji

    Abstract: Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis. Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions. From a probabilistic perspective, these can be viewed as using Dirac delta distributions to model the relationships between shared factors and the low-rank tensor. However, such prior knowledge is rare…

    Submitted 27 June, 2025; originally announced June 2025.

  19. arXiv:2506.22200  [pdf, ps, other]

    cs.LG cs.AI

    EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework

    Authors: Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Yue Wang, Yuzhi Zhang

    Abstract: Recent advances in reinforcement learning (RL) have significantly enhanced the reasoning capabilities of large language models (LLMs). Group Relative Policy Optimization (GRPO), an efficient variant of PPO that lowers RL's computational cost, still faces limited exploration, low sample efficiency and instability, constraining its performance on complex reasoning tasks. To address these limitations…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  20. arXiv:2506.22157  [pdf, ps, other]

    cs.CL

    Training Language Model to Critique for Better Refinement

    Authors: Tianshu Yu, Chao Xiang, Mingchuan Yang, Pei Ke, Bosi Wen, Cunxiang Wang, Jiale Cheng, Li Zhang, Xinyu Mu, Chuxiong Sun, Minlie Huang

    Abstract: Large language models (LLMs) have demonstrated remarkable evaluation and critique capabilities, providing insightful feedback and identifying flaws in various tasks. However, limited research has explored which types of critiques are most effective for improving model responses or how to generate such critiques. To address this gap, we introduce Refinement-oriented Critique …

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 Findings

  21. arXiv:2506.22134  [pdf, ps, other]

    cs.CV

    Low-Rank Implicit Neural Representation via Schatten-p Quasi-Norm and Jacobian Regularization

    Authors: Zhengyun Cheng, Changhao Wang, Guanwen Zhang, Yi Xu, Wei Zhou, Xiangyang Ji

    Abstract: Higher-order tensors are well-suited for representing multi-dimensional data, such as color images and videos. Low-rank tensor representation has become essential in machine learning and computer vision, but existing methods like Tucker decomposition offer flexibility at the expense of interpretability. In contrast, while the CANDECOMP/PARAFAC (CP) decomposition provides a more natural and interpr…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Submitted to IEEE Transactions on Circuits and Systems for Video Technology

  22. arXiv:2506.21957  [pdf, ps, other]

    cs.CV

    Exploring Semantic Masked Autoencoder for Self-supervised Point Cloud Understanding

    Authors: Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang

    Abstract: Point cloud understanding aims to acquire robust and general feature representations from unlabeled data. Masked point modeling-based methods have recently shown significant performance across various downstream tasks. These pre-training methods rely on random masking strategies to establish the perception of point clouds by restoring corrupted point cloud inputs, which leads to the failure of cap…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCAI 2025

  23. arXiv:2506.21843  [pdf, ps, other]

    cs.CV

    3D-Telepathy: Reconstructing 3D Objects from EEG Signals

    Authors: Yuxiang Ge, Jionghao Cheng, Ruiquan Ge, Zhaojie Fang, Gangyong Jia, Xiang Wan, Nannan Li, Ahmed Elazab, Changmiao Wang

    Abstract: Reconstructing 3D visual stimuli from Electroencephalography (EEG) data holds significant potential for applications in Brain-Computer Interfaces (BCIs) and aiding individuals with communication disorders. Traditionally, efforts have focused on converting brain activity into 2D images, neglecting the translation of EEG data into 3D objects. This limitation is noteworthy, as the human brain inheren…

    Submitted 26 June, 2025; originally announced June 2025.

  24. arXiv:2506.21572  [pdf, ps, other]

    cs.CL

    Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling

    Authors: Tianyu Zou, Shengwu Xiong, Ruilin Yao, Jirui Huang, Yi Rong, Yaxiong Chen, Shili Xiong, Cong Wang

    Abstract: Evaluating multimodal large language models (MLLMs) remains a fundamental challenge due to a lack of structured, interpretable, and theoretically grounded benchmark designs. Existing benchmarks often adopt heuristic-based task groupings with unclear cognitive targets, thus resulting in overlapping abilities, redundant indicators, and limited diagnostic power. In this work, we propose a novel frame…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures

  25. arXiv:2506.21541  [pdf, ps, other]

    cs.CV

    StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning

    Authors: Chuxin Wang, Yixin Zha, Wenfei Yang, Tianzhu Zhang

    Abstract: Recently, Mamba-based methods have demonstrated impressive performance in point cloud representation learning by leveraging State Space Model (SSM) with the efficient context modeling ability and linear complexity. However, these methods still face two key issues that limit the potential of SSM: Destroying the adjacency of 3D points during SSM processing and failing to retain long-sequence memory…

    Submitted 1 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  26. arXiv:2506.21449  [pdf]

    cs.DC

    exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design

    Authors: Maxim Moraru, Weiyi Xia, Zhuo Ye, Feng Zhang, Yongxin Yao, Ying Wai Li, Cai-Zhuang Wang

    Abstract: exa-AMD is a Python-based application designed to accelerate the discovery and design of functional materials by integrating AI/ML tools, materials databases, and quantum mechanical calculations into scalable, high-performance workflows. The execution model of exa-AMD relies on Parsl, a task-parallel programming library that enables a flexible execution of tasks on any computing resource from lapt…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: We intend to publish the paper to the Journal of Open Source Software

  27. arXiv:2506.21101  [pdf, ps, other]

    cs.CV

    OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

    Authors: Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, SevenShu, Yunsheng Wu, Yongge Liu, Rongrong Ji

    Abstract: As one of the earliest ancient languages, Oracle Bone Script (OBS) encapsulates the cultural records and intellectual expressions of ancient civilizations. Despite the discovery of approximately 4,500 OBS characters, only about 1,600 have been deciphered. The remaining undeciphered ones, with their complex structure and abstract imagery, pose significant challenges for interpretation. To address t…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  28. arXiv:2506.20756  [pdf, ps, other]

    cs.CV

    StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation

    Authors: Haodong Li, Chen Wang, Jiahui Lei, Kostas Daniilidis, Lingjie Liu

    Abstract: Recent video depth estimation methods achieve great performance by following the paradigm of image depth estimation, i.e., typically fine-tuning pre-trained video diffusion models with massive data. However, we argue that video depth estimation is not a naive extension of image depth estimation. The temporal consistency requirements for dynamic and static regions in videos are fundamentally differ…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Work done in Nov 2024, during an internship at the University of Pennsylvania. Project page: https://stereodiff.github.io/

  29. arXiv:2506.20607  [pdf, ps, other]

    cs.LG

    H-FEX: A Symbolic Learning Method for Hamiltonian Systems

    Authors: Jasen Lai, Senwei Liang, Chunmei Wang

    Abstract: Hamiltonian systems describe a broad class of dynamical systems governed by Hamiltonian functions, which encode the total energy and dictate the evolution of the system. Data-driven approaches, such as symbolic regression and neural network-based methods, provide a means to learn the governing equations of dynamical systems directly from observational data of Hamiltonian systems. However, these me…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 16 pages, 7 figures

  30. arXiv:2506.20373  [pdf, ps, other]

    cs.RO cs.AI cs.HC

    CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition

    Authors: Joerg Deigmoeller, Stephan Hasler, Nakul Agarwal, Daniel Tanneberg, Anna Belardinelli, Reza Ghoddoosian, Chao Wang, Felix Ocker, Fan Zhang, Behzad Dariush, Michael Gienger

    Abstract: We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring tha…

    Submitted 25 June, 2025; originally announced June 2025.

  31. arXiv:2506.18879  [pdf, ps, other]

    cs.CL cs.AI

    CommVQ: Commutative Vector Quantization for KV Cache Compression

    Authors: Junyan Li, Yang Zhang, Muhammad Yusuf Hassan, Talha Chafekar, Tianle Cai, Zhile Ren, Pengsheng Guo, Foroozan Karimzadeh, Colorado Reed, Chong Wang, Chuang Gan

    Abstract: Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context grows. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. We first introduce additive quantization with a lightweight encoder and co…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: ICML 2025 poster

  32. arXiv:2506.18839  [pdf, ps, other]

    cs.CV

    4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

    Authors: Chaoyang Wang, Ashkan Mirzaei, Vidit Goel, Willi Menapace, Aliaksandr Siarohin, Avalon Vinella, Michael Vasilkovsky, Ivan Skorokhodov, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Peter Wonka

    Abstract: We propose the first framework capable of computing a 4D spatio-temporal grid of video frames and 3D Gaussian particles for each time step using a feed-forward architecture. Our architecture has two main components, a 4D video model and a 4D reconstruction model. In the first part, we analyze current 4D video diffusion architectures that perform spatial and temporal attention either sequentially o…

    Submitted 18 June, 2025; originally announced June 2025.

  33. arXiv:2506.18655  [pdf, ps, other]

    cs.CV

    RDPO: Real Data Preference Optimization for Physics Consistency Video Generation

    Authors: Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, Anxiang Zeng

    Abstract: Video generation techniques have achieved remarkable advancements in visual quality, yet faithfully reproducing real-world physics remains elusive. Preference-based model post-training may improve physical consistency, but requires costly human-annotated datasets or reward models that are not yet feasible. To address these challenges, we present Real Data Preference Optimisation (RDPO), an annotat…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 16 pages, 10 figures

    ACM Class: I.2.6; I.2.10

  34. arXiv:2506.18466  [pdf, ps, other]

    cs.RO cs.HC

    Mirror Eyes: Explainable Human-Robot Interaction at a Glance

    Authors: Matti Krüger, Daniel Tanneberg, Chao Wang, Stephan Hasler, Michael Gienger

    Abstract: The gaze of a person tends to reflect their interest. This work explores what happens when this statement is taken literally and applied to robots. Here we present a robot system that employs a moving robot head with a screen-based eye model that can direct the robot's gaze to points in physical space and present a reflection-like mirror image of the attended region on top of each eye. We conducte…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted to the 34th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

  35. arXiv:2506.18240  [pdf, ps, other]

    cs.LG cs.AI physics.optics

    Quantum-Classical Hybrid Quantized Neural Network

    Authors: Wenxin Li, Chuan Wang, Hongdong Zhu, Qi Gao, Yin Ma, Hai Wei, Kai Wen

    Abstract: Here in this work, we present a novel Quadratic Binary Optimization (QBO) model for quantized neural network training, enabling the use of arbitrary activation and loss functions through spline interpolation. We introduce Forward Interval Propagation (FIP), a method designed to tackle the challenges of non-linearity and the multi-layer composite structure in neural networks by discretizing activat…

    Submitted 24 June, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: 27 pages, 5 figures, comments are welcome

  36. arXiv:2506.18233  [pdf, ps, other]

    cs.AI

    The 4th Dimension for Scaling Model Size

    Authors: Ruike Zhu, Hanwen Zhang, Tianyu Shi, Chi Wang, Tianyi Zhou, Zengyi Qin

    Abstract: Scaling the size of large language models typically involves three dimensions: depth, width, and the number of parameters. In this work, we explore a fourth dimension, virtual logical depth (VLD), which increases the effective algorithmic depth without changing the overall parameter count by reusing parameters within the model. Although parameter reuse is not a new concept, its potential and chara…

    Submitted 22 June, 2025; originally announced June 2025.

  37. arXiv:2506.18050  [pdf, ps, other]

    cs.SE

    VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software

    Authors: Lyuye Zhang, Jian Zhang, Kaixuan Li, Chong Wang, Chengwei Liu, Jiahui Wu, Sen Chen, Yaowen Zheng, Yang Liu

    Abstract: Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. In particular, reachability analysis is increasingly used in Open-Source Software (OSS) projects to identify reachable vulnerabilities (e.g., CVEs) through call graphs, enabling a focus on exploitable risks. Performing reachability analysis typically requires the vulnerab…

    Submitted 24 June, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: 15 pages

  38. arXiv:2506.17622  [pdf, ps, other]

    cs.CR

    SoK: Stablecoin Designs, Risks, and the Stablecoin LEGO

    Authors: Shengchen Ling, Yuefeng Du, Yajin Zhou, Lei Wu, Cong Wang, Xiaohua Jia, Houmin Yan

    Abstract: Stablecoins have become significant assets in modern finance, with a market capitalization exceeding USD 246 billion (May 2025). Yet, despite their systemic importance, a comprehensive and risk-oriented understanding of crucial aspects like their design trade-offs, security dynamics, and interdependent failure pathways often remains underdeveloped. This SoK confronts this gap through a large-scale…

    Submitted 21 June, 2025; originally announced June 2025.

  39. arXiv:2506.17219  [pdf, ps, other]

    cs.LG cs.AI

    No Free Lunch: Rethinking Internal Feedback for LLM Reasoning

    Authors: Yanzhi Zhang, Zhaoxi Zhang, Haoxiang Guan, Yilin Cheng, Yitong Duan, Chen Wang, Yue Wang, Shuxin Zheng, Jiyan He

    Abstract: Reinforcement learning has emerged as a powerful paradigm for post-training large language models (LLMs) to improve reasoning. Approaches like Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) have shown strong results, but they require extensive external supervision. We investigate an alternative class of methods, Reinforcement Learning fr…

    Submitted 25 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  40. arXiv:2506.16735  [pdf, other]

    cs.CV eess.IV

    3DeepRep: 3D Deep Low-rank Tensor Representation for Hyperspectral Image Inpainting

    Authors: Yunshan Li, Wenwu Gong, Qianqian Wang, Chao Wang, Lili Yang

    Abstract: Recent approaches based on transform-based tensor nuclear norm (TNN) have demonstrated notable effectiveness in hyperspectral image (HSI) inpainting by leveraging low-rank structures in latent representations. Recent developments incorporate deep transforms to improve low-rank tensor representation; however, existing approaches typically restrict the transform to the spectral mode, neglecting low-…

    Submitted 20 June, 2025; originally announced June 2025.

  41. arXiv:2506.16718  [pdf, ps, other]

    cs.MA cs.AI

    Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation

    Authors: Chenxu Wang, Yonggang Jin, Cheng Hu, Youpeng Zhao, Zipeng Dai, Jian Zhao, Shiyu Huang, Liuyu Xiang, Junge Zhang, Zhaofeng He

    Abstract: Adapting a single agent to a new multi-agent system brings challenges, necessitating adjustments across various tasks, environments, and interactions with unknown teammates and opponents. Addressing this challenge is highly complex, and researchers have proposed two simplified scenarios, Multi-agent reinforcement learning for zero-shot learning and Ad-Hoc Teamwork. Building on these foundations, w…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: This manuscript is under submission to Neurocomputing

    Report number: NEUCOM-D-25-02272R1

  42. arXiv:2506.16411  [pdf, ps, other]

    cs.CL cs.LG

    When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework

    Authors: Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang

    Abstract: We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that distinguishes the failure modes of long context tasks into three categories: cross-chunk dependence (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when it is…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: under review

  43. arXiv:2506.16285  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Advancing Automated Speaking Assessment Leveraging Multifaceted Relevance and Grammar Information

    Authors: Hao-Chien Lu, Jhen-Ke Lin, Hong-Yun Lin, Chung-Chun Wang, Berlin Chen

    Abstract: Current automated speaking assessment (ASA) systems for use in multi-aspect evaluations often fail to make full use of content relevance, overlooking image or exemplar cues, and employ superficial grammar analysis that lacks detailed error types. This paper ameliorates these deficiencies by introducing two novel enhancements to construct a hybrid scoring model. First, a multifaceted relevance modu…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: submitted to the ISCA SLaTE-2025 Workshop

  44. arXiv:2506.16202  [pdf, ps, other]

    cs.CY cs.HC stat.AP

    AI labeling reduces the perceived accuracy of online content but has limited broader effects

    Authors: Chuyao Wang, Patrick Sturgis, Daniel de Kadt

    Abstract: Explicit labeling of online content produced by artificial intelligence (AI) is a widely mooted policy for ensuring transparency and promoting public confidence. Yet little is known about the scope of AI labeling effects on public assessments of labeled content. We contribute new evidence on this question from a survey experiment using a high-quality nationally representative probability sample (n…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 30 pages, 5 figures, 10 tables

    MSC Class: 62P25; 91C99

    ACM Class: J.4; H.1.2

  45. arXiv:2506.16151  [pdf, ps, other]

    cs.CL cs.AI

    Under the Shadow of Babel: How Language Shapes Reasoning in LLMs

    Authors: Chenxi Wang, Yixuan Zhang, Lang Gao, Zixiang Xu, Zirui Song, Yanbo Wang, Xiuying Chen

    Abstract: Language is not only a tool for communication but also a medium for human cognition and reasoning. If, as linguistic relativity suggests, the structure of language shapes cognitive patterns, then large language models (LLMs) trained on human language may also internalize the habitual logical structures embedded in different languages. To examine this hypothesis, we introduce BICAUSE, a structured…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 15 pages, 10 figures

  46. arXiv:2506.15980  [pdf, ps, other]

    cs.CV cs.AI

    Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization

    Authors: Cong Wang, Zexuan Deng, Zhiwei Jiang, Fei Shen, Yafeng Yin, Shiwei Gan, Zifeng Cheng, Shiping Ge, Qing Gu

    Abstract: Sign Language Video Generation (SLVG) seeks to generate identity-preserving sign language videos from spoken language texts. Existing methods primarily rely on the single coarse condition (e.g., skeleton sequences) as the intermediary to bridge the translation model and the video generation model, which limits both the naturalness and expressiveness of the generated videos. To overcome these limita…

    Submitted 18 June, 2025; originally announced June 2025.

  47. arXiv:2506.15569  [pdf, ps, other]

    cs.CL

    SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

    Authors: Chengye Wang, Yifei Shen, Zexi Kuang, Arman Cohan, Yilun Zhao

    Abstract: We introduce SciVer, the first benchmark specifically designed to evaluate the ability of foundation models to verify claims within a multimodal scientific context. SciVer consists of 3,000 expert-annotated examples over 1,113 scientific papers, covering four subsets, each representing a common reasoning type in multimodal scientific claim verification. To enable fine-grained evaluation, each exam…

    Submitted 18 June, 2025; originally announced June 2025.

  48. arXiv:2506.15565  [pdf, ps, other]

    cs.CV

    Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification

    Authors: Junhao Wu, Aboagye-Ntow Stephen, Chuyuan Wang, Gang Chen, Xin Huang

    Abstract: Ultra-high Spatial Resolution Land Cover Classification is essential for fine-grained land cover analysis, yet it remains challenging due to the high cost of pixel-level annotations, significant scale variation, and the limited adaptability of large-scale vision models. Existing methods typically focus on 1-meter spatial resolution imagery and rely heavily on annotated data, whereas practical appl…

    Submitted 18 June, 2025; originally announced June 2025.

  49. arXiv:2506.15545  [pdf, ps, other]

    cs.CL

    RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models

    Authors: Bailin Wang, Chang Lan, Chong Wang, Ruoming Pang

    Abstract: Local-global attention models have recently emerged as compelling alternatives to standard Transformers, promising improvements in both training and inference efficiency. However, the crucial choice of window size presents a Pareto tradeoff: larger windows maintain performance akin to full attention but offer minimal efficiency gains in short-context scenarios, while smaller windows can lead to pe…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 9 pages

  50. arXiv:2506.15377  [pdf, ps, other]

    cs.AI

    Efficient and Generalizable Environmental Understanding for Visual Navigation

    Authors: Ruoyu Wang, Xinshu Li, Chen Wang, Lina Yao

    Abstract: Visual Navigation is a core task in Embodied AI, enabling agents to navigate complex environments toward given objectives. Across diverse settings within Navigation tasks, many necessitate the modelling of sequential data accumulated from preceding time steps. While existing methods perform well, they typically process all historical observations simultaneously, overlooking the internal associatio…

    Submitted 18 June, 2025; originally announced June 2025.