
Showing 1–50 of 6,792 results for author: Wang, H

Searching in archive cs.
  1. arXiv:2507.01775  [pdf, ps, other]

    cs.CG cs.DS

    A Deterministic Partition Tree and Applications

    Authors: Haitao Wang

    Abstract: In this paper, we present a deterministic variant of Chan's randomized partition tree [Discret. Comput. Geom., 2012]. This result leads to numerous applications. In particular, for $d$-dimensional simplex range counting (for any constant $d \ge 2$), we construct a data structure using $O(n)$ space and $O(n^{1+ε})$ preprocessing time, such that each query can be answered in $o(n^{1-1/d})$ time (spe…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: To appear in ESA 2025

  2. arXiv:2507.01773  [pdf, ps, other]

    cs.NI

    Frontiers of Generative AI for Network Optimization: Theories, Limits, and Visions

    Authors: Bo Yang, Ruihuai Liang, Weixin Li, Han Wang, Xuelin Cao, Zhiwen Yu, Samson Lasaulce, Mérouane Debbah, Mohamed-Slim Alouini, H. Vincent Poor, Chau Yuen

    Abstract: While interest in the application of generative AI (GenAI) in network optimization has surged in recent years, its rapid progress has often overshadowed critical limitations intrinsic to generative models that remain insufficiently examined in existing literature. This survey provides a comprehensive review and critical analysis of GenAI in network optimization. We focus on the two dominant paradi…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.01694  [pdf, ps, other]

    cs.CR eess.SY

    Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks

    Authors: Hanlin Cai, Haofan Dong, Houtianfu Wang, Kai Li, Ozgur B. Akan

    Abstract: Federated large language models (FedLLMs) provide powerful generative capabilities in CyberEdge networks while protecting data privacy. However, FedLLMs remain highly vulnerable to model poisoning attacks. This article first reviews recent model poisoning techniques and existing defense mechanisms for FedLLMs, highlighting critical limitations, particularly under non-IID text distributions. In pa…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 7 pages, 5 figures

  4. arXiv:2507.01663  [pdf, ps, other]

    cs.LG cs.AI

    AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

    Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu

    Abstract: Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled w…

    Submitted 2 July, 2025; originally announced July 2025.

  5. arXiv:2507.01586  [pdf, ps, other]

    cs.CV

    SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation

    Authors: Bryan Constantine Sadihin, Michael Hua Wang, Shei Pern Chua, Hang Su

    Abstract: The production of high-quality 2D animation is a highly labor-intensive process, as animators are currently required to draw and color a large number of frames by hand. We present SketchColour, the first sketch-to-colour pipeline for 2D animation built on a diffusion transformer (DiT) backbone. By replacing the conventional U-Net denoiser with a DiT-style architecture and injecting sketch informatio…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Project page and code: https://bconstantine.github.io/SketchColour

  6. arXiv:2507.01573  [pdf, ps, other]

    cs.CV

    A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation

    Authors: Hao Wang, Keyan Hu, Xin Guo, Haifeng Li, Chao Tao

    Abstract: Remote sensing semantic segmentation must address both what the ground objects are within an image and where they are located. Consequently, segmentation models must ensure not only the semantic correctness of large-scale patches (low-frequency information) but also the precise localization of boundaries between patches (high-frequency information). However, most existing approaches rely heavily o…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 20 pages, 14 figures

  7. arXiv:2507.01561  [pdf, ps, other]

    cs.RO

    Self-Closing Suction Grippers for Industrial Grasping via Form-Flexible Design

    Authors: Huijiang Wang, Holger Kunz, Timon Adler, Fumiya Iida

    Abstract: Shape-morphing robots have shown benefits in industrial grasping. We propose form-flexible grippers for adaptive grasping. The design is based on the hybrid jamming and suction mechanism, which deforms to handle objects that vary significantly in size from the aperture, including both larger and smaller parts. Compared with traditional grippers, the gripper achieves self-closing to form an airtigh…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: This manuscript has been submitted for potential consideration at IEEE publication venues

  8. arXiv:2507.01378  [pdf, ps, other]

    cs.MA cs.AI cs.RO

    RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms

    Authors: Ziyao Wang, Rongpeng Li, Sizhao Li, Yuming Xiang, Haiping Wang, Zhifeng Zhao, Honggang Zhang

    Abstract: Intelligent control of Unmanned Aerial Vehicle (UAV) swarms has emerged as a critical research focus, and it typically requires the swarm to navigate effectively while avoiding obstacles and achieving continuous coverage over multiple mission targets. Although traditional Multi-Agent Reinforcement Learning (MARL) approaches offer dynamic adaptability, they are hindered by the semantic gap in num…

    Submitted 2 July, 2025; originally announced July 2025.

  9. arXiv:2507.01216  [pdf, ps, other]

    cs.LG cs.CR

    PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning

    Authors: Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Hao Wang, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan

    Abstract: There is a huge gap between numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) from fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from heavy communication burdens of activation transmissions, and may…

    Submitted 1 July, 2025; originally announced July 2025.

  10. arXiv:2507.01017  [pdf, ps, other]

    cs.HC

    A Comprehensive Review of Human Error in Risk-Informed Decision Making: Integrating Human Reliability Assessment, Artificial Intelligence, and Human Performance Models

    Authors: Xingyu Xiao, Hongxu Zhu, Jingang Liang, Jiejuan Tong, Haitao Wang

    Abstract: Human error remains a dominant risk driver in safety-critical sectors such as nuclear power, aviation, and healthcare, where seemingly minor mistakes can cascade into catastrophic outcomes. Although decades of research have produced a rich repertoire of mitigation techniques, persistent limitations: scarce high-quality data, algorithmic opacity, and residual reliance on expert judgment, continue t…

    Submitted 10 June, 2025; originally announced July 2025.

  11. arXiv:2507.00915  [pdf, ps, other]

    cs.IT

    MichelangeRoll: Sculpting Rational Distributions Exactly and Efficiently

    Authors: Jui-Hsiang Shao, Hsin-Po Wang

    Abstract: Simulating an arbitrary discrete distribution $D \in [0, 1]^n$ using fair coin tosses incurs trade-offs between entropy complexity and space and time complexity. Shannon's theory suggests that $H(D)$ tosses are necessary and sufficient, but does not guarantee exact distribution. Knuth and Yao showed that a decision tree consumes fewer than $H(D) + 2$ tosses for one exact sample. Draper and Saad's…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 13 pages, 7 figures, RANDOM says no so here

  12. arXiv:2507.00790  [pdf, ps, other]

    cs.CV cs.AI

    LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

    Authors: Huaqiu Li, Yong Wang, Tongwen Huang, Hailang Huang, Haoqian Wang, Xiangxiang Chu

    Abstract: Unified image restoration is a significantly challenging task in low-level vision. Existing methods either make tailored designs for specific tasks, limiting their generalizability across various types of degradation, or rely on training with paired datasets, thereby suffering from closed-set constraints. To address these issues, we propose a novel, dataset-free, and unified approach through recur…

    Submitted 1 July, 2025; originally announced July 2025.

  13. arXiv:2507.00066  [pdf, other]

    cs.HC cs.AI

    InSight-R: A Framework for Risk-informed Human Failure Event Identification and Interface-Induced Risk Assessment Driven by AutoGraph

    Authors: Xingyu Xiao, Jiejuan Tong, Peng Chen, Jun Sun, Zhe Sui, Jingang Liang, Hongru Zhao, Jun Zhao, Haitao Wang

    Abstract: Human reliability remains a critical concern in safety-critical domains such as nuclear power, where operational failures are often linked to human error. While conventional human reliability analysis (HRA) methods have been widely adopted, they rely heavily on expert judgment for identifying human failure events (HFEs) and assigning performance influencing factors (PIFs). This reliance introduces…

    Submitted 27 June, 2025; originally announced July 2025.

  14. arXiv:2506.24123  [pdf, ps, other]

    cs.CV

    Calligrapher: Freestyle Text Image Customization

    Authors: Yue Ma, Qingyan Bai, Hao Ouyang, Ka Leong Cheng, Qiuyu Wang, Hongyu Liu, Zichen Liu, Haofan Wang, Jingye Chen, Yujun Shen, Qifeng Chen

    Abstract: We introduce Calligrapher, a novel diffusion-based framework that innovatively integrates advanced text customization with artistic typography for digital calligraphy and design applications. Addressing the challenges of precise style control and data dependency in typographic customization, our framework incorporates three key technical contributions. First, we develop a self-distillation mechani…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Project page: https://calligrapher2025.github.io/Calligrapher Code: https://github.com/Calligrapher2025/Calligrapher

  15. arXiv:2506.24005  [pdf, ps, other]

    cs.LG

    Provably Efficient and Agile Randomized Q-Learning

    Authors: He Wang, Xingyu Xu, Yuejie Chi

    Abstract: While Bayesian-based exploration often demonstrates superior empirical performance compared to bonus-based methods in model-based reinforcement learning (RL), its theoretical understanding remains limited for model-free settings. Existing provable algorithms either suffer from computational intractability or rely on stage-wise policy updates which reduce responsiveness and slow down the learning p…

    Submitted 30 June, 2025; originally announced June 2025.

  16. arXiv:2506.23986  [pdf, ps, other]

    cs.SD eess.AS

    StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding

    Authors: Dake Guo, Jixun Yao, Linhan Ma, He Wang, Lei Xie

    Abstract: Recent advancements in discrete token-based speech generation have highlighted the importance of token-to-waveform generation for audio quality, particularly in real-time interactions. Traditional frameworks integrating semantic tokens with flow matching (FM) struggle with streaming capabilities due to their reliance on a global receptive field. Additionally, directly implementing token-by-token s…

    Submitted 1 July, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

  17. arXiv:2506.23762  [pdf, ps, other]

    cs.SE cs.AI

    Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead

    Authors: Hongzhou Rao, Yanjie Zhao, Xinyi Hou, Shenao Wang, Haoyu Wang

    Abstract: The rapid advancement of large language models (LLMs) has redefined artificial intelligence (AI), pushing the boundaries of AI research and enabling unbounded possibilities for both academia and the industry. However, LLM development faces increasingly complex challenges throughout its lifecycle, yet no existing research systematically explores these challenges and solutions from the perspective o…

    Submitted 30 June, 2025; originally announced June 2025.

  18. arXiv:2506.23643  [pdf, ps, other]

    cs.IR

    Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation

    Authors: Yifan Wang, Weinan Gan, Longtao Xiao, Jieming Zhu, Heng Chang, Haozhao Wang, Rui Zhang, Zhenhua Dong, Ruiming Tang, Ruixuan Li

    Abstract: Generative recommendation (GR) typically encodes behavioral or semantic aspects of item information into discrete tokens, leveraging the standard autoregressive (AR) generation paradigm to make predictions. However, existing methods tend to overlook their intrinsic relationship, that is, the semantic usually provides some reasonable explainability "$\textbf{why}$" for the behavior "…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures

  19. arXiv:2506.23622  [pdf, ps, other]

    cs.CR

    Privacy-Preserving Federated Learning Scheme with Mitigating Model Poisoning Attacks: Vulnerabilities and Countermeasures

    Authors: Jiahui Wu, Fucai Luo, Tiecheng Sun, Haiyan Wang, Weizhe Zhang

    Abstract: The privacy-preserving federated learning schemes based on the setting of two honest-but-curious and non-colluding servers offer promising solutions in terms of security and efficiency. However, our investigation reveals that these schemes still suffer from privacy leakage when considering model poisoning attacks from malicious users. Specifically, we demonstrate that the privacy-preserving comput…

    Submitted 30 June, 2025; originally announced June 2025.

  20. arXiv:2506.23549  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    CooT: Learning to Coordinate In-Context with Coordination Transformers

    Authors: Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun

    Abstract: Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require extensive training. To overcome these limitations, we propose Coordination Transformers (CooT), a novel in-context coordination framewo…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 23 pages, 10 tables, 8 figures

  21. arXiv:2506.23485  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent

    Authors: Haocheng Yu, Yaxiong Wu, Hao Wang, Wei Guo, Yong Liu, Yawen Li, Yuyang Ye, Junping Du, Enhong Chen

    Abstract: Interactive recommendation is a typical information-seeking task that allows users to interactively express their needs through natural language and obtain personalized recommendations. Large language model-powered (LLM-powered) agents have become a new paradigm in interactive recommendations, effectively capturing users' real-time needs and enhancing personalized experiences. However, due to limi…

    Submitted 29 June, 2025; originally announced June 2025.

  22. arXiv:2506.23351  [pdf, ps, other]

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  23. arXiv:2506.23346  [pdf, ps, other]

    cs.RO eess.SY

    Safe and Performant Deployment of Autonomous Systems via Model Predictive Control and Hamilton-Jacobi Reachability Analysis

    Authors: Hao Wang, Armand Jordana, Ludovic Righetti, Somil Bansal

    Abstract: While we have made significant algorithmic developments to enable autonomous systems to perform sophisticated tasks, it remains difficult for them to perform tasks effectively and safely. Most existing approaches either fail to provide any safety assurances or substantially compromise task performance for safety. In this work, we develop a framework, based on model predictive control (MPC) and Hamil…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: RSS 2025 Workshop on Reliable Robotics

  24. arXiv:2506.23270  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Token Activation Map to Visually Explain Multimodal LLMs

    Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, Xiaomeng Li

    Abstract: Multimodal large language models (MLLMs) are broadly empowering various fields. Despite their advancements, the explainability of MLLMs remains less explored, hindering deeper understanding, model credibility, and effective visualization. Unlike conventional vision models (e.g., CNNs, ViTs, CLIP) that produce a single output, MLLMs generate sequences of tokens progressively, where each generated t…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: ICCV2025 Accepted

  25. arXiv:2506.23157  [pdf, ps, other]

    cs.CV

    STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene

    Authors: Hanyu Zhou, Haonan Wang, Haoyue Liu, Yuxing Duan, Luxin Yan, Gim Hee Lee

    Abstract: High-dynamic scene reconstruction aims to represent the static background with rigid spatial features and dynamic objects with deformed continuous spatiotemporal features. Typically, existing methods adopt a unified representation model (e.g., Gaussian) to directly match the spatiotemporal features of a dynamic scene from a frame camera. However, this unified paradigm fails in the potential discontinuous te…

    Submitted 29 June, 2025; originally announced June 2025.

  26. arXiv:2506.22637  [pdf, ps, other]

    cs.CV

    CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

    Authors: Haoxuan Wang, Zhenghao Zhao, Junyi Wu, Yuzhang Shang, Gaowen Liu, Yan Yan

    Abstract: The recent introduction of diffusion models in dataset distillation has shown promising potential in creating compact surrogate datasets for large, high-resolution target datasets, offering improved efficiency and performance over traditional bi-level/uni-level optimization methods. However, current diffusion-based dataset distillation approaches overlook the evaluation process and exhibit two cri…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: ICCV 2025. Code is available at https://github.com/hatchetProject/CaO2

  27. arXiv:2506.22037  [pdf]

    cs.SE

    KARMA Approach supporting Development Process Reconstruction in Model-based Systems Engineering

    Authors: Jiawei Li, Zan Liang, Guoxin Wang, Jinzhi Lu, Yan Yan, Shouxuan Wu, Hao Wang

    Abstract: Model reconstruction is a method used to drive the development of complex system development processes in model-based systems engineering. Currently, during the iterative design process of a system, there is a lack of an effective method to manage changes in development requirements, such as development cycle requirements and cost requirements, and to realize the reconstruction of the system devel…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 12 pages, 9 figures, submitted to the 15th international Complex Systems Design & Management (CSD&M) conference

  28. arXiv:2506.22027  [pdf, ps, other]

    cs.CV

    Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

    Authors: Han Wang, Shengyang Li, Jian Yang, Yuxuan Liu, Yixuan Lv, Zhuang Zhou

    Abstract: Detecting and tracking ground objects using earth observation imagery remains a significant challenge in the field of remote sensing. Continuous maritime ship tracking is crucial for applications such as maritime search and rescue, law enforcement, and shipping analysis. However, most current ship tracking methods rely on geostationary satellites or video satellites. The former offer low resolutio…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  29. arXiv:2506.22023  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy

    Authors: Bohan Li, Zhihan Li, Haoran Wang, Hanglei Zhang, Yiwei Guo, Hankun Wang, Xie Chen, Kai Yu

    Abstract: Recently, autoregressive (AR) language models have emerged as a dominant approach in speech synthesis, offering expressive generation and scalable training. However, conventional AR speech synthesis models relying on the next-token prediction paradigm often encounter significant challenges when handling long speech sequences. These models often struggle to construct stable frame-to-frame attention…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 17 pages, 8 figures, 5 tables

  30. arXiv:2506.21952  [pdf, ps, other]

    cs.LG physics.app-ph physics.optics

    Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications

    Authors: Yangyang Wan, Haotian Wang, Xuhui Yu, Jiageng Chen, Xinyu Fan, Zuyuan He

    Abstract: Distributed acoustic sensing (DAS) has attracted considerable attention across various fields and artificial intelligence (AI) technology plays an important role in DAS applications to realize event recognition and denoising. Existing AI models require real-world data (RWD), whether labeled or not, for training, which is contradictory to the fact of limited available event data in real-world scena…

    Submitted 27 June, 2025; originally announced June 2025.

  31. arXiv:2506.21926  [pdf, ps, other]

    cs.CG cs.DS

    Computing Maximum Cliques in Unit Disk Graphs

    Authors: Anastasiia Tkachenko, Haitao Wang

    Abstract: Given a set $P$ of $n$ points in the plane, the unit-disk graph $G(P)$ is a graph with $P$ as its vertex set such that two points of $P$ have an edge if their Euclidean distance is at most $1$. We consider the problem of computing a maximum clique in $G(P)$. The previously best algorithm for the problem runs in $O(n^{7/3+o(1)})$ time. We show that the problem can be solved in…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: To appear in CCCG 2025

  32. arXiv:2506.21865  [pdf, ps, other]

    cs.MM cs.CL

    RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture

    Authors: Haofeng Wang, Yilin Guo, Zehao Li, Tong Yue, Yizong Wang, Enci Zhang, Rongqun Lin, Feng Gao, Shiqi Wang, Siwei Ma

    Abstract: The Yellow River is China's mother river and a cradle of human civilization. The ancient Yellow River culture is, moreover, an indispensable part of human art history. To conserve and inherit the ancient Yellow River culture, we designed RiverEcho, a real-time interactive system that responds to voice queries using a large language model and a cultural knowledge dataset, delivering explanations th…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: IEEE International Conference on Multimedia and Expo Workshop, 2025 (Accepted)

  33. arXiv:2506.21851  [pdf, ps, other]

    cs.CV cs.MM eess.IV

    End-to-End RGB-IR Joint Image Compression With Channel-wise Cross-modality Entropy Model

    Authors: Haofeng Wang, Fangtao Zhou, Qi Zhang, Zeyuan Chen, Enci Zhang, Zhao Wang, Xiaofeng Huang, Siwei Ma

    Abstract: RGB-IR (RGB-Infrared) image pairs are frequently applied simultaneously in various applications like intelligent surveillance. However, as the number of modalities increases, the required data storage and transmission costs also double. Therefore, efficient RGB-IR data compression is essential. This work proposes a joint compression framework for RGB-IR image pairs. Specifically, to fully utilize cr…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2025, under review

  34. arXiv:2506.21611  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Does Multimodality Lead to Better Time Series Forecasting?

    Authors: Xiyuan Zhang, Boran Han, Haoyang Fang, Abdul Fatir Ansari, Shuai Zhang, Danielle C. Maddix, Cuixiong Hu, Andrew Gordon Wilson, Michael W. Mahoney, Hao Wang, Yan Liu, Huzefa Rangwala, George Karypis, Bernie Wang

    Abstract: Recently, there has been growing interest in incorporating textual information into foundation models for time series forecasting. However, it remains unclear whether and under what conditions such multimodal integration consistently yields gains. We systematically investigate these questions across a diverse benchmark of 14 forecasting tasks spanning 7 domains, including health, environment, and…

    Submitted 20 June, 2025; originally announced June 2025.

  35. arXiv:2506.21349  [pdf, ps, other]

    cs.CV eess.IV

    Generalizable Neural Electromagnetic Inverse Scattering

    Authors: Yizhe Cheng, Chunxun Tian, Haoru Wang, Wentao Zhu, Xiaoxuan Ma, Yizhou Wang

    Abstract: Solving Electromagnetic Inverse Scattering Problems (EISP) is fundamental in applications such as medical imaging, where the goal is to reconstruct the relative permittivity from the scattered electromagnetic field. This inverse process is inherently ill-posed and highly nonlinear, making it particularly challenging. A recent machine learning-based approach, Img-Interiors, shows promising results by l…

    Submitted 1 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  36. arXiv:2506.21322  [pdf, ps, other]

    cs.HC cs.RO

    "Who Should I Believe?": User Interpretation and Decision-Making When a Family Healthcare Robot Contradicts Human Memory

    Authors: Hong Wang, Natalia Calvo-Barajas, Katie Winkle, Ginevra Castellano

    Abstract: Advancements in robotic capabilities for providing physical assistance, psychological support, and daily health management are making the deployment of intelligent healthcare robots in home environments increasingly feasible in the near future. However, challenges arise when the information provided by these robots contradicts users' memory, raising concerns about user trust and decision-making. T…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 8 pages

  37. arXiv:2506.21140  [pdf, ps, other]

    cs.LG cs.AI

    DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding

    Authors: Ziwei Wang, Hongbin Wang, Tianwang Jia, Xingyi He, Siyang Li, Dongrui Wu

    Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) transform spontaneous/evoked neural activity into control commands for external communication. While convolutional neural networks (CNNs) remain the mainstream backbone for EEG decoding, their inherently short receptive field makes it difficult to capture long-range temporal dependencies and global inter-channel relationships. Rec…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 12 pages, 6 figures

  38. arXiv:2506.21096  [pdf, ps, other]

    cs.CL

    DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning

    Authors: Kang He, Yuzhe Ding, Haining Wang, Fei Li, Chong Teng, Donghong Ji

    Abstract: Previous multimodal sentence representation learning methods have achieved impressive performance. However, most approaches focus on aligning images and text at a coarse level, facing two critical challenges: cross-modal misalignment bias and intra-modal semantic divergence, which significantly degrade sentence representation quality. To address these challenges, we propose DALR (Dual-level Alignme…

    Submitted 30 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 Findings

  39. arXiv:2506.21074  [pdf, ps, other]

    eess.AS cs.SD

    CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate

    Authors: Hankun Wang, Yiwei Guo, Chongtian Shao, Bohan Li, Xie Chen, Kai Yu

    Abstract: Neural speech codecs have been widely used in audio compression and various downstream tasks. Current mainstream codecs are fixed-frame-rate (FFR), which allocate the same number of tokens to every equal-duration slice. However, speech is inherently non-uniform in temporal information density. As a result, many tokens are wasted on steady-state segments like long vowels and silences. To address th…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 16 pages, 5 figures, 9 tables

  40. FedSC: Federated Learning with Semantic-Aware Collaboration

    Authors: Huan Wang, Haoran Li, Huaming Chen, Jun Yan, Jiahua Shi, Jun Shen

    Abstract: Federated learning (FL) aims to train models collaboratively across clients without sharing data, in order to preserve privacy. However, one major challenge is the data heterogeneity issue, which refers to the biased labeling preferences at multiple clients. A number of existing FL methods attempt to tackle data heterogeneity locally (e.g., regularizing local models) or globally (e.g., fine-tuning global…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 12 pages, KDD 2025

  41. arXiv:2506.20923  [pdf, ps, other]

    cs.CL

    KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model

    Authors: Xinping Zhao, Xinshuo Hu, Zifei Shan, Shouzheng Huang, Yao Zhou, Zetian Sun, Zhenyu Liu, Dongfang Li, Xinyuan Wei, Qian Chen, Youcheng Pan, Yang Xiang, Meishan Zhang, Haofen Wang, Jun Yu, Baotian Hu, Min Zhang

    Abstract: In this paper, we propose KaLM-Embedding-V2, a versatile and compact embedding model, which achieves impressive performance in general-purpose text embedding tasks by leveraging superior training techniques and data. Our key innovations include: (1) To better align the architecture with representation learning, we remove the causal attention mask and adopt a fully bidirectional transformer with si…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Technical Report; 26 pages 12 tables 1 figure. arXiv admin note: substantial text overlap with arXiv:2501.01028

  42. arXiv:2506.20606  [pdf, ps, other]

    cs.CL

    Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm

    Authors: Baixiang Huang, Zhen Tan, Haoran Wang, Zijie Liu, Dawei Li, Ali Payani, Huan Liu, Tianlong Chen, Kai Shu

    Abstract: Agents based on Large Language Models (LLMs) have demonstrated strong capabilities across a wide range of tasks. However, deploying LLM-based agents in high-stakes domains comes with significant safety and ethical risks. Unethical behavior by these agents can directly result in serious real-world consequences, including physical harm and financial loss. To efficiently steer the ethical behavior of…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Main paper: 9 pages; total: 18 pages (including appendix). Code, data, results, and additional resources are available at: https://model-editing.github.io

  43. arXiv:2506.20487  [pdf, ps, other]

    cs.RO

    Behavior Foundation Model: Towards Next-Generation Whole-Body Control System of Humanoid Robots

    Authors: Mingqi Yuan, Tao Yu, Wenqi Ge, Xiuyong Yao, Dapeng Li, Huijiang Wang, Jiayu Chen, Xin Jin, Bo Li, Hua Chen, Wei Zhang, Wenjun Zeng

    Abstract: Humanoid robots are drawing significant attention as versatile platforms for complex motor control, human-robot interaction, and general-purpose physical intelligence. However, achieving efficient whole-body control (WBC) in humanoids remains a fundamental challenge due to sophisticated dynamics, underactuation, and diverse task requirements. While learning-based controllers have shown promise for…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 19 pages, 8 figures

  44. arXiv:2506.20388  [pdf]

    cs.CV

    A Novel Large Vision Foundation Model (LVFM)-based Approach for Generating High-Resolution Canopy Height Maps in Plantations for Precision Forestry Management

    Authors: Shen Tan, Xin Zhang, Liangxiu Han, Huaguo Huang, Han Wang

    Abstract: Accurate, cost-effective monitoring of plantation aboveground biomass (AGB) is crucial for supporting local livelihoods and carbon sequestration initiatives like the China Certified Emission Reduction (CCER) program. High-resolution canopy height maps (CHMs) are essential for this, but standard lidar-based methods are expensive. While deep learning with RGB imagery offers an alternative, accuratel…

    Submitted 25 June, 2025; originally announced June 2025.

  45. arXiv:2506.20101  [pdf, ps, other]

    cs.CR

    Secure Multi-Key Homomorphic Encryption with Application to Privacy-Preserving Federated Learning

    Authors: Jiahui Wu, Tiecheng Sun, Fucai Luo, Haiyan Wang, Weizhe Zhang

    Abstract: Multi-Key Homomorphic Encryption (MKHE), proposed by Lopez-Alt et al. (STOC 2012), allows for performing arithmetic computations directly on ciphertexts encrypted under distinct keys. Subsequent works by Chen and Dai et al. (CCS 2019) and Kim and Song et al. (CCS 2023) extended this concept by proposing multi-key BFV/CKKS variants, referred to as the CDKS scheme. These variants incorporate asympto…

    Submitted 24 June, 2025; originally announced June 2025.

  46. arXiv:2506.19816  [pdf, ps, other]

    cs.RO cs.CV

    CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation

    Authors: Hao Li, Shuai Yang, Yilun Chen, Yang Tian, Xiaoda Yang, Xinyi Chen, Hanqing Wang, Tai Wang, Feng Zhao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong generalization across manipulation tasks. However, they remain constrained by a single-frame observation paradigm and cannot fully benefit from the motion information offered by aggregated multi-frame historical observations, as the large vision-language backbone introduces substan…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 36 pages, 21 figures

  47. arXiv:2506.19469  [pdf, ps, other]

    cs.CV cs.AI

    Surgery-R1: Advancing Surgical-VQLA with Reasoning Multimodal Large Language Model via Reinforcement Learning

    Authors: Pengfei Hao, Shuaibo Li, Hongqiu Wang, Zhizhuo Kou, Junhang Zhang, Guang Yang, Lei Zhu

    Abstract: In recent years, significant progress has been made in the field of surgical scene understanding, particularly in the task of Visual Question Localized-Answering in robotic surgery (Surgical-VQLA). However, existing Surgical-VQLA models lack deep reasoning capabilities and interpretability in surgical scenes, which limits their reliability and potential for development in clinical applications. To…

    Submitted 24 June, 2025; originally announced June 2025.

  48. arXiv:2506.19246  [pdf]

    cs.LG

    Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning

    Authors: Renzi Meng, Heyi Wang, Yumeng Sun, Qiyuan Wu, Lian Lian, Renhan Zhang

    Abstract: This paper addresses the increasingly prominent problem of anomaly detection in distributed systems. It proposes a detection method based on federated contrastive learning. The goal is to overcome the limitations of traditional centralized approaches in terms of data privacy, node heterogeneity, and anomaly pattern recognition. The proposed method combines the distributed collaborative modeling ca…

    Submitted 23 June, 2025; originally announced June 2025.

  49. arXiv:2506.18898  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.MM

    Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

    Authors: Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang

    Abstract: This paper presents a multimodal framework that attempts to unify visual understanding and generation within a shared discrete semantic representation. At its core is the Text-Aligned Tokenizer (TA-Tok), which converts images into discrete tokens using a text-aligned codebook projected from a large language model's (LLM) vocabulary. By integrating vision and text into a unified space with an expan…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://tar.csuhan.com

  50. arXiv:2506.18727  [pdf, other]

    cs.HC cs.SE

    AutoGraph: A Knowledge-Graph Framework for Modeling Interface Interaction and Automating Procedure Execution in Digital Nuclear Control Rooms

    Authors: Xingyu Xiao, Jiejuan Tong, Jun Sun, Zhe Sui, Jingang Liang, Hongru Zhao, Jun Zhao, Haitao Wang

    Abstract: Digitalization in nuclear power plant (NPP) control rooms is reshaping how operators interact with procedures and interface elements. However, existing computer-based procedures (CBPs) often lack semantic integration with human-system interfaces (HSIs), limiting their capacity to support intelligent automation and increasing the risk of human error, particularly under dynamic or complex operating…

    Submitted 26 May, 2025; originally announced June 2025.