Showing 1–50 of 3,506 results for author: Liu, H

  1. arXiv:2505.10075

    cs.RO cs.CV

    FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation

    Authors: Jun Guo, Xiaojian Ma, Yikai Wang, Min Yang, Huaping Liu, Qing Li

    Abstract: This paper investigates training better visual world models for robot manipulation, i.e., models that can predict future visual observations by conditioning on past frames and robot actions. Specifically, we consider world models that operate on RGB-D frames (RGB-D world models). As opposed to canonical approaches that handle dynamics prediction mostly implicitly and reconcile it with visual rende…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Project page: see https://sharinka0715.github.io/FlowDreamer/

  2. arXiv:2505.09862

    cs.HC cs.CY

    Rhetorical XAI: Explaining AI's Benefits as well as its Use via Rhetorical Design

    Authors: Houjiang Liu, Yiheng Su, Matthew Lease

    Abstract: This paper explores potential benefits of incorporating Rhetorical Design into the design of Explainable Artificial Intelligence (XAI) systems. While XAI is traditionally framed around explaining individual predictions or overall system behavior, explanations also function as a form of argumentation, shaping how users evaluate a system's perceived usefulness and credibility, and fostering appropriate trust.…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.09393

    cs.GR cs.AI cs.CV

    UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units

    Authors: Huakun Liu, Hiroki Ota, Xin Wei, Yutaro Hirao, Monica Perusquia-Hernandez, Hideaki Uchiyama, Kiyoshi Kiyokawa

    Abstract: Sparse wearable inertial measurement units (IMUs) have gained popularity for estimating 3D human motion. However, challenges such as pose ambiguity, data drift, and limited adaptability to diverse bodies persist. To address these issues, we propose UMotion, an uncertainty-driven, online fusing-all state estimation framework for 3D human shape and pose estimation, supported by six integrated, body-…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  4. arXiv:2505.08693

    eess.IV cs.CV

    VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation

    Authors: Badhan Kumar Das, Ajay Singh, Gengyan Zhao, Han Liu, Thomas J. Re, Dorin Comaniciu, Eli Gibson, Andreas Maier

    Abstract: Self-supervised pretraining techniques have been widely used to improve performance on downstream tasks. However, real-world magnetic resonance (MR) studies usually consist of different sets of contrasts due to different acquisition protocols, which poses challenges for current deep learning methods in large-scale pretraining and in downstream tasks with different input requirements, since…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 9 pages

  5. arXiv:2505.08672

    cs.CY

    How Students Use AI Feedback Matters: Experimental Evidence on Physics Achievement and Autonomy

    Authors: Xusheng Dai, Zhaochun Wen, Jianxiao Jiang, Huiqin Liu, Yu Zhang

    Abstract: Despite the precision and adaptiveness of generative AI (GAI)-powered feedback provided to students, existing practice and literature might ignore how usage patterns impact student learning. This study examines the heterogeneous effects of GAI-powered personalized feedback on high school students' physics achievement and autonomy through two randomized controlled trials, with a major focus on usag…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.08581

    cs.CV eess.IV q-bio.TO

    ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

    Authors: Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin

    Abstract: Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, referring surgical segmentation is emerging, given its advantage of providing surgeons with an interactive experience to segment the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicabil…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Early accepted by MICCAI 2025

  7. arXiv:2505.08523

    cs.IT eess.SP

    Dual-UAV-Enabled Secure Communication and Sensing for A2G-ISAC Systems with Maneuverable Jamming

    Authors: Libiao Lou, Yuan Liu, Fotis Foukalas, Hongjiang Lei, Gaofeng Pan, Theodoros A. Tsiftsis, Hongwu Liu

    Abstract: In this paper, we propose a dual-unmanned aerial vehicle (UAV)-enabled secure communication and sensing (SCS) scheme for an air-to-ground integrated sensing and communication (ISAC) system, in which a dual-functional source UAV and jamming UAV collaborate to enhance both the secure communication and target sensing performance. From the perspective of hybrid monostatic-bistatic radar, the jamming UA…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 13 pages, submitted to IEEE Journal

  8. arXiv:2505.08446

    cs.AI

    Agent-as-a-Service based on Agent Network

    Authors: Yuhan Zhu, Haojie Liu, Jian Wang, Bing Li, Zikang Yin, Yefei Liao

    Abstract: The rise of large model-based AI agents has spurred interest in Multi-Agent Systems (MAS) for their capabilities in decision-making, collaboration, and adaptability. While the Model Context Protocol (MCP) addresses tool invocation and data exchange challenges via a unified protocol, it lacks support for organizing agent-level collaboration. To bridge this gap, we propose Agent-as-a-Service based o…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: work in progress

  9. arXiv:2505.08265

    cs.LG cs.AI

    LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification

    Authors: Hang Gao, Wenxuan Huang, Fengge Wu, Junsuo Zhao, Changwen Zheng, Huaping Liu

    Abstract: The use of large language models (LLMs) as feature enhancers to optimize node representations, which are then used as inputs for graph neural networks (GNNs), has shown significant potential in graph representation learning. However, the fundamental properties of this approach remain underexplored. To address this issue, we propose conducting a more in-depth analysis of this issue based on the int…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  10. arXiv:2505.08159

    cond-mat.mtrl-sci cs.LG

    Enhancing the Efficiency of Complex Systems Crystal Structure Prediction by Active Learning Guided Machine Learning Potential

    Authors: Jiaxiang Li, Junwei Feng, Jie Luo, Bowen Jiang, Xiangyu Zheng, Jian Lv, Keith Butler, Hanyu Liu, Congwei Xie, Yu Xie, Yanming Ma

    Abstract: Understanding multicomponent complex material systems is essential for design of advanced materials for a wide range of technological applications. While state-of-the-art crystal structure prediction (CSP) methods effectively identify new structures and assess phase stability, they face fundamental limitations when applied to complex systems. This challenge stems from the combinatorial explosion o…

    Submitted 12 May, 2025; originally announced May 2025.

  11. arXiv:2505.07852

    cs.CL cs.AI cs.LG

    Joint Detection of Fraud and Concept Drift in Online Conversations with LLM-Assisted Judgment

    Authors: Ali Senol, Garima Agrawal, Huan Liu

    Abstract: Detecting fake interactions in digital communication platforms remains a challenging and insufficiently addressed problem. These interactions may appear as harmless spam or escalate into sophisticated scam attempts, making it difficult to flag malicious intent early. Traditional detection methods often rely on static anomaly detection techniques that fail to adapt to dynamic conversational shifts.…

    Submitted 7 May, 2025; originally announced May 2025.

  12. arXiv:2505.07815

    cs.RO cs.CV cs.LG

    Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models

    Authors: Seungjae Lee, Daniel Ekpo, Haowen Liu, Furong Huang, Abhinav Shrivastava, Jia-Bin Huang

    Abstract: Exploration is essential for general-purpose robotic learning, especially in open-ended environments where dense rewards, explicit goals, or task-specific supervision are scarce. Vision-language models (VLMs), with their semantic reasoning over objects, spatial relations, and potential outcomes, present a compelling foundation for generating high-level exploratory behaviors. However, their outputs…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project webpage: https://ive-robot.github.io/

  13. arXiv:2505.07608

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 12 May, 2025; originally announced May 2025.

  14. arXiv:2505.07209

    cs.CV

    Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

    Authors: Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, Hongwei Liu

    Abstract: Concept Bottleneck Models (CBMs) try to make the decision-making process transparent by exploring an intermediate concept space between the input image and the output prediction. Existing CBMs just learn coarse-grained relations between the whole image and the concepts, less considering local image information, leading to two main drawbacks: i) they often produce spurious visual-concept relations,…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  15. arXiv:2505.07202

    cs.CL cs.SD eess.AS

    On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Study

    Authors: Hyouin Liu, Zhikuan Zhang

    Abstract: Modern TTS systems designed for conversations achieve high-quality utterances but often remain inaccessible publicly. Are existing open-source architectures inadequate, or are current training techniques insufficient? This paper investigates prominent models and their underlying behaviors regarding conversational context. Using 20 GPU-hours on an NVIDIA H100, we empirically examine two approaches:…

    Submitted 11 May, 2025; originally announced May 2025.

  16. arXiv:2505.06321

    cs.LG cs.AI

    Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning

    Authors: Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various domains. However, they still face significant challenges, including high computational costs for training and limitations in solving complex reasoning problems. Although existing methods have extended the reasoning capabilities of LLMs through structured paradigms, these approaches often rely on task-specific prompts and…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  17. arXiv:2505.05639

    cs.CG cs.GR

    Designing 3D Anisotropic Frame Fields with Odeco Tensors

    Authors: Haikuan Zhu, Hongbo Li, Hsueh-Ti Derek Liu, Wenping Wang, Jing Hua, Zichun Zhong

    Abstract: This paper introduces a method to synthesize a 3D tensor field within a constrained geometric domain represented as a tetrahedral mesh. Whereas previous techniques optimize for isotropic fields, we focus on anisotropic tensor fields that are smooth and aligned with the domain boundary or user guidance. The key ingredient of our method is a novel computational design framework, built on top of the…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by TOG

  18. arXiv:2505.05185

    cs.DS

    Efficient Parallel Ising Samplers via Localization Schemes

    Authors: Xiaoyu Chen, Hongyang Liu, Yitong Yin, Xinyuan Zhang

    Abstract: We introduce efficient parallel algorithms for sampling from the Gibbs distribution and estimating the partition function of Ising models. These algorithms achieve parallel efficiency, with polylogarithmic depth and polynomial total work, and are applicable to Ising models in the following regimes: (1) Ferromagnetic Ising models with external fields; (2) Ising models with interaction matrix $J$ of…

    Submitted 8 May, 2025; originally announced May 2025.

  19. arXiv:2505.05101

    cs.CV

    MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models

    Authors: Hongyang Zhu, Haipeng Liu, Bo Fu, Yang Wang

    Abstract: Multi-object editing aims to modify multiple objects or regions in complex scenes while preserving structural coherence. This task faces significant challenges in scenarios involving overlapping or interacting objects: (1) Inaccurate localization of target objects due to attention misalignment, leading to incomplete or misplaced edits; (2) Attribute-object mismatch, where color or texture changes…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages, 7 figures

  20. arXiv:2505.05081

    cs.CV

    PIDiff: Image Customization for Personalized Identities with Diffusion Models

    Authors: Jinyu Gu, Haipeng Liu, Meng Wang, Yang Wang

    Abstract: Text-to-image generation for personalized identities aims at incorporating the specific identity into images using a text prompt and an identity image. Based on the powerful generative capabilities of DDPMs, many previous works adopt additional prompts, such as text embeddings and CLIP image embeddings, to represent the identity information, while they fail to disentangle the identity information…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages, 11 figures

  21. arXiv:2505.04656

    cs.GR

    MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation

    Authors: Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, Huaping Liu

    Abstract: In this paper, we introduce MeshGen, an advanced image-to-3D pipeline that generates high-quality 3D meshes with detailed geometry and physically based rendering (PBR) textures. Addressing the challenges faced by existing 3D native diffusion models, such as suboptimal auto-encoder performance, limited controllability, poor generalization, and inconsistent image-based PBR texturing, MeshGen employs…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: To appear at CVPR 2025 with highlight

  22. arXiv:2505.04369

    cs.CV

    WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing

    Authors: Jie Sun, Heng Liu, Yongzhen Wang, Xiao-Ping Zhang, Mingqiang Wei

    Abstract: In this paper, we reveal a novel haze-specific wavelet degradation prior observed through wavelet transform analysis, which shows that haze-related information predominantly resides in low-frequency components. Exploiting this insight, we propose a novel dehazing framework, WDMamba, which decomposes the image dehazing task into two sequential stages: low-frequency restoration followed by detail en…

    Submitted 7 May, 2025; originally announced May 2025.

  23. arXiv:2505.04193

    cs.LG cs.RO stat.ML

    Trajectory Entropy Reinforcement Learning for Predictable and Robust Control

    Authors: Bang You, Chenxu Wang, Huaping Liu

    Abstract: Simplicity is a critical inductive bias for designing data-driven controllers, especially when robustness is important. Despite the impressive results of deep reinforcement learning in complex control tasks, it is prone to capturing intricate and spurious correlations between observations and actions, leading to failure under slight perturbations to the environment. To tackle this problem, in this…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages

  24. arXiv:2505.03178

    cs.LG cs.RO

    RADE: Learning Risk-Adjustable Driving Environment via Multi-Agent Conditional Diffusion

    Authors: Jiawei Wang, Xintao Yan, Yao Mu, Haowei Sun, Zhong Cao, Henry X. Liu

    Abstract: Generating safety-critical scenarios in high-fidelity simulations offers a promising and cost-effective approach for efficient testing of autonomous vehicles. Existing methods typically rely on manipulating a single vehicle's trajectory through sophisticated designed objectives to induce adversarial interactions, often at the cost of realism and scalability. In this work, we propose the Risk-Adjus…

    Submitted 6 May, 2025; originally announced May 2025.

  25. arXiv:2505.03116

    cs.CV

    TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion

    Authors: Haoyue Liu, Jinghan Xu, Yi Chang, Hanyu Zhou, Haozhi Zhao, Lin Wang, Luxin Yan

    Abstract: Video frame interpolation (VFI) that leverages the bio-inspired event cameras as guidance has recently shown better performance and memory efficiency than the frame-based methods, thanks to the event cameras' advantages, such as high temporal resolution. A hurdle for event-based VFI is how to effectively deal with non-linear motion, caused by the dynamic changes in motion direction and speed withi…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  26. arXiv:2505.02284

    cs.DB

    Conformal Prediction for Verifiable Learned Query Optimization

    Authors: Hanwen Liu, Shashank Giridhara, Ibrahim Sabek

    Abstract: Query optimization is critical in relational databases. Recently, numerous Learned Query Optimizers (LQOs) have been proposed, demonstrating superior performance over traditional hand-crafted query optimizers after short training periods. However, the opacity and instability of machine learning models have limited their practical applications. To address this issue, we are the first to formulate t…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: To appear at VLDB 2025 (https://vldb.org/2025/)

  27. arXiv:2505.02078

    cs.CL cs.AI

    LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning

    Authors: Joy Lim Jia Yin, Daniel Zhang-Li, Jifan Yu, Haoxuan Li, Shangqing Tu, Yuanchun Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li, Bin Xu

    Abstract: Evaluating the quality of slide-based multimedia instruction is challenging. Existing methods like manual assessment, reference-based metrics, and large language model evaluators face limitations in scalability, context capture, or bias. In this paper, we introduce LecEval, an automated metric grounded in Mayer's Cognitive Theory of Multimedia Learning, to evaluate multimodal knowledge acquisition…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures

  28. arXiv:2505.01583

    cs.CV cs.AI

    TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

    Authors: Jen-Hao Cheng, Vivian Wang, Huayu Wang, Huapeng Zhou, Yi-Hao Peng, Hou-I Liu, Hsiang-Wei Huang, Kuang-Ming Chen, Cheng-Yen Yang, Wenhao Chai, Yi-Ling Chen, Vibhav Vineet, Qin Cai, Jenq-Neng Hwang

    Abstract: Understanding causal event relationships and achieving fine-grained temporal grounding in videos remain challenging for vision-language models. Existing methods either compress video tokens to reduce temporal resolution, or treat videos as unsegmented streams, which obscures fine-grained event boundaries and limits the modeling of causal dependencies. We propose TEMPURA (Temporal Event Masked Pred…

    Submitted 2 May, 2025; originally announced May 2025.

  29. arXiv:2505.00969

    cs.RO

    Real-time Two-tape Control System in Vine robots

    Authors: Hanmo Liu, Kayleen Smith, Zimu Yang, Mark Yim

    Abstract: This paper focuses on how to make a growing Vine robot steer in different directions, with a novel approach to real-time steering control that autonomously applies adhesive tape to induce surface wrinkles. This enables real-time directional control with arbitrarily many turns while maintaining the robot's soft structure. This system feeds growing material external to the tube. The design achieves f…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 6 pages, 8 figures; submitted to IROS 2025

  30. arXiv:2505.00598

    cs.LG cs.AI

    Fast and Low-Cost Genomic Foundation Models via Outlier Removal

    Authors: Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu

    Abstract: To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model with strong compression performance and fast adaptability. GERM improves upon models like DNABERT-2 by eliminating outliers that hinder low-rank adaptation and post-training quantization, enhancing both efficiency and robustness. We replace the vanilla attention layer with…

    Submitted 2 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: International Conference on Machine Learning (ICML) 2025

  31. arXiv:2505.00284

    cs.RO cs.AI

    LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving

    Authors: Zhijie Qiao, Haowei Li, Zhong Cao, Henry X. Liu

    Abstract: Vision-Language Models (VLMs) have demonstrated significant potential for end-to-end autonomous driving. However, fully exploiting their capabilities for safe and reliable vehicle control remains an open research challenge. To systematically examine advances and limitations of VLMs in driving tasks, we introduce LightEMMA, a Lightweight End-to-End Multimodal Model for Autonomous driving. LightEMMA…

    Submitted 1 May, 2025; originally announced May 2025.

  32. arXiv:2504.21614

    cs.CV

    Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection

    Authors: Daniel Bogdoll, Rajanikant Patnaik Ananta, Abeyankar Giridharan, Isabel Moore, Gregory Stevens, Henry X. Liu

    Abstract: With an ever-increasing availability of data, it has become more and more challenging to select and label appropriate samples for the training of machine learning models. It is especially difficult to detect long-tail classes of interest in large amounts of unlabeled data. This holds especially true for Intelligent Transportation Systems (ITS), where vehicle fleets and roadside perception systems…

    Submitted 30 April, 2025; originally announced April 2025.

  33. arXiv:2504.21071

    cs.RO

    Automated Parking Trajectory Generation Using Deep Reinforcement Learning

    Authors: Zheyu Zhang, Yutong Luo, Yongzhou Chen, Haopeng Zhao, Zhichao Ma, Hao Liu

    Abstract: Autonomous parking is a key technology in modern autonomous driving systems, requiring high precision, strong adaptability, and efficiency in complex environments. This paper proposes a Deep Reinforcement Learning (DRL) framework based on the Soft Actor-Critic (SAC) algorithm to optimize autonomous parking tasks. SAC, an off-policy method with entropy regularization, is particularly well-suited fo…

    Submitted 29 April, 2025; originally announced April 2025.

  34. arXiv:2504.20848

    cs.LG cs.AI cs.CR

    Mitigating the Structural Bias in Graph Adversarial Defenses

    Authors: Junyuan Fang, Huimin Liu, Han Yang, Jiajing Wu, Zibin Zheng, Chi K. Tse

    Abstract: In recent years, graph neural networks (GNNs) have shown great potential in addressing various graph structure-related downstream tasks. However, recent studies have found that current GNNs are susceptible to malicious adversarial attacks. Given the inevitable presence of adversarial attacks in the real world, a variety of defense methods have been proposed to counter these attacks and enhance the…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Under Review

  35. arXiv:2504.20835

    cs.SD eess.AS

    Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning

    Authors: Hongfei Xue, Yufeng Tang, Hexin Liu, Jun Zhang, Xuelong Geng, Lei Xie

    Abstract: Large language models have been extended to the speech domain, leading to the development of speech large language models (SLLMs). While existing SLLMs demonstrate strong performance in speech instruction-following for core languages (e.g., English), they often struggle with non-core languages due to the scarcity of paired speech-text data and limited multilingual semantic reasoning capabilities.…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 10 pages, 6 figures, Submitted to ACM MM 2025

  36. arXiv:2504.20403

    cs.GR cs.CV

    Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting

    Authors: Hanxi Liu, Yifang Men, Zhouhui Lian

    Abstract: Personalized 3D avatar editing holds significant promise due to its user-friendliness and availability to applications such as AR/VR and virtual try-ons. Previous studies have explored the feasibility of 3D editing, but often struggle to generate visually pleasing results, possibly due to the unstable representation learning under mixed optimization of geometry and texture in complicated reconstru…

    Submitted 28 April, 2025; originally announced April 2025.

  37. arXiv:2504.19901

    cs.LG cs.AI stat.ML

    Attention Mechanism, Max-Affine Partition, and Universal Approximation

    Authors: Hude Liu, Jerry Yao-Chieh Hu, Zhao Song, Han Liu

    Abstract: We establish the universal approximation capability of single-layer, single-head self- and cross-attention mechanisms with minimal attached structures. Our key insight is to interpret single-head attention as an input domain-partition mechanism that assigns distinct values to subregions. This allows us to engineer the attention weights such that this assignment imitates the target function. Buildi…

    Submitted 28 April, 2025; originally announced April 2025.

  38. arXiv:2504.18837

    cs.SI

    Sentiment and Social Signals in the Climate Crisis: A Survey on Analyzing Social Media Responses to Extreme Weather Events

    Authors: Pouya Shaeri, Yasaman Mohammadpour, Alimohammad Beigi, Ariane Middel, Huan Liu

    Abstract: Extreme weather events driven by climate change, such as wildfires, floods, and heatwaves, prompt significant public reactions on social media platforms. Analyzing the sentiment expressed in these online discussions can offer valuable insights into public perception, inform policy decisions, and enhance emergency responses. Although sentiment analysis has been widely studied in various fields, its…

    Submitted 7 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

    Comments: 13 Pages, 1 figure, Under review for a computer science conference

  39. arXiv:2504.18768

    cs.GR cs.CV

    TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians

    Authors: Letian Huang, Dongwei Ye, Jialin Dan, Chengzhi Tao, Huiwen Liu, Kun Zhou, Bo Ren, Yuanqi Li, Yanwen Guo, Jie Guo

    Abstract: The emergence of neural and Gaussian-based radiance field methods has led to considerable advancements in novel view synthesis and 3D object reconstruction. Nonetheless, specular reflection and refraction continue to pose significant challenges due to the instability and incorrect overfitting of radiance fields to high-frequency light variations. Currently, even 3D Gaussian Splatting (3D-GS), as a…

    Submitted 1 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: accepted by SIGGRAPH 2025; https://letianhuang.github.io/transparentgs/

  40. arXiv:2504.18158

    cs.CV

    E-InMeMo: Enhanced Prompting for Visual In-Context Learning

    Authors: Jiahao Zhang, Bowen Wang, Hong Liu, Liangzhi Li, Yuta Nakashima, Hajime Nagahara

    Abstract: Large-scale models trained on extensive datasets have become the standard due to their strong generalizability across diverse tasks. In-context learning (ICL), widely used in natural language processing, leverages these models by providing task-specific prompts without modifying their parameters. This paradigm is increasingly being adapted for computer vision, where models receive an input-output…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Preprint

  41. arXiv:2504.16948

    cs.CY cs.AI cs.ET

    Intrinsic Barriers to Explaining Deep Foundation Models

    Authors: Zhen Tan, Huan Liu

    Abstract: Deep Foundation Models (DFMs) offer unprecedented capabilities, but their increasing complexity presents profound challenges to understanding their internal workings, a critical need for ensuring trust, safety, and accountability. As we grapple with explaining these systems, a fundamental question emerges: Are the difficulties we face merely temporary hurdles, awaiting more sophisticated analytical…

    Submitted 21 April, 2025; originally announced April 2025.

  42. arXiv:2504.16142

    eess.SP cs.AI cs.LG

    A Non-Invasive Load Monitoring Method for Edge Computing Based on MobileNetV3 and Dynamic Time Regulation

    Authors: Hangxu Liu, Yaojie Sun, Yu Wang

    Abstract: In recent years, non-intrusive load monitoring (NILM) technology has attracted much attention in the related research field by virtue of its unique advantage of utilizing single meter data to achieve accurate decomposition of device-level energy consumption. Cutting-edge methods based on machine learning and deep learning have achieved remarkable results in load decomposition accuracy by fusing ti…

    Submitted 22 April, 2025; originally announced April 2025.

  43. arXiv:2504.16037

    cs.RO eess.SY

    Adaptive Fault-tolerant Control of Underwater Vehicles with Thruster Failures

    Authors: Haolin Liu, Shiliang Zhang, Shangbin Jiao, Xiaohui Zhang, Xuehui Ma, Yan Yan, Wenchuan Cui, Youmin Zhang

    Abstract: This paper presents a fault-tolerant control for the trajectory tracking of autonomous underwater vehicles (AUVs) against thruster failures. We formulate faults in AUV thrusters as discrete switching events during an AUV mission, and develop a soft-switching approach that facilitates shifting control strategies across fault scenarios. We mathematically define AUV thruster fault scenarios, and develo…

    Submitted 22 April, 2025; originally announced April 2025.

  44. arXiv:2504.15956  [pdf, other]

    cs.LG cs.AI stat.ML

    Universal Approximation with Softmax Attention

    Authors: Jerry Yao-Chieh Hu, Hude Liu, Hong-Yu Chen, Weimin Wu, Han Liu

    Abstract: We prove that with linear transformations, both (i) two-layer self-attention and (ii) one-layer self-attention followed by a softmax function are universal approximators for continuous sequence-to-sequence functions on compact domains. Our main technique is a new interpolation-based method for analyzing attention's internal mechanism. This leads to our key insight: self-attention is able to approx…

    Submitted 22 April, 2025; originally announced April 2025.

  45. arXiv:2504.15920  [pdf, other]

    cs.LG

    ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion

    Authors: Xiang Li, Haobing Liu, Jianpeng Qi, Yuan Cao, Guoqing Chao, Yanwei Yu

    Abstract: Graph Neural Networks (GNNs) have demonstrated strong performance across various graph-based tasks by effectively capturing relational information between nodes. These models rely on iterative message passing to propagate node features, enabling nodes to aggregate information from their neighbors. Recent research has significantly improved the message-passing mechanism, enhancing GNN scalability o…

    Submitted 24 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  46. arXiv:2504.15612  [pdf, other]

    cs.CV

    HS-Mamba: Full-Field Interaction Multi-Groups Mamba for Hyperspectral Image Classification

    Authors: Hongxing Peng, Kang Lin, Huanai Liu

    Abstract: Hyperspectral image (HSI) classification has been one of the hot topics in remote sensing fields. Recently, the Mamba architecture based on selective state-space models (S6) has demonstrated great advantages in long sequence modeling. However, the unique properties of hyperspectral data, such as high dimensionality and feature inlining, pose challenges to the application of Mamba to HSI classifica…

    Submitted 22 April, 2025; originally announced April 2025.

  47. arXiv:2504.15575  [pdf, other]

    eess.AS cs.SD

    Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows

    Authors: Haohe Liu, Thomas Deacon, Wenwu Wang, Matt Paradis, Mark D. Plumbley

    Abstract: Locating the right sound effect efficiently is an important yet challenging topic for audio production. Most current sound-searching systems rely on pre-annotated audio labels created by humans, which can be time-consuming to produce and prone to inaccuracies, limiting the efficiency of audio production. Following the recent advancement of contrastive language-audio pre-training (CLAP) models, we…

    Submitted 22 April, 2025; originally announced April 2025.

  48. arXiv:2504.15524  [pdf, other]

    cs.CL cs.AI

    IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

    Authors: Qiyao Wang, Guhong Chen, Hongbo Wang, Huaren Liu, Minghui Zhu, Zhifei Qin, Linwei Li, Yilin Yue, Shiqiang Wang, Jiayan Li, Yihang Wu, Ziqiang Liu, Longze Chen, Run Luo, Liyang Fan, Jiaming Li, Lei Zhang, Kan Xu, Hongfei Lin, Hamid Alinejad-Rokny, Shiwen Ni, Yuan Lin, Min Yang

    Abstract: Intellectual Property (IP) is a unique domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. As large language models (LLMs) continue to advance, they show great potential for processing IP tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks either focus narrowl…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 89 pages, 75 figures, 55 tables

  49. arXiv:2504.15474  [pdf, other]

    cs.SE

    Agent for User: Testing Multi-User Interactive Features in TikTok

    Authors: Sidong Feng, Changhao Du, Huaxiao Liu, Qingnan Wang, Zhengwei Lv, Gang Huo, Xu Yang, Chunyang Chen

    Abstract: TikTok, a widely used social media app boasting over a billion monthly active users, requires effective app quality assurance for its intricate features. Feature testing is crucial in achieving this goal. However, the multi-user interactive features within the app, such as live streaming, voice calls, etc., pose significant challenges for developers, who must handle simultaneous device management…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted to ICSE 2025 Industry paper

  50. arXiv:2504.15171  [pdf, other]

    cs.LG

    Audio-Visual Class-Incremental Learning for Fish Feeding Intensity Assessment in Aquaculture

    Authors: Meng Cui, Xianghu Yue, Xinyuan Qian, Jinzheng Zhao, Haohe Liu, Xubo Liu, Daoliang Li, Wenwu Wang

    Abstract: Fish Feeding Intensity Assessment (FFIA) is crucial in industrial aquaculture management. Recent multi-modal approaches have shown promise in improving FFIA robustness and efficiency. However, these methods face significant challenges when adapting to new fish species or environments due to catastrophic forgetting and the lack of suitable datasets. To address these limitations, we first introduce…

    Submitted 21 April, 2025; originally announced April 2025.