
Showing 1–50 of 2,989 results for author: Chen, C

Searching in archive cs.
  1. arXiv:2505.10495  [pdf, ps, other]

    cs.LG cs.CL

    RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs

    Authors: Vibha Belavadi, Tushar Vatsa, Dewang Sultania, Suhas Suresha, Ishita Verma, Cheng Chen, Tracy Holloway King, Michael Friedrich

    Abstract: This paper addresses fine-tuning Large Language Models (LLMs) for function calling tasks when real user interaction data is unavailable. In digital content creation tools, where users express their needs through natural language queries that must be mapped to API calls, the lack of real-world task-specific data and privacy constraints for training on it necessitate synthetic data generation. Exist…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing

    Journal ref: https://aclanthology.org/2025.knowledgenlp-1.10/ KnowledgeNLP 2025

  2. arXiv:2505.10359  [pdf, other]

    cs.RO cs.CV

    NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning

    Authors: Le Shi, Yifei Shi, Xin Xu, Tenglong Liu, Junhua Xi, Chengyuan Chen

    Abstract: Recent advances in deep generative models demonstrate unprecedented zero-shot generalization capabilities, offering great potential for robot manipulation in unstructured environments. Given a partial observation of a scene, deep generative models could generate the unseen regions and therefore provide more context, which enhances the capability of robots to generalize across unseen environments.…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.10250  [pdf, other]

    cs.CV

    ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

    Authors: Wenhao Shen, Wanqi Yin, Xiaofeng Yang, Cheng Chen, Chaoyue Song, Zhongang Cai, Lei Yang, Hao Wang, Guosheng Lin

    Abstract: Human mesh recovery (HMR) from a single image is inherently ill-posed due to depth ambiguity and occlusions. Probabilistic methods have tried to solve this by generating numerous plausible 3D human mesh predictions, but they often exhibit misalignment with 2D image observations and weak robustness to in-the-wild images. To address these issues, we propose ADHMR, a framework that Aligns a Diffusion…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025. Code: https://github.com/shenwenhao01/ADHMR

  4. arXiv:2505.09979  [pdf, ps, other]

    cs.RO

    Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots

    Authors: Huiqiao Fu, Haoyu Dong, Wentao Xu, Zhehao Zhou, Guizhou Deng, Kaiqiang Tang, Daoyi Dong, Chunlin Chen

    Abstract: Achieving animal-like agility is a longstanding goal in quadrupedal robotics. While recent studies have successfully demonstrated imitation of specific behaviors, enabling robots to replicate a broader range of natural behaviors in real-world environments remains an open challenge. Here we propose an integrated controller comprising a Basic Behavior Controller (BBC) and a Task-Specific Controller…

    Submitted 15 May, 2025; originally announced May 2025.

  5. arXiv:2505.09978  [pdf, ps, other]

    cs.IT eess.SP

    Low-Complexity Decoding for Low-Rate Block Codes of Short Length Based on Concatenated Coding Structure

    Authors: Mao-Chao Lin, Shih-Kai Lee, Pin Lin, Ching-Chang Lin, Chia-Chun Chen, Teng-Yuan Syu, Huang-Chang Lee

    Abstract: To decode a short linear block code, ordered statistics decoding (OSD) and/or the $A^*$ decoding are usually considered. Either OSD or the $A^*$ decoding utilizes the magnitudes of the received symbols to establish the most reliable and independent positions (MRIP) frame. A restricted search space can be employed to achieve near-optimum decoding with reduced decoding complexity. For a low-rate code…

    Submitted 15 May, 2025; originally announced May 2025.

  6. arXiv:2505.09965  [pdf, ps, other]

    cs.CV

    MambaControl: Anatomy Graph-Enhanced Mamba ControlNet with Fourier Refinement for Diffusion-Based Disease Trajectory Prediction

    Authors: Hao Yang, Tao Tan, Shuai Tan, Weiqin Yang, Kunyan Cai, Calvin Chen, Yue Sun

    Abstract: Modelling disease progression in precision medicine requires capturing complex spatio-temporal dynamics while preserving anatomical integrity. Existing methods often struggle with longitudinal dependencies and structural consistency in progressive disorders. To address these limitations, we introduce MambaControl, a novel framework that integrates selective state-space modelling with diffusion pro…

    Submitted 15 May, 2025; originally announced May 2025.

  7. arXiv:2505.09205  [pdf, other]

    cs.IR

    HMamba: Hyperbolic Mamba for Sequential Recommendation

    Authors: Qianru Zhang, Honggang Wen, Wei Yuan, Crystal Chen, Menglin Yang, Siu-Ming Yiu, Hongzhi Yin

    Abstract: Sequential recommendation systems have become a cornerstone of personalized services, adept at modeling the temporal evolution of user preferences by capturing dynamic interaction sequences. Existing approaches predominantly rely on traditional models, including RNNs and Transformers. Despite their success in local pattern recognition, Transformer-based methods suffer from quadratic computational…

    Submitted 14 May, 2025; originally announced May 2025.

  8. arXiv:2505.08854  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.08608  [pdf]

    q-bio.QM cs.LG

    Automated Model-Free Sorting of Single-Molecule Fluorescence Events Using a Deep Learning Based Hidden-State Model

    Authors: Wenqi Zeng, Shuqi Zhou, Yuan Yao, Chunlai Chen

    Abstract: Single-molecule fluorescence assays enable high-resolution analysis of biomolecular dynamics, but traditional analysis pipelines are labor-intensive and rely on users' experience, limiting scalability and reproducibility. Recent deep learning models have automated aspects of data processing, yet many still require manual thresholds, complex architectures, or extensive labeled data. Therefore, we p…

    Submitted 13 May, 2025; originally announced May 2025.

  10. arXiv:2505.08325  [pdf, other]

    cs.LG cs.AI

    FedRS-Bench: Realistic Federated Learning Datasets and Benchmarks in Remote Sensing

    Authors: Haodong Zhao, Peng Peng, Chiyu Chen, Linqing Huang, Gongshen Liu

    Abstract: Remote sensing (RS) images are usually produced at an unprecedented scale, yet they are geographically and institutionally distributed, making centralized model training challenging due to data-sharing restrictions and privacy concerns. Federated learning (FL) offers a solution by enabling collaborative model training across decentralized RS data sources without exposing raw data. However, there l…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.07834  [pdf, other]

    cs.NI cs.AI cs.CR cs.PL

    ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet

    Authors: Yuekang Li, Wei Song, Bangshuo Zhu, Dong Gong, Yi Liu, Gelei Deng, Chunyang Chen, Lei Ma, Jun Sun, Toby Walsh, Jingling Xue

    Abstract: We introduce ai.txt, a novel domain-specific language (DSL) designed to explicitly regulate interactions between AI models, agents, and web content, addressing critical limitations of the widely adopted robots.txt standard. As AI increasingly engages with online materials for tasks such as training, summarization, and content modification, existing regulatory methods lack the necessary granularity…

    Submitted 1 May, 2025; originally announced May 2025.

  12. KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation

    Authors: Ching Han Chen, Ming Fang Shiu

    Abstract: KAQG introduces a decisive breakthrough for Retrieval-Augmented Generation (RAG) by explicitly tackling the two chronic weaknesses of current pipelines: transparent multi-step reasoning and fine-grained cognitive difficulty control. This transforms RAG from a passive retriever into an accountable generator of calibrated exam items. Technically, the framework fuses knowledge graphs, RAG retrieval,…

    Submitted 12 May, 2025; originally announced May 2025.

  13. AgentFlow: Resilient Adaptive Cloud-Edge Framework for Multi-Agent Coordination

    Authors: Ching Han Chen, Ming Fang Shiu

    Abstract: This paper presents AgentFlow, a MAS-based framework for programmable distributed systems in heterogeneous cloud-edge environments. It introduces logistics objects and abstract agent interfaces to enable dynamic service flows and modular orchestration. AgentFlow supports decentralized publish-subscribe messaging and many-to-many service elections, enabling decision coordination without a central s…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 8 pages, 9 figures, 3 tables

  14. arXiv:2505.06928  [pdf, ps, other]

    quant-ph cs.LG

    Unraveling Quantum Environments: Transformer-Assisted Learning in Lindblad Dynamics

    Authors: Chi-Sheng Chen, En-Jui Kuo

    Abstract: Understanding dissipation in open quantum systems is crucial for the development of robust quantum technologies. In this work, we introduce a Transformer-based machine learning framework to infer time-dependent dissipation rates in quantum systems governed by the Lindblad master equation. Our approach uses time series of observable quantities, such as expectation values of single Pauli operators,…

    Submitted 11 May, 2025; originally announced May 2025.

  15. arXiv:2505.06897  [pdf, other]

    cs.AI

    Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence

    Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong

    Abstract: The ultimate goal of artificial intelligence (AI) is to achieve Artificial General Intelligence (AGI). Embodied Artificial Intelligence (EAI), which involves intelligent systems with physical presence and real-time interaction with the environment, has emerged as a key research direction in pursuit of AGI. While advancements in deep learning, reinforcement learning, large-scale language models, an…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 19 pages, 7 figures, 3 tables

  16. arXiv:2505.06538  [pdf, other]

    cs.CL

    Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model

    Authors: Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang

    Abstract: The rapid development of multimodal large reasoning models (MLRMs) has demonstrated broad application potential, yet their safety and reliability remain critical concerns that require systematic exploration. To address this gap, we conduct a comprehensive and systematic safety evaluation of 11 MLRMs across 5 benchmarks and unveil prevalent safety degradation phenomena in most advanced models. More…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  17. arXiv:2505.05212  [pdf, other]

    cs.CV

    HQC-NBV: A Hybrid Quantum-Classical View Planning Approach

    Authors: Xiaotong Yu, Chang Wen Chen

    Abstract: Efficient view planning is a fundamental challenge in computer vision and robotic perception, critical for tasks ranging from search and rescue operations to autonomous navigation. While classical approaches, including sampling-based and deterministic methods, have shown promise in planning camera viewpoints for scene exploration, they often struggle with computational scalability and solution opt…

    Submitted 8 May, 2025; originally announced May 2025.

  18. arXiv:2505.05151  [pdf, other]

    quant-ph cs.LG

    Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning

    Authors: Chuangtao Chen, Qinglin Zhao, MengChu Zhou, Zhimin He, Haozhen Situ

    Abstract: This study explores quantum-enhanced discrete diffusion models to overcome classical limitations in learning high-dimensional distributions. We rigorously prove that classical discrete diffusion models, which calculate per-dimension transition probabilities to avoid exponential computational cost, exhibit worst-case linear scaling of Kullback-Leibler (KL) divergence with data dimension. To address…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Comments are welcome

  19. arXiv:2505.04922  [pdf, ps, other]

    cs.CV

    Canny2Palm: Realistic and Controllable Palmprint Generation for Large-scale Pre-training

    Authors: Xingzeng Lan, Xing Duan, Chen Chen, Weiyu Lin, Bo Wang

    Abstract: Palmprint recognition is a secure and privacy-friendly method of biometric identification. One of the major challenges to improve palmprint recognition accuracy is the scarcity of palmprint data. Recently, a popular line of research revolves around the synthesis of virtual palmprints for large-scale pre-training purposes. In this paper, we propose a novel synthesis method named Canny2Palm that ext…

    Submitted 7 May, 2025; originally announced May 2025.

  20. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  21. arXiv:2505.04326  [pdf, ps, other]

    cs.NI

    Design and Evaluation of an NDN-Based Network for Distributed Digital Twins

    Authors: Chen Chen, Zihan Jia, Ze Wang, Lin Cui, Fung Po Tso

    Abstract: Digital twins (DT) have received significant attention due to their numerous benefits, such as real-time data analytics and cost reduction in production. DT serves as a fundamental component of many applications, encompassing smart manufacturing, intelligent vehicles, and smart cities. By using Machine Learning (ML) and Artificial Intelligence (AI) techniques, DTs can efficiently facilitate decisi…

    Submitted 7 May, 2025; originally announced May 2025.

  22. arXiv:2505.03743  [pdf]

    cs.CR math.NT quant-ph

    Implementation of Shor Algorithm: Factoring a 4096-Bit Integer Under Specific Constraints

    Authors: Abel C. H. Chen

    Abstract: In recent years, advancements in quantum chip technology, such as Willow, have contributed to reducing quantum computation error rates, potentially accelerating the practical adoption of quantum computing. As a result, the design of quantum algorithms suitable for real-world applications has become a crucial research direction. This study focuses on the implementation of Shor's algorithm, aiming to…

    Submitted 6 April, 2025; originally announced May 2025.

    Comments: in Chinese language

  23. arXiv:2505.03362  [pdf, other]

    cs.CV

    3D Surface Reconstruction with Enhanced High-Frequency Details

    Authors: Shikun Zhang, Yiqun Wang, Cunjian Chen, Yong Li, Qiuhong Ke

    Abstract: Neural implicit 3D reconstruction can reproduce shapes without 3D supervision, and it learns the 3D scene through volume rendering methods and neural implicit representations. Current neural surface reconstruction methods tend to randomly sample the entire image, making it difficult to learn high-frequency details on the surface, and thus the reconstruction results tend to be too smooth. We design…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by Journal of Visual Communication and Image Representation

  24. Including Bloom Filters in Bottom-up Optimization

    Authors: Tim Zeyl, Qi Cheng, Reza Pournaghi, Jason Lam, Weicheng Wang, Calvin Wong, Chong Chen, Per-Ake Larson

    Abstract: Bloom filters are used in query processing to perform early data reduction and improve query performance. The optimal query plan may be different when Bloom filters are used, indicating the need for Bloom filter-aware query optimization. To date, Bloom filter-aware query optimization has only been incorporated in a top-down query optimizer and limited to snowflake queries. In this paper, we show h…

    Submitted 5 May, 2025; originally announced May 2025.

  25. arXiv:2505.02977  [pdf, ps, other]

    cs.DC cs.DS math.NA

    Parallel GPU-Accelerated Randomized Construction of Approximate Cholesky Preconditioners

    Authors: Tianyu Liang, Chao Chen, Yotam Yaniv, Hengrui Luo, David Tench, Xiaoye S. Li, Aydin Buluc, James Demmel

    Abstract: We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as discretization of a partial differential equation, spectral graph partitioning, and learning problems on graphs. The preconditioner belongs to the family of incom…

    Submitted 5 May, 2025; originally announced May 2025.

  26. arXiv:2505.02922  [pdf, ps, other]

    cs.LG

    RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

    Authors: Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

    Abstract: The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index,…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages

  27. arXiv:2505.02720  [pdf, other]

    cs.CV

    A Rate-Quality Model for Learned Video Coding

    Authors: Sang NguyenQuang, Cheng-Wei Chen, Xiem HoangVan, Wen-Hsiao Peng

    Abstract: Learned video coding (LVC) has recently achieved superior coding performance. In this paper, we model the rate-quality (R-Q) relationship for learned video coding by a parametric function. We learn a neural network, termed RQNet, to characterize the relationship between the bitrate and quality level according to video content and coding context. The predicted (R,Q) results are further integrated w…

    Submitted 5 May, 2025; originally announced May 2025.

  28. arXiv:2505.02370  [pdf, other]

    cs.CV cs.AI cs.LG

    SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

    Authors: Ming Li, Xin Gu, Fan Chen, Xiaoying Xing, Longyin Wen, Chen Chen, Sijie Zhu

    Abstract: Due to the challenges of manually collecting accurate editing data, existing datasets are typically constructed using various automated methods, leading to noisy supervision signals caused by the mismatch between editing instructions and original-edited image pairs. Recent efforts attempt to improve editing models through generating higher-quality edited images, pre-training on recognition tasks,…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Code, Data and Models are available at: https://github.com/bytedance/SuperEdit

  29. arXiv:2505.01851  [pdf, other]

    cs.CV

    Mitigating Group-Level Fairness Disparities in Federated Visual Language Models

    Authors: Chaomeng Chen, Zitong Yu, Junhao Dong, Sen Su, Linlin Shen, Shutao Xia, Xiaochun Cao

    Abstract: Visual language models (VLMs) have shown remarkable capabilities in multimodal tasks but face challenges in maintaining fairness across demographic groups, particularly when deployed in federated learning (FL) environments. This paper addresses the critical issue of group fairness in federated VLMs by introducing FVL-FP, a novel framework that combines FL with fair prompt tuning techniques. We foc…

    Submitted 3 May, 2025; originally announced May 2025.

  30. arXiv:2505.01288  [pdf, other]

    cs.RO cs.AI

    ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow

    Authors: Changhe Chen, Quantao Yang, Xiaohao Xu, Nima Fazeli, Olov Andersson

    Abstract: One of the central challenges preventing robots from acquiring complex manipulation skills is the prohibitive cost of collecting large-scale robot demonstrations. In contrast, humans are able to learn efficiently by watching others interact with their environment. To bridge this gap, we introduce semantic action flow as a core intermediate representation capturing the essential spatio-temporal man…

    Submitted 12 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

  31. arXiv:2504.21277  [pdf, other]

    cs.AI

    Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models

    Authors: Guanghao Zhou, Panjia Qiu, Cen Chen, Jie Wang, Zheming Yang, Jian Xu, Minghui Qiu

    Abstract: The integration of reinforcement learning (RL) into the reasoning capabilities of Multimodal Large Language Models (MLLMs) has rapidly emerged as a transformative research direction. While MLLMs significantly extend Large Language Models (LLMs) to handle diverse modalities such as vision, audio, and video, enabling robust reasoning across multimodal inputs remains a major challenge. This survey sy…

    Submitted 29 April, 2025; originally announced April 2025.

  32. arXiv:2504.19142  [pdf, other]

    cs.DB cs.AI

    BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning

    Authors: Chenhao Xu, Chunyu Chen, Jinglin Peng, Jiannan Wang, Jun Gao

    Abstract: Most large enterprises build predefined data pipelines and execute them periodically to process operational data using SQL queries for various tasks. A key issue in minimizing the overall makespan of these pipelines is the efficient scheduling of concurrent queries within the pipelines. Existing tools mainly rely on simple heuristic rules due to the difficulty of expressing the complex features an…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted by ICDE '25

  33. arXiv:2504.18990  [pdf, other]

    cs.CR cs.SE

    Safety Interventions against Adversarial Patches in an Open-Source Driver Assistance System

    Authors: Cheng Chen, Grant Xiao, Daehyun Lee, Lishan Yang, Evgenia Smirni, Homa Alemzadeh, Xugui Zhou

    Abstract: Drivers are becoming increasingly reliant on advanced driver assistance systems (ADAS) as autonomous driving technology becomes more popular and developed with advanced safety features to enhance road safety. However, the increasing complexity of the ADAS makes autonomous vehicles (AVs) more exposed to attacks and accidental faults. In this paper, we evaluate the resilience of a widely used ADAS a…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: 10 pages, 6 figures, To appear in the 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2025)

  34. arXiv:2504.18564  [pdf, other]

    cs.CR cs.AI

    DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization

    Authors: Xinzhe Huang, Kedong Xiu, Tianhang Zheng, Churui Zeng, Wangze Ni, Zhan Qiin, Kui Ren, Chun Chen

    Abstract: Recent research has focused on exploring the vulnerabilities of Large Language Models (LLMs), aiming to elicit harmful and/or sensitive content from LLMs. However, due to the insufficient research on dual-jailbreaking -- attacks targeting both LLMs and Guardrails, the effectiveness of existing attacks is limited when attempting to bypass safety-aligned LLMs shielded by guardrails. Therefore, in th…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 20 pages, 8 figures

  35. arXiv:2504.18383  [pdf, other]

    cs.IR cs.AI

    Bridge the Domains: Large Language Models Enhanced Cross-domain Sequential Recommendation

    Authors: Qidong Liu, Xiangyu Zhao, Yejing Wang, Zijian Zhang, Howard Zhong, Chong Chen, Xiang Li, Wei Huang, Feng Tian

    Abstract: Cross-domain Sequential Recommendation (CDSR) aims to extract the preference from the user's historical interactions across various domains. Despite some progress in CDSR, two problems set the barrier for further advancements, i.e., overlap dilemma and transition complexity. The former means existing CDSR methods rely heavily on users who have interactions in all domains to learn cross-domain item…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: accepted by SIGIR'25

  36. arXiv:2504.18032  [pdf, other]

    cs.CV

    Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

    Authors: Chen Chen, Daochang Liu, Mubarak Shah, Chang Xu

    Abstract: Text-to-image diffusion models have demonstrated remarkable capabilities in creating images highly aligned with user prompts, yet their proclivity for memorizing training set images has sparked concerns about the originality of the generated images and privacy issues, potentially leading to legal complications for both model owners and users, particularly when the memorized images contain propriet…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR 2025. Project page: https://chenchen-usyd.github.io/PRSS-Project-Page/

  37. arXiv:2504.17964  [pdf, ps, other]

    cs.HC cs.AI

    Evaluating Machine Expertise: How Graduate Students Develop Frameworks for Assessing GenAI Content

    Authors: Celia Chen, Alex Leitch

    Abstract: This paper examines how graduate students develop frameworks for evaluating machine-generated expertise in web-based interactions with large language models (LLMs). Through a qualitative study combining surveys, LLM interaction transcripts, and in-depth interviews with 14 graduate students, we identify patterns in how these emerging professionals assess and engage with AI-generated content. Our fi…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Under review at ACM Web Science Conference 2025's Human-GenAI Interactions Workshop, 4 pages

  38. arXiv:2504.17934  [pdf, other]

    cs.HC cs.CL cs.CR

    Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents

    Authors: Chaoran Chen, Zhiping Zhang, Ibrahim Khalilov, Bingcan Guo, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li

    Abstract: The rise of Large Language Models (LLMs) has revolutionized Graphical User Interface (GUI) automation through LLM-powered GUI agents, yet their ability to process sensitive data with limited human oversight raises significant privacy and security risks. This position paper identifies three key risks of GUI agents and examines how they differ from traditional GUI automation and general autonomous a…

    Submitted 24 April, 2025; originally announced April 2025.

  39. arXiv:2504.17334  [pdf, other]

    cs.HC cs.IR

    DataScout: Automatic Data Fact Retrieval for Statement Augmentation with an LLM-Based Agent

    Authors: Chuer Chen, Yuqi Liu, Danqing Shi, Shixiong Cao, Nan Cao

    Abstract: A data story typically integrates data facts from multiple perspectives and stances to construct a comprehensive and objective narrative. However, retrieving these facts demands time for data search and challenges the creator's analytical skills. In this work, we introduce DataScout, an interactive system that automatically performs reasoning and stance-based data facts retrieval to augment the us…

    Submitted 24 April, 2025; originally announced April 2025.

  40. arXiv:2504.17267  [pdf, other]

    cs.HC cs.MM

    MV-Crafter: An Intelligent System for Music-guided Video Generation

    Authors: Chuer Chen, Shengqi Dang, Yuqi Liu, Nanxuan Zhao, Yang Shi, Nan Cao

    Abstract: Music videos, as a prevalent form of multimedia entertainment, deliver engaging audio-visual experiences to audiences and have gained immense popularity among singers and fans. Creators can express their interpretations of music naturally through visual elements. However, the creation process of music video demands proficiency in script design, video shooting, and music-video synchronization, posi…

    Submitted 24 April, 2025; originally announced April 2025.

  41. arXiv:2504.16922  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

    Authors: Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, Min Shi, Steven Walton, Markus Hoehnerbach, Vijay Thakkar, Michael Isaev, Qinsheng Zhang, Bing Xu, Haicheng Wu, Wen-mei Hwu, Ming-Yu Liu, Humphrey Shi

    Abstract: Many sparse attention mechanisms such as Neighborhood Attention have typically failed to consistently deliver speedup over the self attention baseline. This is largely due to the level of complexity in attention infrastructure, and the rapid evolution of AI hardware architecture. At the same time, many state-of-the-art foundational models, particularly in computer vision, are heavily bound by atte…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: https://github.com/SHI-Labs/NATTEN/

  42. arXiv:2504.16665  [pdf, other]

    cs.CV

    A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification

    Authors: Wenping Ma, Boyou Xue, Mengru Ma, Chuang Chen, Hekai Zhang, Hao Zhu

    Abstract: Multispectral (MS) and panchromatic (PAN) images describe the same land surface, so these images not only have their own advantages, but also share a lot of similar information. To separate this similar information from their respective advantages and reduce the feature redundancy in the fusion stage, this paper introduces a diff-attention aware state space fusion model (DAS2F-Model) for mult…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 12 pages, 9 figures

  43. arXiv:2504.16003  [pdf, other]

    cs.CV

    MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment

    Authors: Yachun Mi, Yu Li, Weicheng Meng, Chaofeng Chen, Chen Hui, Shaohui Liu

    Abstract: The rapid growth of long-duration, high-definition videos has made efficient video quality assessment (VQA) a critical challenge. Existing research typically tackles this problem through two main strategies: reducing model parameters and resampling inputs. However, lightweight Convolutional Neural Networks (CNNs) and Transformers often struggle to balance efficiency with high performance due to the…

    Submitted 22 April, 2025; originally announced April 2025.

  44. arXiv:2504.15928  [pdf, other]

    cs.CV cs.AI

    A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers

    Authors: Meng Wang, Tian Lin, Qingshan Hou, Aidi Lin, Jingcheng Wang, Qingsheng Peng, Truong X. Nguyen, Danqi Fang, Ke Zou, Ting Xu, Cancan Xue, Ten Cheer Quek, Qinkai Yu, Minxin Liu, Hui Zhou, Zixuan Xiao, Guiqin He, Huiyu Liang, Tingkun Shi, Man Chen, Linna Liu, Yuanyuan Peng, Lianyu Wang, Qiuming Hu, Junhong Chen, et al. (15 additional authors not shown)

    Abstract: Artificial intelligence (AI) shows remarkable potential in medical imaging diagnostics, but current models typically require retraining when deployed across different clinical centers, limiting their widespread adoption. We introduce GlobeReady, a clinician-friendly AI platform that enables ocular disease diagnosis without retraining/fine-tuning or technical expertise. GlobeReady achieves high acc…

    Submitted 22 April, 2025; originally announced April 2025.

  45. arXiv:2504.15474  [pdf, other]

    cs.SE

    Agent for User: Testing Multi-User Interactive Features in TikTok

    Authors: Sidong Feng, Changhao Du, Huaxiao Liu, Qingnan Wang, Zhengwei Lv, Gang Huo, Xu Yang, Chunyang Chen

    Abstract: TikTok, a widely-used social media app boasting over a billion monthly active users, requires effective app quality assurance for its intricate features. Feature testing is crucial in achieving this goal. However, the multi-user interactive features within the app, such as live streaming, voice calls, etc., pose significant challenges for developers, who must handle simultaneous device management…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted to ICSE 2025 Industry paper

  46. arXiv:2504.15046  [pdf, other]

    cs.AI

    Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision

    Authors: Shilin Zhang, Zican Hu, Wenhao Wu, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang

    Abstract: RL systems usually tackle generalization by inferring task beliefs from high-quality samples or warmup explorations. The restricted form limits their generality and usability since these supervision signals are expensive and even infeasible to acquire in advance for unseen tasks. Learning directly from the raw text about decision tasks is a promising alternative to leverage a much broader source o…

    Submitted 22 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: 18 pages, 8 figures

  47. arXiv:2504.13618  [pdf, other]

    cs.RO

    On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting

    Authors: Niklas Funk, Changqi Chen, Tim Schneider, Georgia Chalvatzaki, Roberto Calandra, Jan Peters

    Abstract: The field of robotic manipulation has advanced significantly in recent years. At the sensing level, several novel tactile sensors have been developed, capable of providing accurate contact information. On a methodological level, learning from demonstrations has proven an efficient paradigm to obtain performant robotic manipulation policies. The combination of both holds the promise to extract cr…

    Submitted 18 April, 2025; originally announced April 2025.

  48. arXiv:2504.13617  [pdf, other]

    cs.CV

    Compile Scene Graphs with Reinforcement Learning

    Authors: Zuyao Chen, Jinlin Wu, Zhen Lei, Marc Pollefeys, Chang Wen Chen

    Abstract: Next-token prediction is the fundamental principle for training large language models (LLMs), and reinforcement learning (RL) further enhances their reasoning performance. As an effective way to model language, image, video, and other modalities, the use of LLMs for end-to-end extraction of structured visual representations, such as scene graphs, remains underexplored. It requires the model to acc…

    Submitted 11 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  49. arXiv:2504.13432  [pdf, other]

    cs.CV

    Circular Image Deturbulence using Quasi-conformal Geometry

    Authors: Chu Chen, Han Zhang, Lok Ming Lui

    Abstract: The presence of inhomogeneous media between optical sensors and objects leads to distorted imaging outputs, significantly complicating downstream image-processing tasks. A key challenge in image restoration is the lack of high-quality, paired-label images required for training supervised models. In this paper, we introduce the Circular Quasi-Conformal Deturbulence (CQCD) framework, an unsupervised…

    Submitted 20 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  50. arXiv:2504.13037  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond

    Authors: Yundi Zhang, Paul Hager, Che Liu, Suprosanna Shit, Chen Chen, Daniel Rueckert, Jiazhen Pan

    Abstract: Cardiac magnetic resonance imaging is the gold standard for non-invasive cardiac assessment, offering rich spatio-temporal views of the cardiac anatomy and physiology. Patient-level health factors, such as demographics, metabolic, and lifestyle, are known to substantially influence cardiovascular health and disease risk, yet remain uncaptured by CMR alone. To holistically understand cardiac health…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.