
Showing 1–50 of 656 results for author: Chang, Y

Searching in archive cs.
  1. arXiv:2507.03893  [pdf, ps, other]

    cs.CV cs.AI

    Hierarchical Semantic-Visual Fusion of Visible and Near-infrared Images for Long-range Haze Removal

    Authors: Yi Li, Xiaoxiong Wang, Jiawei Wang, Yi Chang, Kai Cao, Luxin Yan

    Abstract: While image dehazing has advanced substantially in the past decade, most efforts have focused on short-range scenarios, leaving long-range haze removal under-explored. As distance increases, intensified scattering leads to severe haze and signal loss, making it impractical to recover distant details solely from visible images. Near-infrared, with superior fog penetration, offers critical complemen…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: This work has been accepted by IEEE Transactions on Multimedia for publication

  2. arXiv:2507.02824  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift

    Authors: Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang

    Abstract: In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. In particular, a reconfigurable intelligent surface (RIS) is employed to enhance MIMO transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects. The tradit…

    Submitted 3 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

    Comments: 5 pages, 4 figures, 2 tables, accepted by 2024 IEEE Globecom Workshops

  3. arXiv:2507.01756  [pdf, ps, other]

    cs.CV

    Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

    Authors: Peng Zheng, Junke Wang, Yi Chang, Yizhou Yu, Rui Ma, Zuxuan Wu

    Abstract: Recent advances in large language models (LLMs) have spurred interest in encoding images as discrete tokens and leveraging autoregressive (AR) frameworks for visual generation. However, the quantization process in AR-based visual generation models inherently introduces information loss that degrades image fidelity. To mitigate this limitation, recent studies have explored to autoregressively pred…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  4. arXiv:2506.18799  [pdf, ps, other]

    quant-ph cs.IT

    Spatial Regionalization: A Hybrid Quantum Computing Approach

    Authors: Yunhan Chang, Amr Magdy, Federico M. Spedalieri, Ibrahim Sabek

    Abstract: Quantum computing has shown significant potential to address complex optimization problems; however, its application remains confined to specific problems at limited scales. Spatial regionalization remains largely unexplored in quantum computing due to its complexity and large number of variables. In this paper, we introduce the first hybrid quantum-classical method for spatial regionalization by d…

    Submitted 23 June, 2025; originally announced June 2025.

  5. arXiv:2506.18522  [pdf, ps, other]

    cs.LG

    DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling

    Authors: Yang Chang, Kuang-Da Wang, Ping-Chun Hsieh, Cheng-Kuan Lin, Wen-Chih Peng

    Abstract: Uncovering the underlying ordinary differential equations (ODEs) that govern dynamic systems is crucial for advancing our understanding of complex phenomena. Traditional symbolic regression methods often struggle to capture the temporal dynamics and intervariable correlations inherent in ODEs. ODEFormer, a state-of-the-art method for inferring multidimensional ODEs from single trajectories, has ma…

    Submitted 23 June, 2025; originally announced June 2025.

  6. arXiv:2506.16050  [pdf, ps, other]

    cs.RO cs.CV

    Noise Fusion-based Distillation Learning for Anomaly Detection in Complex Industrial Environments

    Authors: Jiawen Yu, Jieji Ren, Yang Chang, Qiaojun Yu, Xuan Tong, Boyang Wang, Yan Song, You Li, Xinji Mai, Wenqiang Zhang

    Abstract: Anomaly detection and localization in automated industrial manufacturing can significantly enhance production efficiency and product quality. Existing methods are capable of detecting surface defects in pre-defined or controlled imaging environments. However, accurately detecting workpiece defects in complex and unstructured industrial environments with varying views, poses and illumination remain…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: IROS 2025 Oral

  7. arXiv:2506.15936  [pdf]

    quant-ph cs.ET

    Mixed-Signal Quantum Circuit Design for Option Pricing Using Design Compiler

    Authors: Yu-Ting Kao, Yeong-Jar Chang, Ying-Wei Tseng

    Abstract: Prior studies have largely focused on quantum algorithms, often reducing parallel computing designs to abstract models or overly simplified circuits. This has contributed to the misconception that most applications are feasible only through VLSI circuits and cannot be implemented using quantum circuits. To challenge this view, we present a mixed-signal quantum circuit framework incorporating three…

    Submitted 18 June, 2025; originally announced June 2025.

  8. arXiv:2506.15068  [pdf, ps, other]

    cs.CL cs.LG

    Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

    Authors: Zongxia Li, Yapei Chang, Yuhang Zhou, Xiyang Wu, Zichao Liang, Yoo Yeon Sung, Jordan Lee Boyd-Graber

    Abstract: Evaluating open-ended long-form generation is challenging because it is hard to define what clearly separates good from bad outputs. Existing methods often miss key aspects like coherence, style, or relevance, or are biased by pretraining data, making open-ended long-form evaluation an underexplored problem. To address this gap, we propose PrefBERT, a scoring model for evaluating open-ended long-f…

    Submitted 17 June, 2025; originally announced June 2025.

  9. arXiv:2506.12379  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Training-free LLM Merging for Multi-task Learning

    Authors: Zichuan Fu, Xian Wu, Yejing Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi Chang, Yefeng Zheng, Xiangyu Zhao

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models tailored for various tasks and languages. In this paper, we explore an important question: is it possible to combine these specialized models to create a unifie…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures

    Journal ref: ACL 2025 Main

  10. Bounded Memory in Distributed Networks

    Authors: Ran Ben Basat, Keren Censor-Hillel, Yi-Jun Chang, Wenchen Han, Dean Leitersdorf, Gregory Schwartzman

    Abstract: The recent advent of programmable switches makes distributed algorithms readily deployable in real-world datacenter networks. However, there are still gaps between theory and practice that prevent the smooth adaptation of CONGEST algorithms to these environments. In this paper, we focus on the memory restrictions that arise in real-world deployments. We introduce the μ-CONGEST model where on top…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted at The 37th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '25). 22 pages

  11. arXiv:2506.11477  [pdf, ps, other]

    cs.CV

    FAME: A Lightweight Spatio-Temporal Network for Model Attribution of Face-Swap Deepfakes

    Authors: Wasim Ahmad, Yan-Tsung Peng, Yuan-Hao Chang

    Abstract: The widespread emergence of face-swap Deepfake videos poses growing risks to digital security, privacy, and media integrity, necessitating effective forensic tools for identifying the source of such manipulations. Although most prior research has focused primarily on binary Deepfake detection, the task of model attribution -- determining which generative model produced a given Deepfake -- remains…

    Submitted 13 June, 2025; originally announced June 2025.

  12. arXiv:2506.10025  [pdf, ps, other]

    cs.CR cs.CY cs.HC

    Mind the Gap: Revealing Security Barriers through Situational Awareness of Small and Medium Business Key Decision-Makers

    Authors: Yuanhaur Chang, Oren Heller, Yaniv Shlomo, Iddo Bar-Noy, Ella Bokobza, Michal Grinstein-Weiss, Ning Zhang

    Abstract: Key decision-makers in small and medium businesses (SMBs) often lack the awareness and knowledge to implement cybersecurity measures effectively. To gain a deeper understanding of how SMB executives navigate cybersecurity decision-making, we deployed a mixed-method approach, conducting semi-structured interviews (n=21) and online surveys (n=322) with SMB key decision-makers. Using thematic analysi…

    Submitted 9 June, 2025; originally announced June 2025.

  13. arXiv:2506.09427  [pdf, other]

    cs.CV cs.AI

    A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

    Authors: Yukang Feng, Jianwen Sun, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai, Kaipeng Zhang

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have significantly improved multimodal understanding and generation. However, these models still struggle to generate tightly interleaved image-text outputs, primarily due to the limited scale, quality and instructional richness of current training datasets. To address this, we introduce InterSyn, a large-scale multimodal dataset constructed us…

    Submitted 11 June, 2025; originally announced June 2025.

  14. arXiv:2506.07642  [pdf, ps, other]

    cs.CL

    TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review

    Authors: Yuan Chang, Ziyue Li, Hengyuan Zhang, Yuanbo Kong, Yanru Wu, Zhijiang Guo, Ngai Wong

    Abstract: While Large Language Models (LLMs) have shown significant potential in assisting peer review, current methods often struggle to generate thorough and insightful reviews while maintaining efficiency. In this paper, we propose TreeReview, a novel framework that models paper review as a hierarchical and bidirectional question-answering process. TreeReview first constructs a tree of review questions b…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 30 pages, 17 figures

  15. arXiv:2506.07454  [pdf, ps, other]

    cs.RO cs.AI

    Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs

    Authors: Jared Strader, Aaron Ray, Jacob Arkin, Mason B. Peterson, Yun Chang, Nathan Hughes, Christopher Bradley, Yi Xuan Jia, Carlos Nieto-Granda, Rajat Talak, Chuchu Fan, Luca Carlone, Jonathan P. How, Nicholas Roy

    Abstract: In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, vi…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 12 pages, 4 figures

  16. arXiv:2506.06708  [pdf, ps, other]

    cs.CL

    A Survey of Retentive Network

    Authors: Haiqi Yang, Zhiyuan Li, Yi Chang, Yuan Wu

    Abstract: Retentive Network (RetNet) represents a significant advancement in neural network architecture, offering an efficient alternative to the Transformer. While Transformers rely on self-attention to model dependencies, they suffer from high memory costs and limited scalability when handling long sequences due to their quadratic complexity. To mitigate these limitations, RetNet introduces a retention m…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 15 pages, 3 figures

  17. arXiv:2506.05994  [pdf, ps, other]

    cs.LG cs.AR cs.ET

    RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory

    Authors: Yi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang, Camélia Slimani, Jalil Boukhobza, Tei-Wei Kuo

    Abstract: Although deep learning has demonstrated remarkable capabilities in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Rece…

    Submitted 6 June, 2025; originally announced June 2025.

  18. arXiv:2506.05615  [pdf, ps, other]

    cs.LG cs.AI

    When Maximum Entropy Misleads Policy Optimization

    Authors: Ruipeng Zhang, Ya-Chien Chang, Sicun Gao

    Abstract: The Maximum Entropy Reinforcement Learning (MaxEnt RL) framework is a leading approach for achieving efficient learning and robust performance across many RL tasks. However, MaxEnt methods have also been shown to struggle with performance-critical control problems in practice, where non-MaxEnt algorithms can successfully learn. In this work, we analyze how the trade-off between robustness and opti…

    Submitted 5 June, 2025; originally announced June 2025.

    Journal ref: ICML 2025

  19. arXiv:2506.03373  [pdf, ps, other]

    cs.CV cs.AI

    A Foundation Model for Spatial Proteomics

    Authors: Muhammad Shaban, Yuzhou Chang, Huaying Qiu, Yao Yu Yeo, Andrew H. Song, Guillaume Jaume, Yuchen Wang, Luca L. Weishaupt, Tong Ding, Anurag Vaidya, Abdallah Lamane, Daniel Shao, Mohammed Zidane, Yunhao Bai, Paige McCallum, Shuli Luo, Wenrui Wu, Yang Wang, Precious Cramer, Chi Ngai Chan, Pierre Stephan, Johanna Schaffenrath, Jia Le Lee, Hendrik A. Michel, Caiwei Tian, et al. (35 additional authors not shown)

    Abstract: Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-superv…

    Submitted 3 June, 2025; originally announced June 2025.

  20. arXiv:2506.02139  [pdf, ps, other]

    cs.AI

    The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

    Authors: Edward Y. Chang

    Abstract: Few-shot learning in large language models (LLMs) reveals a core paradox: certain tasks generalize from just a few examples, while others demand extensive supervision. To explain this, we introduce the Unified Cognitive Consciousness Theory (UCCT), which reconceptualizes LLMs not as deficient agents, but as unconscious substrates: dense, distributed repositories of linguistic and conceptual patter…

    Submitted 3 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: 12 pages, 2 figures, 1 table

    ACM Class: I.2.7

  21. arXiv:2506.02050  [pdf, ps, other]

    cs.LG cs.AI

    Decoupled Hierarchical Reinforcement Learning with State Abstraction for Discrete Grids

    Authors: Qingyu Xiao, Yuanlin Chang, Youtian Du

    Abstract: Effective agent exploration remains a core challenge in reinforcement learning (RL) for complex discrete state-space environments, particularly under partial observability. This paper presents a decoupled hierarchical RL framework integrating state abstraction (DcHRL-SA) to address this issue. The proposed method employs a dual-level architecture, consisting of a high-level RL-based actor and a lo…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 6 pages, 6 figures

  22. arXiv:2505.23715  [pdf, ps, other]

    cs.CL

    Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models

    Authors: Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu

    Abstract: Large language models (LLMs) have witnessed rapid advancements, demonstrating remarkable capabilities. However, a notable vulnerability persists: LLMs often uncritically accept flawed or contradictory premises, leading to inefficient reasoning and unreliable outputs. This emphasizes the significance of possessing the Premise Critique Ability for LLMs, defined as the capacity to proactivel…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 31 pages, 13 figures, 15 tables

  23. arXiv:2505.22126  [pdf, ps, other]

    cs.CV cs.AI

    SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model

    Authors: Yifan Chang, Yukang Feng, Jianwen Sun, Jiaxin Ai, Chuanhao Li, S. Kevin Zhou, Kaipeng Zhang

    Abstract: Recent years have seen rapid advances in AI-driven image generation. Early diffusion models emphasized perceptual quality, while newer multimodal models like GPT-4o-image integrate high-level reasoning, improving semantic understanding and structural composition. Scientific illustration generation exemplifies this evolution: unlike general image synthesis, it demands accurate interpretation of tec…

    Submitted 28 May, 2025; originally announced May 2025.

  24. arXiv:2505.22113  [pdf, ps, other]

    cs.CL

    THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models

    Authors: Zhiyuan Li, Yi Chang, Yuan Wu

    Abstract: Large reasoning models (LRMs) have achieved impressive performance in complex tasks, often outperforming conventional large language models (LLMs). However, the prevalent issue of overthinking severely limits their computational efficiency. Overthinking occurs when models generate excessive and redundant tokens that contribute little to accurate outcomes, especially in simple tasks, resulting in a…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 20 pages, 8 figures, 6 tables

  25. arXiv:2505.21880  [pdf]

    cs.MA cs.AI cs.CL cs.CY

    Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation

    Authors: Yu-Lun Song, Chung-En Tsern, Che-Cheng Wu, Yu-Ming Chang, Syuan-Bo Huang, Wei-Chu Chen, Michael Chia-Liang Lin, Yu-Ta Lin

    Abstract: This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages an LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-wor…

    Submitted 3 July, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: 8 pages, 8 figures. This paper has been reviewed and accepted by the CUPUM (Computational Urban Planning and Urban Management) Conference held by University College London (UCL) in 2025

  26. arXiv:2505.20421  [pdf, ps, other]

    cs.GR

    Precise Gradient Discontinuities in Neural Fields for Subspace Physics

    Authors: Mengfei Liu, Yue Chang, Zhecheng Wang, Peter Yichen Chen, Eitan Grinspun

    Abstract: Discontinuities in spatial derivatives appear in a wide range of physical systems, from creased thin sheets to materials with sharp stiffness transitions. Accurately modeling these features is essential for simulation but remains challenging for traditional mesh-based methods, which require discontinuity-aligned remeshing -- entangling geometry with simulation and hindering generalization across s…

    Submitted 26 May, 2025; originally announced May 2025.

  27. arXiv:2505.20350  [pdf, other]

    cs.LG cs.AI

    Decision Flow Policy Optimization

    Authors: Jifeng Hu, Sili Huang, Siyuan Guo, Zhaogeng Liu, Li Shen, Lichao Sun, Hechang Chen, Yi Chang, Dacheng Tao

    Abstract: In recent years, generative models have shown remarkable capabilities across diverse fields, including images, videos, language, and decision-making. By applying powerful generative models such as flow-based models to reinforcement learning, we can effectively model complex multi-modal action distributions and achieve superior robotic control in continuous action spaces, surpassing the limitations…

    Submitted 25 May, 2025; originally announced May 2025.

  28. arXiv:2505.19095  [pdf, ps, other]

    cs.AI

    ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World

    Authors: Runliang Niu, Jinglong Ji, Yi Chang, Qi Wang

    Abstract: The rapid progress of large language models (LLMs) has sparked growing interest in building Artificial General Intelligence (AGI) within Graphical User Interface (GUI) environments. However, existing GUI agents based on LLMs or vision-language models (VLMs) often fail to generalize to novel environments and rely heavily on manually curated, diverse datasets. To overcome these limitations, we intro…

    Submitted 25 May, 2025; originally announced May 2025.

  29. arXiv:2505.18536  [pdf, other]

    cs.CL cs.AI cs.CV

    Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

    Authors: Haoyuan Sun, Jiaqi Wu, Bo Xia, Yifu Luo, Yifei Zhao, Kai Qin, Xufei Lv, Tiantian Zhang, Yongzhe Chang, Xueqian Wang

    Abstract: Standing in 2025, at a critical juncture in the pursuit of Artificial General Intelligence (AGI), reinforcement fine-tuning (RFT) has demonstrated significant potential in enhancing the reasoning capability of large language models (LLMs) and has led to the development of cutting-edge AI models such as OpenAI-o1 and DeepSeek-R1. Moreover, the efficient application of RFT to enhance the reasoning c…

    Submitted 24 May, 2025; originally announced May 2025.

  30. arXiv:2505.17918  [pdf, ps, other]

    cs.LG

    LLM Meeting Decision Trees on Tabular Data

    Authors: Hangting Ye, Jinmeng Li, He Zhao, Dandan Guo, Yi Chang

    Abstract: Tabular data have been playing a vital role in diverse real-world fields, including healthcare, finance, etc. With the recent success of Large Language Models (LLMs), early explorations of extending LLMs to the domain of tabular data have been developed. Most of these LLM-based methods typically first serialize tabular data into natural language descriptions, and then tune LLMs or directly infer o…

    Submitted 23 May, 2025; originally announced May 2025.

  31. arXiv:2505.17482  [pdf, ps, other]

    cs.AI cs.CL

    From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark

    Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger, Yanchuan Chang

    Abstract: Recent reasoning-oriented LLMs have demonstrated strong performance on challenging tasks such as mathematics and science examinations. However, core cognitive faculties of human intelligence, such as abstract reasoning and generalization, remain underexplored. To address this, we evaluate recent reasoning-oriented LLMs on the Abstraction and Reasoning Corpus (ARC) benchmark, which explicitly deman…

    Submitted 23 May, 2025; originally announced May 2025.

  32. arXiv:2505.16470  [pdf, other]

    cs.IR cs.CL cs.CV

    Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering

    Authors: Kuicai Dong, Yujing Chang, Shijie Huang, Yasheng Wang, Ruiming Tang, Yong Liu

    Abstract: Document Visual Question Answering (DocVQA) faces dual challenges in processing lengthy multimodal documents (text, images, tables) and performing cross-modal reasoning. Current document retrieval-augmented generation (DocRAG) methods remain limited by their text-centric approaches, frequently missing critical visual information. The field also lacks robust benchmarks for assessing multimodal evid…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Preprint. Code available at https://mmdocrag.github.io/MMDocRAG/

  33. arXiv:2505.15779  [pdf, ps, other]

    cs.CV cs.AI

    IA-T2I: Internet-Augmented Text-to-Image Generation

    Authors: Chuanhao Li, Jianwen Sun, Yukang Feng, Mingliang Zhai, Yifan Chang, Kaipeng Zhang

    Abstract: Current text-to-image (T2I) generation models achieve promising results, but they fail in scenarios where the knowledge implied in the text prompt is uncertain. For example, a T2I model released in February would struggle to generate a suitable poster for a movie premiering in April, because the character designs and styles are uncertain to the model. To solve this problem, we propose an Inter…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 12 pages, 7 figures, a framework that integrates reference images from the Internet into T2I/TI2I models

  34. arXiv:2505.14254  [pdf, other]

    cs.CV

    Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

    Authors: Yuanyuan Chang, Yinghua Yao, Tao Qin, Mengmeng Wang, Ivor Tsang, Guang Dai

    Abstract: Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual prompt crafting, which can be time-consuming, introduce irrelevant details, and significantly limit editing performance. In this work, we propose optimizing seman…

    Submitted 20 May, 2025; originally announced May 2025.

  35. arXiv:2505.13957  [pdf, ps, other]

    cs.CR cs.CL

    Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation

    Authors: Jiankun Zhang, Shenglai Zeng, Jie Ren, Tianqi Zheng, Hui Liu, Xianfeng Tang, Hui Liu, Yi Chang

    Abstract: Multimodal Retrieval-Augmented Generation (MRAG) systems enhance LMMs by integrating external multimodal databases, but introduce unexplored privacy vulnerabilities. While text-based RAG privacy risks have been studied, multimodal data presents unique challenges. We provide the first systematic analysis of MRAG privacy vulnerabilities across vision-language and speech-language modalities. Using a…

    Submitted 20 May, 2025; originally announced May 2025.

  36. arXiv:2505.12608  [pdf, ps, other]

    cs.DC

    Quantum Modeling of Spatial Contiguity Constraints

    Authors: Yunhan Chang, Amr Magdy, Federico M. Spedalieri

    Abstract: Quantum computing has demonstrated potential for solving complex optimization problems; however, its application to spatial regionalization remains underexplored. Spatial contiguity, a fundamental constraint requiring spatial entities to form connected components, significantly increases the complexity of regionalization problems, which are typically challenging for quantum modeling. This paper pr…

    Submitted 1 June, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

  37. arXiv:2505.12501  [pdf, ps, other]

    cs.AI

    ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning

    Authors: Edward Y. Chang, Longling Geng

    Abstract: Large language models (LLMs) excel at rapid generation of text and multimodal content, yet they falter on transaction-style planning that demands ACID-like guarantees and real-time disruption recovery. We present Adaptive LLM Agent System (ALAS), a framework that tackles four fundamental LLM deficits: (i) absence of self-verification, (ii) context erosion, (iii) next-token myopia, and (iv) lack of…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: 36 pages, 10 figures, 19 tables

    ACM Class: I.2.7

  38. arXiv:2505.11080  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    BLEUBERI: BLEU is a surprisingly effective reward for instruction following

    Authors: Yapei Chang, Yekyung Kim, Michael Krumdick, Amir Zadeh, Chuan Li, Chris Tanner, Mohit Iyyer

    Abstract: Reward models are central to aligning LLMs with human preferences, but they are costly to train, requiring large-scale human-labeled preference data and powerful pretrained LLM backbones. Meanwhile, the increasing availability of high-quality synthetic instruction-following datasets raises the question: can simpler, reference-based metrics serve as viable alternatives to reward models during RL-ba…

    Submitted 7 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: 28 pages, 11 figures, 15 tables; updated table 1 with random reward results, fixed broken references in appendix

  39. arXiv:2505.09653  [pdf, other]

    quant-ph cs.AI cs.ET cs.LG cs.NE

    Differentiable Quantum Architecture Search in Quantum-Enhanced Neural Network Parameter Generation

    Authors: Samuel Yen-Chi Chen, Chen-Yu Liu, Kuan-Cheng Chen, Wei-Jia Huang, Yen-Jui Chang, Wei-Hao Huang

    Abstract: The rapid advancements in quantum computing (QC) and machine learning (ML) have led to the emergence of quantum machine learning (QML), which integrates the strengths of both fields. Among QML approaches, variational quantum circuits (VQCs), also known as quantum neural networks (QNNs), have shown promise both empirically and theoretically. However, their broader adoption is hindered by reliance o…

    Submitted 13 May, 2025; originally announced May 2025.

  40. arXiv:2505.09395  [pdf, other]

    quant-ph cs.AI cs.LG

    Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting

    Authors: Chen-Yu Liu, Kuan-Cheng Chen, Yi-Chien Chen, Samuel Yen-Chi Chen, Wei-Hao Huang, Wei-Jia Huang, Yen-Jui Chang

    Abstract: Typhoon trajectory forecasting is essential for disaster preparedness but remains computationally demanding due to the complexity of atmospheric dynamics and the resource requirements of deep learning models. Quantum-Train (QT), a hybrid quantum-classical framework that leverages quantum neural networks (QNNs) to generate trainable parameters exclusively during training, eliminating the need for q…

    Submitted 14 May, 2025; originally announced May 2025.

  41. arXiv:2505.09115  [pdf, ps, other]

    cs.HC cs.AI

    PreCare: Designing AI Assistants for Advance Care Planning (ACP) to Enhance Personal Value Exploration, Patient Knowledge, and Decisional Confidence

    Authors: Yu Lun Hsu, Yun-Rung Chou, Chiao-Ju Chang, Yu-Cheng Chang, Zer-Wei Lee, Rokas Gipiškis, Rachel Li, Chih-Yuan Shih, Jen-Kuei Peng, Hsien-Liang Huang, Jaw-Shiun Tsai, Mike Y. Chen

    Abstract: Advance Care Planning (ACP) allows individuals to specify their preferred end-of-life life-sustaining treatments before they become incapacitated by injury or terminal illness (e.g., coma, cancer, dementia). While online ACP offers high accessibility, it lacks key benefits of clinical consultations, including personalized value exploration and immediate clarification of decision consequences. To brid…

    Submitted 13 May, 2025; originally announced May 2025.

  42. arXiv:2505.03484  [pdf, other]

    cs.IR

    STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation

    Authors: Maolin Wang, Sheng Zhang, Ruocheng Guo, Wanyu Wang, Xuetao Wei, Zitao Liu, Hongzhi Yin, Yi Chang, Xiangyu Zhao

    Abstract: Recent deep sequential recommendation models often struggle to effectively model key characteristics of user behaviors, particularly in handling sequence length variations and capturing diverse interaction patterns. We propose STAR-Rec, a novel architecture that synergistically combines preference-aware attention and state-space modeling through a sequence-level mixture-of-experts framework. STAR-…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by SIGIR 2025

  43. arXiv:2505.03116  [pdf, ps, other]

    cs.CV

    TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion

    Authors: Haoyue Liu, Jinghan Xu, Yi Chang, Hanyu Zhou, Haozhi Zhao, Lin Wang, Luxin Yan

    Abstract: Video frame interpolation (VFI) that leverages bio-inspired event cameras as guidance has recently shown better performance and memory efficiency than frame-based methods, thanks to the event cameras' advantages, such as high temporal resolution. A hurdle for event-based VFI is how to effectively deal with non-linear motion, caused by the dynamic changes in motion direction and speed withi…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  44. arXiv:2505.02483  [pdf, other]

    cs.RO cs.AI

    Automated Hybrid Reward Scheduling via Large Language Models for Robotic Skill Learning

    Authors: Changxin Huang, Junyang Liang, Yanbin Chang, Jingzhao Xu, Jianqiang Li

    Abstract: Enabling a high-degree-of-freedom robot to learn specific skills is a challenging task due to the complexity of robotic dynamics. Reinforcement learning (RL) has emerged as a promising solution; however, addressing such problems requires the design of multiple reward functions to account for various constraints in robotic motion. Existing approaches typically sum all reward components indiscrimina…

    Submitted 5 May, 2025; originally announced May 2025.

  45. arXiv:2505.01822  [pdf, other]

    cs.LG cs.AI

    Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

    Authors: Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process. To address…

    Submitted 3 May, 2025; originally announced May 2025.

  46. arXiv:2504.21214  [pdf, other]

    cs.CL cs.AI eess.AS

    Pretraining Large Brain Language Model for Active BCI: Silent Speech

    Authors: Jinzhao Zhou, Zehong Cao, Yiqun Duan, Connor Barkley, Daniel Leong, Xiaowei Jiang, Quoc-Toan Nguyen, Ziyi Zhao, Thomas Do, Yu-Cheng Chang, Sheng-Fu Liang, Chin-teng Lin

    Abstract: This paper explores silent speech decoding in active brain-computer interface (BCI) systems, which offer more natural and flexible communication than traditional BCI applications. We collected a new silent speech dataset of over 120 hours of electroencephalogram (EEG) recordings from 12 subjects, capturing 24 commonly used English words for language model pretraining and decoding. Following the re…

    Submitted 3 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  47. arXiv:2504.17097  [pdf, other]

    cs.DC cs.DM cs.DS

    Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation

    Authors: Yen-Hsiang Chang, Aydın Buluç, James Demmel

    Abstract: The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on parallelizing the approximate minimum degree algorithm itself. In this paper, we explore different parallelization strategies, and introduce a novel parallel fr…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 11 pages, 7 figures, 5 tables

  48. arXiv:2504.16357  [pdf, other]

    cs.DC cs.AI cs.LG

    DP2FL: Dual Prompt Personalized Federated Learning in Foundation Models

    Authors: Ying Chang, Xiaohu Shi, Xiaohui Zhao, Zhaohuang Chen, Deyin Ma

    Abstract: Personalized federated learning (PFL) has garnered significant attention for its ability to address heterogeneous client data distributions while preserving data privacy. However, when local client data is limited, deep learning models often suffer from insufficient training, leading to suboptimal performance. Foundation models, such as CLIP (Contrastive Language-Image Pretraining), exhibit strong…

    Submitted 22 April, 2025; originally announced April 2025.

  49. arXiv:2504.13532  [pdf, other]

    quant-ph cs.CV q-fin.PR

    Quantum Walks-Based Adaptive Distribution Generation with Efficient CUDA-Q Acceleration

    Authors: Yen-Jui Chang, Wei-Ting Wang, Chen-Yu Liu, Yun-Yuan Wang, Ching-Ray Chang

    Abstract: We present a novel Adaptive Distribution Generator that leverages a quantum walks-based approach to generate target probability distributions with high precision and efficiency. Our method integrates variational quantum circuits with discrete-time quantum walks, specifically, split-step quantum walks and their entangled extensions, to dynamically tune coin parameters and drive the evolution of quant…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 17 pages, 5 figures

  50. An Addendum to NeBula: Towards Extending TEAM CoSTAR's Solution to Larger Scale Environments

    Authors: Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Sung-Kyun Kim, Muhammad Fadhil Ginting, Xianmei Lei, Jeffrey Edlund, Seyed Fakoorian, Amanda Bouman, Fernando Chavez, Taeyeon Kim, Gustavo J. Correa, Maira Saboia, Angel Santamaria-Navarro, Brett Lopez, Boseong Kim, Chanyoung Jung, Mamoru Sobue, Oriana Claudia Peltzer, Joshua Ott, Robert Trybula, Thomas Touma, Marcel Kaufmann, Tiago Stegun Vaquero, et al. (64 additional authors not shown)

    Abstract: This paper presents an appendix to the original NeBula autonomy solution developed by TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), participating in the DARPA Subterranean Challenge. Specifically, this paper presents extensions to NeBula's hardware, software, and algorithmic components that focus on increasing the range and scale of the exploration environment. From the algorithm…

    Submitted 18 April, 2025; originally announced April 2025.

    Journal ref: IEEE Transactions on Field Robotics, vol. 1, pp. 476-526, 2024