
Showing 1–50 of 1,585 results for author: Chen, K

  1. arXiv:2505.10281  [pdf, other]

    cs.CV

    MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting

    Authors: Mengqiu Xu, Kaixin Chen, Heng Guo, Yixiang Huang, Ming Wu, Zhenwei Shi, Chuang Zhang, Jun Guo

    Abstract: Deep learning approaches for marine fog detection and forecasting have outperformed traditional methods, demonstrating significant scientific and practical importance. However, the limited availability of open-source datasets remains a major challenge. Existing datasets, often focused on a single region or satellite, restrict the ability to evaluate model performance across diverse conditions and…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09653  [pdf, other]

    quant-ph cs.AI cs.ET cs.LG cs.NE

    Differentiable Quantum Architecture Search in Quantum-Enhanced Neural Network Parameter Generation

    Authors: Samuel Yen-Chi Chen, Chen-Yu Liu, Kuan-Cheng Chen, Wei-Jia Huang, Yen-Jui Chang, Wei-Hao Huang

    Abstract: The rapid advancements in quantum computing (QC) and machine learning (ML) have led to the emergence of quantum machine learning (QML), which integrates the strengths of both fields. Among QML approaches, variational quantum circuits (VQCs), also known as quantum neural networks (QNNs), have shown promise both empirically and theoretically. However, their broader adoption is hindered by reliance o…

    Submitted 13 May, 2025; originally announced May 2025.

  3. arXiv:2505.09563  [pdf, ps, other]

    quant-ph cs.IT

    Improved Sample Upper and Lower Bounds for Trace Estimation of Quantum State Powers

    Authors: Kean Chen, Qisheng Wang

    Abstract: As often emerges in various basic quantum properties such as entropy, the trace of quantum state powers $\operatorname{tr}(ρ^q)$ has attracted a lot of attention. The recent work of Liu and Wang (SODA 2025) showed that $\operatorname{tr}(ρ^q)$ can be estimated to within additive error $\varepsilon$ with a dimension-independent sample complexity of $\widetilde O(1/\varepsilon^{3+\frac{2}{q-1}})$ fo…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 23 pages, 1 table

  4. arXiv:2505.09395  [pdf, other]

    quant-ph cs.AI cs.LG

    Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting

    Authors: Chen-Yu Liu, Kuan-Cheng Chen, Yi-Chien Chen, Samuel Yen-Chi Chen, Wei-Hao Huang, Wei-Jia Huang, Yen-Jui Chang

    Abstract: Typhoon trajectory forecasting is essential for disaster preparedness but remains computationally demanding due to the complexity of atmospheric dynamics and the resource requirements of deep learning models. Quantum-Train (QT), a hybrid quantum-classical framework that leverages quantum neural networks (QNNs) to generate trainable parameters exclusively during training, eliminating the need for q…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.09252  [pdf, ps, other]

    cs.CV

    Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

    Authors: Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

    Abstract: Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurring boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH bi…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2505.08849  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Improved Algorithms for Differentially Private Language Model Alignment

    Authors: Keyu Chen, Hao Tang, Qinglin Liu, Yizhao Xu

    Abstract: Language model alignment is crucial for ensuring that large language models (LLMs) align with human preferences, yet it often involves sensitive user data, raising significant privacy concerns. While prior work has integrated differential privacy (DP) with alignment techniques, their performance remains limited. In this paper, we propose novel algorithms for privacy-preserving alignment and rigoro…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2505.08474  [pdf, other]

    quant-ph cs.AI cs.DC

    Distributed Quantum Neural Networks on Distributed Photonic Quantum Computing

    Authors: Kuan-Cheng Chen, Chen-Yu Liu, Yu Shang, Felix Burt, Kin K. Leung

    Abstract: We introduce a distributed quantum-classical framework that synergizes photonic quantum neural networks (QNNs) with matrix-product-state (MPS) mapping to achieve parameter-efficient training of classical neural networks. By leveraging universal linear-optical decompositions of $M$-mode interferometers and photon-counting measurement statistics, our architecture generates neural parameters through…

    Submitted 13 May, 2025; originally announced May 2025.

  8. arXiv:2505.05950  [pdf, ps, other]

    cs.LG

    FloE: On-the-Fly MoE Inference on Memory-constrained GPU

    Authors: Yuxin Zhou, Zheng Li, Jun Zhang, Jue Wang, Yiping Wang, Zhongle Xie, Ke Chen, Lidan Shou

    Abstract: With the widespread adoption of Mixture-of-Experts (MoE) models, there is a growing demand for efficient inference on memory-constrained devices. While offloading expert parameters to CPU memory and loading activated experts on demand has emerged as a potential solution, the large size of activated experts overburdens the limited PCIe bandwidth, hindering the effectiveness in latency-sensitive sce…

    Submitted 11 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  9. arXiv:2505.05335  [pdf, other]

    cs.SD eess.AS

    FLAM: Frame-Wise Language-Audio Modeling

    Authors: Yusong Wu, Christos Tsirigotis, Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, Oriol Nieto, Prem Seetharaman, Justin Salamon

    Abstract: Recent multi-modal audio-language models (ALMs) excel at text-audio retrieval but struggle with frame-wise audio understanding. Prior works use temporal-aware labels or unsupervised training to improve frame-wise capabilities, but they still lack fine-grained labeling capability to pinpoint when an event occurs. While traditional sound event detection models can precisely localize events, they are…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025

  10. arXiv:2505.05114  [pdf, other]

    eess.AS cs.SD

    Listen to Extract: Onset-Prompted Target Speaker Extraction

    Authors: Pengjie Shen, Kangrui Chen, Shulin He, Pengru Chen, Shuqi Yuan, He Kong, Xueliang Zhang, Zhong-Qiu Wang

    Abstract: We propose $\textit{listen to extract}$ (LExt), a highly-effective while extremely-simple algorithm for monaural target speaker extraction (TSE). Given an enrollment utterance of a target speaker, LExt aims at extracting the target speaker from the speaker's mixed speech with other speakers. For each mixture, LExt concatenates an enrollment utterance of the target speaker to the mixture signal at…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: in submission

  11. arXiv:2505.04949  [pdf, ps, other]

    cs.DS

    With a Little Help From My Friends: Exploiting Probability Distribution Advice in Algorithm Design

    Authors: Clément L. Canonne, Kenny Chen, Julián Mestre

    Abstract: We study online algorithms with predictions using distributional advice, a type of prediction that arises when leveraging expert knowledge or historical data. To demonstrate the usefulness and versatility of this framework, we focus on two fundamental problems: first, the prophet inequality problem, for which we provide an algorithm achieving $\max\{\frac{1}{2}-η-o(1),\frac{1}{e}\}$-competitive ra…

    Submitted 8 May, 2025; originally announced May 2025.

  12. arXiv:2505.03593  [pdf, other]

    cs.CY

    A Unifying Bias-aware Multidisciplinary Framework for Investigating Socio-Technical Issues

    Authors: Sacha Hasan, Mehdi Rizvi, Yingfang Yuan, Kefan Chen, Lynne Baillie, Wei Pang

    Abstract: This paper aims to bring together the disciplines of social science (SS) and computer science (CS) in the design and implementation of a novel multidisciplinary framework for systematic, transparent, ethically-informed, and bias-aware investigation of socio-technical issues. For this, various analysis approaches from social science and machine learning (ML) were applied in a structured sequence to…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: First two authors with equal contribution

  13. arXiv:2505.03501  [pdf, other]

    cs.CR cs.CL

    BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models

    Authors: Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu

    Abstract: In this paper, we present a new form of backdoor attack against Large Language Models (LLMs): lingual-backdoor attacks. The key novelty of lingual-backdoor attacks is that the language itself serves as the trigger to hijack the infected LLMs to generate inflammatory speech. They enable the precise targeting of a specific language-speaking group, exacerbating racial discrimination by malicious enti…

    Submitted 6 May, 2025; originally announced May 2025.

  14. arXiv:2505.03469  [pdf, other]

    cs.CL

    Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

    Authors: Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen

    Abstract: Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the "overthinking" problem from teacher models, producing verbose and redundant…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 11 pages, 2 figures

  15. arXiv:2505.03344  [pdf, other]

    cs.RO cs.LG

    RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation

    Authors: Keyu Chen, Wenchao Sun, Hao Cheng, Sifa Zheng

    Abstract: Achieving both realism and controllability in interactive closed-loop traffic simulation remains a key challenge in autonomous driving. Data-driven simulation methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods enhance reliable and c…

    Submitted 6 May, 2025; originally announced May 2025.

  16. arXiv:2505.03007  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results

    Authors: Nikolay Safonov, Alexey Bryncev, Andrey Moskalenko, Dmitry Kulikov, Dmitry Vatolin, Radu Timofte, Haibo Lei, Qifan Gao, Qing Luo, Yaqing Li, Jie Song, Shaozhe Hao, Meisong Zheng, Jingyi Xu, Chengbin Wu, Jiahui Liu, Ying Chen, Xin Deng, Mai Xu, Peipei Liang, Jie Ma, Junjie Jin, Yingxue Pang, Fangzhou Luo, Kai Chen, et al. (6 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Challenge on UGC Video Enhancement. The challenge constructed a set of 150 user-generated content videos without reference ground truth, which suffer from real-world degradations such as noise, blur, faded colors, compression artifacts, etc. The goal of the participants was to develop an algorithm capable of improving the visual quality of such vid…

    Submitted 5 May, 2025; originally announced May 2025.

  17. arXiv:2505.02835  [pdf, ps, other]

    cs.CV cs.CL

    R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

    Authors: Yi-Fan Zhang, Xingyu Lu, Xiao Hu, Chaoyou Fu, Bin Wen, Tianke Zhang, Changyi Liu, Kaiyu Jiang, Kaibing Chen, Kaiyu Tang, Haojie Ding, Jiankang Chen, Fan Yang, Zhang Zhang, Tingting Gao, Liang Wang

    Abstract: Multimodal Reward Models (MRMs) play a crucial role in enhancing the performance of Multimodal Large Language Models (MLLMs). While recent advancements have primarily focused on improving the model structure and training data of MRMs, there has been limited exploration into the effectiveness of long-term reasoning capabilities for reward modeling and how to activate these capabilities in MRMs. In…

    Submitted 9 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: Home page: https://github.com/yfzhang114/r1_reward

  18. arXiv:2505.01976  [pdf, other]

    cs.CR

    A Survey on Privacy Risks and Protection in Large Language Models

    Authors: Kang Chen, Xiuze Zhou, Yuanguo Lin, Shibo Feng, Li Shen, Pengcheng Wu

    Abstract: Although Large Language Models (LLMs) have become increasingly integral to diverse applications, their capabilities raise significant privacy concerns. This survey offers a comprehensive overview of privacy risks associated with LLMs and examines current solutions to mitigate these challenges. First, we analyze privacy leakage and attacks in LLMs, focusing on how these models unintentionally expos…

    Submitted 3 May, 2025; originally announced May 2025.

  19. arXiv:2505.01744  [pdf, ps, other]

    cs.LG

    Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients

    Authors: Yezhen Wang, Zhouhao Yang, Brian K Chen, Fanyi Pu, Bo Li, Tianyu Gao, Kenji Kawaguchi

    Abstract: Building upon the success of low-rank adapter (LoRA), low-rank gradient projection (LoRP) has emerged as a promising solution for memory-efficient fine-tuning. However, existing LoRP methods typically treat each row of the gradient matrix as the default projection unit, leaving the role of projection granularity underexplored. In this work, we propose a novel framework, VLoRP, that extends low-ran…

    Submitted 3 May, 2025; originally announced May 2025.

  20. arXiv:2505.01583  [pdf, ps, other]

    cs.CV cs.AI

    TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

    Authors: Jen-Hao Cheng, Vivian Wang, Huayu Wang, Huapeng Zhou, Yi-Hao Peng, Hou-I Liu, Hsiang-Wei Huang, Kuang-Ming Chen, Cheng-Yen Yang, Wenhao Chai, Yi-Ling Chen, Vibhav Vineet, Qin Cai, Jenq-Neng Hwang

    Abstract: Understanding causal event relationships and achieving fine-grained temporal grounding in videos remain challenging for vision-language models. Existing methods either compress video tokens to reduce temporal resolution, or treat videos as unsegmented streams, which obscures fine-grained event boundaries and limits the modeling of causal dependencies. We propose TEMPURA (Temporal Event Masked Pred…

    Submitted 2 May, 2025; originally announced May 2025.

  21. arXiv:2505.00976  [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.LG

    Attack and defense techniques in large language models: A survey and new perspectives

    Authors: Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu

    Abstract: Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present significant security and ethical challenges. This systematic survey explores the evolving landscape of attack and defense techniques in LLMs. We classify attacks into adversarial prompt attack, optimized attacks, model theft, as well as attacks on application of LLMs, d…

    Submitted 1 May, 2025; originally announced May 2025.

  22. arXiv:2505.00853  [pdf, other]

    cs.CY

    LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models

    Authors: Junfeng Jiao, Saleh Afroogh, Abhejay Murali, Kevin Chen, David Atkinson, Amit Dhurandhar

    Abstract: This study establishes a novel framework for systematically evaluating the moral reasoning capabilities of large language models (LLMs) as they increasingly integrate into critical societal domains. Current assessment methodologies lack the precision needed to evaluate nuanced ethical decision-making in AI systems, creating significant accountability gaps. Our framework addresses this challenge by…

    Submitted 1 May, 2025; originally announced May 2025.

  23. arXiv:2505.00561  [pdf, other]

    quant-ph cs.AI

    Learning to Learn with Quantum Optimization via Quantum Neural Networks

    Authors: Kuan-Cheng Chen, Hiromichi Matsuyama, Wei-Hao Huang

    Abstract: Quantum Approximate Optimization Algorithms (QAOA) promise efficient solutions to classically intractable combinatorial optimization problems by harnessing shallow-depth quantum circuits. Yet, their performance and scalability often hinge on effective parameter optimization, which remains nontrivial due to rugged energy landscapes and hardware noise. In this work, we introduce a quantum meta-learn…

    Submitted 1 May, 2025; originally announced May 2025.

  24. arXiv:2505.00394  [pdf, other]

    cs.CV

    SOTA: Spike-Navigated Optimal TrAnsport Saliency Region Detection in Composite-bias Videos

    Authors: Wenxuan Liu, Yao Deng, Kang Chen, Xian Zhong, Zhaofei Yu, Tiejun Huang

    Abstract: Existing saliency detection methods struggle in real-world scenarios due to motion blur and occlusions. In contrast, spike cameras, with their high temporal resolution, significantly enhance visual saliency maps. However, the composite noise inherent to spike camera imaging introduces discontinuities in saliency detection. Low-quality samples further distort model predictions, leading to saliency…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted to IJCAI 2025

  25. arXiv:2505.00364  [pdf, other]

    cs.LG

    From GNNs to Trees: Multi-Granular Interpretability for Graph Neural Networks

    Authors: Jie Yang, Yuwen Wang, Kaixuan Chen, Tongya Zheng, Yihe Zhou, Zhenbang Xiao, Ji Cao, Mingli Song, Shunyu Liu

    Abstract: Interpretable Graph Neural Networks (GNNs) aim to reveal the underlying reasoning behind model predictions, attributing their decisions to specific subgraphs that are informative. However, existing subgraph-based interpretable methods suffer from an overemphasis on local structure, potentially overlooking long-range dependencies within the entire graphs. Although recent efforts that rely on graph…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted by ICLR 2025

  26. arXiv:2504.20461  [pdf, other]

    cs.DC

    Efficient Graph-Based Approximate Nearest Neighbor Search Achieving: Low Latency Without Throughput Loss

    Authors: Jingjia Luo, Mingxing Zhang, Kang Chen, Xia Liao, Yingdi Shan, Jinlei Jiang, Yongwei Wu

    Abstract: The increase in the dimensionality of neural embedding models has enhanced the accuracy of semantic search capabilities but also amplified the computational demands for Approximate Nearest Neighbor Searches (ANNS). This complexity poses significant challenges in online and interactive services, where query latency is a critical performance metric. Traditional graph-based ANNS methods, while effect…

    Submitted 30 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  27. arXiv:2504.20026  [pdf, other]

    cs.CV cs.AI

    LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields

    Authors: Zhengqin Li, Dilin Wang, Ka Chen, Zhaoyang Lv, Thu Nguyen-Phuoc, Milim Lee, Jia-Bin Huang, Lei Xiao, Cheng Zhang, Yufeng Zhu, Carl S. Marshall, Yufeng Ren, Richard Newcombe, Zhao Dong

    Abstract: We present Large Inverse Rendering Model (LIRM), a transformer architecture that jointly reconstructs high-quality shape, materials, and radiance fields with view-dependent effects in less than a second. Our model builds upon the recent Large Reconstruction Models (LRMs) that achieve state-of-the-art sparse-view reconstruction quality. However, existing LRMs struggle to reconstruct unseen parts ac…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  28. Provably Secure Public-Key Steganography Based on Admissible Encoding

    Authors: Xin Zhang, Kejiang Chen, Na Zhao, Weiming Zhang, Nenghai Yu

    Abstract: The technique of hiding secret messages within seemingly harmless covertext to evade examination by censors with rigorous security proofs is known as provably secure steganography (PSS). PSS evolves from symmetric key steganography to public-key steganography, functioning without the requirement of a pre-shared key and enabling the extension to multi-party covert communication and identity verific…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 16 pages, 3 figures

    Journal ref: IEEE Transactions on Information Forensics and Security, vol. 20, pp. 3161-3175, 2025

  29. arXiv:2504.19161  [pdf, other]

    cs.CV

    RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling

    Authors: Zheng Fang, Kangjun Liu, Ke Chen, Qingyu Liu, Jianguo Zhang, Lingyang Song, Yaowei Wang

    Abstract: The task of radio map estimation aims to generate a dense representation of electromagnetic spectrum quantities, such as the received signal strength at each grid point within a geographic region, based on measurements from a subset of spatially distributed nodes (represented as pixels). Recently, deep vision models such as the U-Net have been adapted to radio map estimation, whose effectiveness c…

    Submitted 27 April, 2025; originally announced April 2025.

  30. arXiv:2504.17670  [pdf, other]

    cs.CV

    DiMeR: Disentangled Mesh Reconstruction Model

    Authors: Lutao Jiang, Jiantao Lin, Kanghao Chen, Wenhang Ge, Xin Yang, Yifan Jiang, Yuanhuiyi Lyu, Xu Zheng, Yingcong Chen

    Abstract: With the advent of large-scale 3D datasets, feed-forward 3D generative models, such as the Large Reconstruction Model (LRM), have gained significant attention and achieved remarkable success. However, we observe that RGB images often lead to conflicting training objectives and lack the necessary clarity for geometry reconstruction. In this paper, we revisit the inductive biases associated with mes…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Project Page: https://lutao2021.github.io/DiMeR_page/

  31. arXiv:2504.17449  [pdf, other]

    cs.LG cs.AI cs.CL

    HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models

    Authors: Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong

    Abstract: The significant computational demands of pretrained language models (PLMs), which often require dedicated hardware, present a substantial challenge in serving them efficiently, especially in multi-tenant environments. To address this, we introduce HMI, a Hierarchical knowledge management-based Multi-tenant Inference system, designed to manage tenants with distinct PLMs resource-efficiently. Our ap…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted by VLDBJ 2025

  32. arXiv:2504.17448  [pdf, other]

    cs.LG cs.DB cs.DC

    CHASe: Client Heterogeneity-Aware Data Selection for Effective Federated Active Learning

    Authors: Jun Zhang, Jue Wang, Huan Li, Zhongle Xie, Ke Chen, Lidan Shou

    Abstract: Active learning (AL) reduces human annotation costs for machine learning systems by strategically selecting the most informative unlabeled data for annotation, but performing it individually may still be insufficient due to restricted data diversity and annotation budget. Federated Active Learning (FAL) addresses this by facilitating collaborative data selection and model training, while preservin…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted by TKDE 2025

  33. arXiv:2504.17276  [pdf, other]

    cs.LG

    HeRB: Heterophily-Resolved Structure Balancer for Graph Neural Networks

    Authors: Ke-Jia Chen, Wenhui Mu, Zheng Liu

    Abstract: Recent research has witnessed the remarkable progress of Graph Neural Networks (GNNs) in the realm of graph data representation. However, GNNs still encounter the challenge of structural imbalance. Prior solutions to this problem did not take graph heterophily into account, namely that connected nodes possess distinct labels or features, thus resulting in a deficiency in effectiveness. Upon verify…

    Submitted 24 April, 2025; originally announced April 2025.

  34. arXiv:2504.15849  [pdf, other]

    cs.IR

    NLCTables: A Dataset for Marrying Natural Language Conditions with Table Discovery

    Authors: Lingxi Cui, Huan Li, Ke Chen, Lidan Shou, Gang Chen

    Abstract: With the growing abundance of repositories containing tabular data, discovering relevant tables for in-depth analysis remains a challenging task. Existing table discovery methods primarily retrieve desired tables based on a query table or several vague keywords, leaving users to manually filter large result sets. To address this limitation, we propose a new task: NL-conditional table discovery (nl…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: accepted by SIGIR'25

    MSC Class: 68P20

  35. arXiv:2504.15616  [pdf, other]

    cs.LG cs.CV

    SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction

    Authors: Kai Chen, Xiaodong Zhao, Yujie Huang, Guoyu Fang, Xiao Song, Ruiping Wang, Ziyuan Wang

    Abstract: The analysis and prediction of agent trajectories are crucial for decision-making processes in intelligent systems, with precise short-term trajectory forecasting being highly significant across a range of applications. Agents and their social interactions have been quantified and modeled by researchers from various perspectives; however, substantial limitations exist in the current work due to th…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 11 pages, 6 figures

  36. arXiv:2504.15139  [pdf, other]

    cs.CR

    GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security

    Authors: Xiangkun Wang, Kejiang Chen, Yuang Qi, Ruiheng Liu, Weiming Zhang, Nenghai Yu

    Abstract: Minimum distortion steganography is currently the mainstream method for modification-based steganography. A key issue in this method is how to define steganographic distortion. With the rapid development of deep learning technology, the definition of distortion has evolved from manual design to deep learning design. Concurrently, rapid advancements in image generation have made generated images vi…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE TIFS

  37. arXiv:2504.15026  [pdf, other]

    cs.CV cs.CR

    Gaussian Shading++: Rethinking the Realistic Deployment Challenge of Performance-Lossless Image Watermark for Diffusion Models

    Authors: Zijin Yang, Xin Zhang, Kejiang Chen, Kai Zeng, Qiyi Yao, Han Fang, Weiming Zhang, Nenghai Yu

    Abstract: Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. Existing methods primarily focus on ensuring that watermark embedding does not degrade the model performance. However, they often overlook critical challenges in real-world dep…

    Submitted 13 May, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: 18 pages, 8 figures

  38. arXiv:2504.14061  [pdf, other]

    cs.CR

    Benchmarking Differentially Private Tabular Data Synthesis

    Authors: Kai Chen, Xiaochen Li, Chen Gong, Ryan McKenna, Tianhao Wang

    Abstract: Differentially private (DP) tabular data synthesis generates artificial data that preserves the statistical properties of private data while safeguarding individual privacy. The emergence of diverse algorithms in recent years has introduced challenges in practical applications, such as inconsistent data processing methods, lack of in-depth algorithm analysis, and incomplete comparisons due to over…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: GitHub repository link: https://github.com/KaiChen9909/tab_bench. 12 pages excluding the references and appendix

  39. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  40. arXiv:2504.13887  [pdf]

    cs.HC cs.CL cs.CY

    AI as a deliberative partner fosters intercultural empathy for Americans but fails for Latin American participants

    Authors: Isabel Villanueva, Tara Bobinac, Binwei Yao, Junjie Hu, Kaiping Chen

    Abstract: Despite the growing integration of AI chatbots as conversational agents in public discourse, empirical evidence regarding their capacity to foster intercultural empathy remains limited. Using a randomized dialogue experiment, we examined how different types of AI chatbot interaction, i.e., deliberative versus non-deliberative and culturally aligned versus non-aligned, affect intercultural empathy…

    Submitted 4 April, 2025; originally announced April 2025.

  41. arXiv:2504.13835  [pdf, other]

    cs.CL cs.AI

    MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space

    Authors: Yicheng Chen, Yining Li, Kai Hu, Zerun Ma, Haochen Ye, Kai Chen

    Abstract: Data quality and diversity are key to the construction of effective instruction-tuning datasets. With the increasing availability of open-source instruction-tuning datasets, it is advantageous to automatically select high-quality and diverse subsets from a vast amount of data. Existing methods typically prioritize instance quality and use heuristic rules to maintain diversity. However, this…

    Submitted 18 April, 2025; originally announced April 2025.

  42. arXiv:2504.13825  [pdf, other]

    cs.CL cs.LG

    Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

    Authors: Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

    Abstract: Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various applications including image classification, object detection, language modeling, text classification, and sentiment analysis. Recent innovations in KD methods, suc…

    Submitted 18 April, 2025; originally announced April 2025.

  43. arXiv:2504.13782  [pdf, other]

    quant-ph cs.DC

    Robust Decentralized Quantum Kernel Learning for Noisy and Adversarial Environment

    Authors: Wenxuan Ma, Kuan-Cheng Chen, Shang Yu, Mengxiang Liu, Ruilong Deng

    Abstract: This paper proposes a general decentralized framework for quantum kernel learning (QKL). It has robustness against quantum noise and can also be designed to defend adversarial information attacks forming a robust approach named RDQKL. We analyze the impact of noise on QKL and study the robustness of decentralized QKL to the noise. By integrating robust decentralized optimization techniques, our me…

    Submitted 18 April, 2025; originally announced April 2025.

  44. arXiv:2504.13489  [pdf, ps, other]

    cs.DS

    New Results on a General Class of Minimum Norm Optimization Problems

    Authors: Kuowen Chen, Jian Li, Yuval Rabani, Yiran Zhang

    Abstract: We study the general norm optimization for combinatorial problems, initiated by Chakrabarty and Swamy (STOC 2019). We propose a general formulation that captures a large class of combinatorial structures: we are given a set $U$ of $n$ weighted elements and a family of feasible subsets $F$. Each subset $S\in F$ is called a feasible solution/set of the problem. We denote the value vector by…

    Submitted 29 April, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: The abstract is shortened due to the length limit of arXiv. This paper has been accepted by ICALP 2025

  45. An Addendum to NeBula: Towards Extending TEAM CoSTAR's Solution to Larger Scale Environments

    Authors: Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Sung-Kyun Kim, Muhammad Fadhil Ginting, Xianmei Lei, Jeffrey Edlund, Seyed Fakoorian, Amanda Bouman, Fernando Chavez, Taeyeon Kim, Gustavo J. Correa, Maira Saboia, Angel Santamaria-Navarro, Brett Lopez, Boseong Kim, Chanyoung Jung, Mamoru Sobue, Oriana Claudia Peltzer, Joshua Ott, Robert Trybula, Thomas Touma, Marcel Kaufmann, Tiago Stegun Vaquero, et al. (64 additional authors not shown)

    Abstract: This paper presents an appendix to the original NeBula autonomy solution developed by the TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), participating in the DARPA Subterranean Challenge. Specifically, this paper presents extensions to NeBula's hardware, software, and algorithmic components that focus on increasing the range and scale of the exploration environment. From the algorithm…

    Submitted 18 April, 2025; originally announced April 2025.

    Journal ref: IEEE Transactions on Field Robotics, vol. 1, pp. 476-526, 2024

  46. arXiv:2504.13190  [pdf, other]

    cs.NI eess.SP

    Cellular-X: An LLM-empowered Cellular Agent for Efficient Base Station Operations

    Authors: Liujianfu Wang, Xinyi Long, Yuyang Du, Xiaoyan Liu, Kexin Chen, Soung Chang Liew

    Abstract: This paper introduces Cellular-X, an LLM-powered agent designed to automate cellular base station (BS) maintenance. Leveraging multimodal LLM and retrieval-augmented generation (RAG) techniques, Cellular-X significantly enhances field engineer efficiency by quickly interpreting user intents, retrieving relevant technical information, and configuring a BS through iterative self-correction. Key feat…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: MobiSys ’25, June 23-27, 2025, Anaheim, CA, USA

  47. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  48. arXiv:2504.11286  [pdf, other]

    eess.IV cs.CV

    Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain

    Authors: Pengcheng Zheng, Kecheng Chen, Jiaxin Huang, Bohao Chen, Ju Liu, Yazhou Ren, Xiaorong Pu

    Abstract: Medical image restoration tasks aim to recover high-quality images from degraded observations, exhibiting emergent desires in many clinical scenarios, such as low-dose CT image denoising, MRI super-resolution, and MRI artifact removal. Despite the success achieved by existing deep learning-based restoration methods with sophisticated modules, they struggle with rendering computationally-efficient…

    Submitted 15 April, 2025; originally announced April 2025.

  49. arXiv:2504.10479  [pdf, other]

    cs.CV

    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

    Authors: Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, et al. (26 additional authors not shown)

    Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single p…

    Submitted 18 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report

  50. arXiv:2504.09474  [pdf, other]

    cs.SE cs.AI cs.OS

    MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions

    Authors: Pucheng Dang, Di Huang, Dong Li, Kang Chen, Yuanbo Wen, Qi Guo, Xing Hu, Ninghui Sun

    Abstract: Out-of-tree kernel patches are essential for adapting the Linux kernel to new hardware or enabling specific functionalities. Maintaining and updating these patches across different kernel versions demands significant effort from experienced engineers. Large language models (LLMs) have shown remarkable progress across various domains, suggesting their potential for automating out-of-tree kernel pat…

    Submitted 13 April, 2025; originally announced April 2025.