
Showing 51–100 of 681 results for author: Gong, Z

  1. arXiv:2504.17237  [pdf, ps, other]

    quant-ph cs.IT

    Quantum-Enhanced Change Detection and Joint Communication-Detection

    Authors: Zihao Gong, Saikat Guha

    Abstract: Quick detection of transmittance changes in an optical channel is crucial for secure communication. We demonstrate that pre-shared entanglement using two-mode squeezed vacuum states significantly reduces detection latency compared to classical and entanglement-augmented coherent-state probes. The change detection latency is inversely proportional to the quantum relative entropy (QRE), which goes to i…

    Submitted 15 June, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: 9 pages, 5 figures. to be submitted to Physical Review A. Conference version accepted by ISIT 2025

  2. arXiv:2504.14971  [pdf]

    physics.flu-dyn

    A novel hybrid neural network of fluid-structure interaction prediction for two cylinders in tandem arrangement

    Authors: Yanfang Lyu, Yunyang Zhang, Zhiqiang Gong, Xiao Kang, Wen Yao, Yongmao Pei

    Abstract: Deep learning has shown promise in improving computing efficiency while ensuring modeling accuracy in fluid-structure interaction (FSI) analysis. However, its current capabilities are limited when it comes to constructing multi-object coupling systems with dynamic boundaries. To address this limitation, a novel FSI neural solver integrated by a fluid deep learning model with multi-time steps and a…

    Submitted 24 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  3. arXiv:2504.11358  [pdf, other]

    cs.CR cs.AI

    DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks

    Authors: Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong

    Abstract: LLM-integrated applications and agents are vulnerable to prompt injection attacks, where an attacker injects prompts into their inputs to induce attacker-desired outputs. A detection method aims to determine whether a given input is contaminated by an injected prompt. However, existing detection methods have limited effectiveness against state-of-the-art attacks, let alone adaptive ones. In this w…

    Submitted 15 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Distinguished Paper Award in IEEE Symposium on Security and Privacy, 2025

  4. arXiv:2504.10281  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall cs.AI cs.CV cs.LG

    Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials

    Authors: Jingyun Yang, Ruoyan Avery Yin, Chi Jiang, Yuepeng Hu, Xiaokai Zhu, Xingjian Hu, Sutharsika Kumar, Xiao Wang, Xiaohua Zhai, Keran Rong, Yunyue Zhu, Tianyi Zhang, Zongyou Yin, Jing Kong, Neil Zhenqiang Gong, Zhichu Ren, Haozhe Wang

    Abstract: Characterization of atomic-scale materials traditionally requires human experts with months to years of specialized training. Even for trained human operators, accurate and reliable characterization remains challenging when examining newly discovered materials such as two-dimensional (2D) structures. This bottleneck drives demand for fully autonomous experimentation systems capable of comprehendin…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 13 pages, 4 figures

  5. arXiv:2504.06220  [pdf, other]

    cs.CV

    Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation

    Authors: Xiaoxing Hu, Ziyang Gong, Yupei Wang, Yuru Jia, Gen Luo, Xue Yang

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) is a technique that allows us to adapt powerful Foundation Models (FMs) to diverse downstream tasks while preserving and unleashing their inherent capabilities. However, we have observed that existing PEFT methods, which are often designed with natural imagery in mind, struggle when applied to Remote Sensing (RS) scenarios. This is primarily due to their inab…

    Submitted 16 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  6. arXiv:2504.05735  [pdf]

    physics.optics

    Anomalous Maxwell-Garnett theory for photonic time crystals

    Authors: Zheng Gong, Ruoxi Chen, Hongsheng Chen, Xiao Lin

    Abstract: Maxwell-Garnett theory, dating back to James Clerk Maxwell-Garnett's foundational work in 1904, provides a simple yet powerful framework to describe the inhomogeneous structure as an effective homogeneous medium, which significantly reduces the overall complexity of analysis, calculation, and design. As such, the Maxwell-Garnett theory enables many practical applications in diverse realms, ranging…

    Submitted 8 April, 2025; originally announced April 2025.

  7. arXiv:2504.05138  [pdf, other]

    cs.LG cs.DC

    Towards Optimal Heterogeneous Client Sampling in Multi-Model Federated Learning

    Authors: Haoran Zhang, Zejun Gong, Zekai Li, Marie Siew, Carlee Joe-Wong, Rachid El-Azouzi

    Abstract: Federated learning (FL) allows edge devices to collaboratively train models without sharing local data. As FL gains popularity, clients may need to train multiple unrelated FL models, but communication constraints limit their ability to train all models simultaneously. While clients could train FL models sequentially, opportunistically having FL clients concurrently train different models -- terme…

    Submitted 21 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: 29 pages with full proofs

    ACM Class: I.2.11

  8. arXiv:2504.04041  [pdf, ps, other]

    quant-ph cs.CR

    Authenticated Sublinear Quantum Private Information Retrieval

    Authors: Fengxia Liu, Zhiyong Zheng, Kun Tian, Yi Zhang, Heng Guo, Zhe Hu, Oleksiy Zhedanov, Zixian Gong

    Abstract: This paper introduces a novel lower bound on communication complexity using quantum relative entropy and mutual information, refining previous classical entropy-based results. By leveraging Uhlmann's lemma and quantum Pinsker inequalities, the authors establish tighter bounds for information-theoretic security, demonstrating that quantum protocols inherently outperform classical counterparts in ba…

    Submitted 26 May, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: 11 pages, 1 figure

  9. arXiv:2504.03116  [pdf, other]

    physics.plasm-ph astro-ph.HE

    Electron Penetration Acceleration in Turbulent Magnetic Loops

    Authors: Zheng Gong, Sida Cao, Caleb Redshaw, Matthew R. Edwards

    Abstract: Using particle-in-cell simulations to study fast radio burst (FRB) propagation in a tenuous plasma, we identified a novel mechanism that occurs during the growth of turbulent magnetic loops: electron penetration acceleration. The loops have an electromagnetic left-hand chirality distinct from that of well-known quasistatic magnetic islands. The fast electrons penetrate through the loops and thus a…

    Submitted 30 May, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: 9 pages, 6 figures

  10. arXiv:2504.02304  [pdf, other]

    cs.CL

    Measurement of LLM's Philosophies of Human Nature

    Authors: Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Wangmeng Zuo

    Abstract: The widespread application of artificial intelligence (AI) in various tasks, along with frequent reports of conflicts or violations involving AI, has sparked societal concerns about interactions with AI systems. Based on Wrightsman's Philosophies of Human Nature Scale (PHNS), a scale empirically validated over decades to effectively assess individuals' attitudes toward human nature, we design the…

    Submitted 3 April, 2025; originally announced April 2025.

  11. arXiv:2503.22557  [pdf, other]

    cs.CV

    MO-CTranS: A unified multi-organ segmentation model learning from multiple heterogeneously labelled datasets

    Authors: Zhendi Gong, Susan Francis, Eleanor Cox, Stamatios N. Sotiropoulos, Dorothee P. Auer, Guoping Qiu, Andrew P. French, Xin Chen

    Abstract: Multi-organ segmentation holds paramount significance in many clinical tasks. In practice, compared to large fully annotated datasets, multiple small datasets are often more accessible and organs are not labelled consistently. Normally, an individual model is trained for each of these datasets, which is not an effective way of using data for model learning. It remains challenging to train a single…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted by International Symposium on Biomedical Imaging (ISBI) 2025 as an oral presentation

    ACM Class: I.2; I.4.6

  12. arXiv:2503.22413  [pdf, other]

    cs.CR cs.LG

    Instance-Level Data-Use Auditing of Visual ML Models

    Authors: Zonghao Huang, Neil Zhenqiang Gong, Michael K. Reiter

    Abstract: The growing trend of legal disputes over the unauthorized use of data in machine learning (ML) systems highlights the urgent need for reliable data-use auditing mechanisms to ensure accountability and transparency in ML. In this paper, we present the first proactive instance-level data-use auditing method designed to enable data owners to audit the use of their individual data instances in ML mode…

    Submitted 28 March, 2025; originally announced March 2025.

  13. arXiv:2503.21778  [pdf, other]

    cs.CV

    HS-SLAM: Hybrid Representation with Structural Supervision for Improved Dense SLAM

    Authors: Ziren Gong, Fabio Tosi, Youmin Zhang, Stefano Mattoccia, Matteo Poggi

    Abstract: NeRF-based SLAM has recently achieved promising results in tracking and reconstruction. However, existing methods face challenges in providing sufficient scene representation, capturing structural information, and maintaining global consistency in scenes emerging significant movement or being forgotten. To this end, we present HS-SLAM to tackle these problems. To enhance scene representation capac…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: ICRA 2025. Project Page: https://zorangong.github.io/HS-SLAM/

  14. arXiv:2503.17793  [pdf, other]

    cs.LG cs.AI cs.CL

    Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

    Authors: Codefuse, Ling Team, Wenting Cai, Yuchen Cao, Chaoyu Chen, Chen Chen, Siba Chen, Qing Cui, Peng Di, Junpeng Fang, Zi Gong, Ting Guo, Zhengyu He, Yang Huang, Cong Li, Jianguo Li, Zheng Li, Shijie Lian, BingChang Liu, Songshan Luo, Shuo Mao, Min Shen, Jian Wu, Jiaolong Yang, et al. (8 additional authors not shown)

    Abstract: Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency. Many attempts have been released in the open source community to break the trade-off between performance and efficiency, such as the Qwen Coder series and the Deep…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: 20 pages, 6 figures

    ACM Class: I.2.7

  15. arXiv:2503.16023  [pdf, other]

    cs.CR

    BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

    Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: Multi-modal large language models (MLLMs) extend large language models (LLMs) to process multi-modal information, enabling them to generate responses to image-text inputs. MLLMs have been incorporated into diverse multi-modal applications, such as autonomous driving and medical diagnosis, via plug-and-play without fine-tuning. This deployment paradigm increases the vulnerability of MLLMs to backdo…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: This paper is accepted by CVPR 2025

  16. Waveguide QED with dissipative light-matter couplings

    Authors: Xing-Liang Dong, Peng-Bo Li, Zongping Gong, Franco Nori

    Abstract: Dissipative light-matter coupling plays a vital role in non-Hermitian physics, but it remains largely unexplored in waveguide QED systems. In this work, we find that by employing pseudo-Hermitian symmetry rather than anti-PT symmetry, the concept of dissipative coupling could be generalized and applied to the field of waveguide QED. This leads to a series of intriguing results, such as spontaneous…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 7 pages, 4 figures

    Journal ref: Phys. Rev. Research 7, L012036 (2025)

  17. arXiv:2503.11074  [pdf, other]

    cs.AI cs.CL

    Exploring the Necessity of Reasoning in LLM-based Agent Scenarios

    Authors: Xueyang Zhou, Guiyao Tie, Guowen Zhang, Weidong Wang, Zhigang Zuo, Di Wu, Duanfeng Chu, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning. Yet, this progress disrupts traditional agent frameworks, traditionally anchored by execution-oriented Large Language Models (LLMs). To explore this transformation, we propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving, assessed with th…

    Submitted 27 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: 71 pages, 11 figures, 8 tables

  18. arXiv:2503.10484  [pdf, other]

    cs.RO

    Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality

    Authors: Wei Xiao, Shangke Lyu, Zhefei Gong, Renjie Wang, Donglin Wang

    Abstract: Existing quadrupedal locomotion learning paradigms usually rely on extensive domain randomization to alleviate the sim2real gap and enhance robustness. It trains policies with a wide range of environment parameters and sensor noises to perform reliably under uncertainty. However, since optimal performance under ideal conditions often conflicts with the need to handle worst-case scenarios, there is…

    Submitted 13 March, 2025; originally announced March 2025.

  19. arXiv:2503.08976  [pdf, other]

    cs.LG cs.CR cs.DC

    Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning

    Authors: Zirui Gong, Yanjun Zhang, Leo Yu Zhang, Zhaoxi Zhang, Yong Xiang, Shirui Pan

    Abstract: Federated Ranking Learning (FRL) is a state-of-the-art FL framework that stands out for its communication efficiency and resilience to poisoning attacks. It diverges from the traditional FL framework in two ways: 1) it leverages discrete rankings instead of gradient updates, significantly reducing communication costs and limiting the potential space for malicious updates, and 2) it uses majority v…

    Submitted 22 April, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 18 pages. To appear in the IEEE Symposium on Security and Privacy 2025

  20. arXiv:2503.07890  [pdf, other]

    cs.CV

    Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?

    Authors: Yuru Jia, Valerio Marsocci, Ziyang Gong, Xue Yang, Maarten Vergauwen, Andrea Nascetti

    Abstract: Self-supervised learning (SSL) has revolutionized representation learning in Remote Sensing (RS), advancing Geospatial Foundation Models (GFMs) to leverage vast unlabeled satellite imagery for diverse downstream tasks. Currently, GFMs primarily focus on discriminative objectives, such as contrastive learning or masked image modeling, owing to their proven success in learning transferable represent…

    Submitted 10 March, 2025; originally announced March 2025.

  21. arXiv:2503.06254  [pdf, other]

    cs.CR cs.LG

    Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation

    Authors: Yinuo Liu, Zenghui Yuan, Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

    Abstract: Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) by dynamically accessing information from external knowledge bases. In this work, we introduce Poisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems. Poisoned-MRAG injects a few carefully crafted image-text pairs into the multimodal knowledge da…

    Submitted 14 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  22. arXiv:2503.06072  [pdf, other]

    cs.CL cs.AI

    Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning

    Authors: Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific per…

    Submitted 20 May, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: 87 pages, 21 figures, 9 tables

  23. arXiv:2503.05119  [pdf, other]

    cs.LG

    AI-driven Prediction of Insulin Resistance in Normal Populations: Comparing Models and Criteria

    Authors: Weihao Gao, Zhuo Deng, Zheng Gong, Ziyi Jiang, Lan Ma

    Abstract: Insulin resistance (IR) is a key precursor to diabetes and a significant risk factor for cardiovascular disease. Traditional IR assessment methods require multiple blood tests. We developed a simple AI model using only fasting blood glucose to predict IR in non-diabetic populations. Data from the NHANES (1999-2020) and CHARLS (2015) studies were used for model training and validation. Input featur…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 20 pages, 8 figures

    MSC Class: 68T10 ACM Class: J.3

  24. arXiv:2503.04064  [pdf, other]

    cs.CL cs.AI cs.CY

    Uncovering inequalities in new knowledge learning by large language models across different languages

    Authors: Chenglong Wang, Haoyu Tang, Xiyuan Yang, Yueqi Xie, Jina Suh, Sunayana Sitaram, Junming Huang, Yu Xie, Zhaoya Gong, Xing Xie, Fangzhao Wu

    Abstract: As large language models (LLMs) gradually become integral tools for problem solving in daily life worldwide, understanding linguistic inequality is becoming increasingly important. Existing research has primarily focused on static analyses that assess the disparities in the existing knowledge and capabilities of LLMs across languages. However, LLMs are continuously evolving, acquiring new knowledg…

    Submitted 5 March, 2025; originally announced March 2025.

  25. arXiv:2503.03964  [pdf, other]

    astro-ph.CO

    Cosmology with second and third-order shear statistics for the Dark Energy Survey: Methods and simulated analysis

    Authors: R. C. H. Gomes, S. Sugiyama, B. Jain, M. Jarvis, D. Anbajagane, M. Gatti, D. Gebauer, Z. Gong, A. Halder, G. A. Marques, S. Pandey, J. L. Marshall, S. Allam, O. Alves, F. Andrade-Oliveira, D. Bacon, J. Blazek, S. Bocquet, D. Brooks, A. Carnero Rosell, J. Carretero, L. N. da Costa, P. Doel, C. Doux, S. Everett, et al. (34 additional authors not shown)

    Abstract: We present a new pipeline designed for the robust inference of cosmological parameters using both second- and third-order shear statistics. We build a theoretical model for rapid evaluation of three-point correlations using our fastnc code and integrate it into the CosmoSIS framework. We measure the two-point functions $\xi_{\pm}$ and the full configuration-dependent three-point shear correlation fu…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 25 pages, 15 figures

  26. arXiv:2503.02351  [pdf, other]

    q-bio.NC cs.AI

    MindSimulator: Exploring Brain Concept Localization via Synthetic FMRI

    Authors: Guangyin Bao, Qi Zhang, Zixuan Gong, Zhuojia Wu, Duoqian Miao

    Abstract: Concept-selective regions within the human cerebral cortex exhibit significant activation in response to specific visual stimuli associated with particular concepts. Precisely localizing these regions stands as a crucial long-term goal in neuroscience to grasp essential brain functions and mechanisms. Conventional experiment-driven approaches hinge on manually constructed visual stimulus collectio…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 23 pages, ICLR 2025

  27. arXiv:2503.01839  [pdf, other]

    cs.CR cs.AI cs.CL cs.CV

    Jailbreaking Safeguarded Text-to-Image Models via Large Language Models

    Authors: Zhengyuan Jiang, Yuepeng Hu, Yuchen Yang, Yinzhi Cao, Neil Zhenqiang Gong

    Abstract: Text-to-Image models may generate harmful content, such as pornographic images, particularly when unsafe prompts are submitted. To address this issue, safety filters are often added on top of text-to-image models, or the models themselves are aligned to reduce harmful outputs. However, these defenses remain vulnerable when an attacker strategically designs adversarial prompts to bypass these safet…

    Submitted 3 March, 2025; originally announced March 2025.

  28. arXiv:2502.20681  [pdf, other]

    cs.CL cs.AI cs.LG

    Disentangling Feature Structure: A Mathematically Provable Two-Stage Training Dynamics in Transformers

    Authors: Zixuan Gong, Jiaye Teng, Yong Liu

    Abstract: Transformers may exhibit two-stage training dynamics during the real-world training process. For instance, when training GPT-2 on the Counterfact dataset, the answers progress from syntactically incorrect to syntactically correct to semantically correct. However, existing theoretical analyses hardly account for this two-stage phenomenon. In this paper, we theoretically demonstrate how such two-sta…

    Submitted 27 February, 2025; originally announced February 2025.

  29. arXiv:2502.20623  [pdf, other]

    cs.CR cs.CV

    SafeText: Safe Text-to-image Models via Aligning the Text Encoder

    Authors: Yuepeng Hu, Zhengyuan Jiang, Neil Zhenqiang Gong

    Abstract: Text-to-image models can generate harmful images when presented with unsafe prompts, posing significant safety and societal risks. Alignment methods aim to modify these models to ensure they generate only non-harmful images, even when exposed to unsafe prompts. A typical text-to-image model comprises two main components: 1) a text encoder and 2) a diffusion module. Existing alignment methods mainl…

    Submitted 27 February, 2025; originally announced February 2025.

  30. arXiv:2502.20013  [pdf, other]

    eess.SY physics.app-ph

    Data-Driven Model Identification of Unbalanced Induction Motor Dynamics and Forces using SINDYc

    Authors: Emma Vancayseele, Philip Desenfans, Zifeng Gong, Dries Vanoost, Herbert De Gersem, Davy Pissoort

    Abstract: This paper identifies the stator currents, torque and unbalanced magnetic pull (UMP) of an unbalanced induction motor by the System Identification of Nonlinear Dynamics with Control (SINDYc) method from time-series data of measurable quantities. The SINDYc model has been trained on data coming from a nonlinear magnetic equivalent circuit model for three rotor eccentricity configurations. When eval…

    Submitted 27 February, 2025; originally announced February 2025.

    ACM Class: I.2; I.6; J.2

  31. arXiv:2502.17024  [pdf, other]

    cs.CL cs.LG stat.ML

    Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization

    Authors: Zixuan Gong, Xiaolin Hu, Huayi Tang, Yong Liu

    Abstract: Large language models (LLMs) have demonstrated remarkable in-context learning (ICL) abilities. However, existing theoretical analysis of ICL primarily exhibits two limitations: (a) Limited i.i.d. Setting. Most studies focus on supervised function learning tasks where prompts are constructed with i.i.d. input-label pairs. This i.i.d. assumption diverges significantly from real language learning sce…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Published at ICLR 2025

  32. arXiv:2502.16065  [pdf, other]

    cs.CR cs.AI cs.LG

    A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments

    Authors: Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

    Abstract: Model Extraction Attacks (MEAs) threaten modern machine learning systems by enabling adversaries to steal models, exposing intellectual property and training data. With the increasing deployment of machine learning models in distributed computing environments, including cloud, edge, and federated learning settings, each paradigm introduces distinct vulnerabilities and challenges. Without a unified…

    Submitted 21 February, 2025; originally announced February 2025.

  33. arXiv:2502.14296  [pdf, other]

    cs.CY

    On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

    Authors: Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Yujun Zhou, Yanbo Wang, Jiayi Ye, Jiawen Shi, Qihui Zhang, Yuan Li, Han Bao, Zhaoyi Liu, Tianrui Guan, Dongping Chen, Ruoxi Chen, Kehan Guo, Andy Zou, Bryan Hooi Kuen-Yew, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, et al. (41 additional authors not shown)

    Abstract: Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, a…

    Submitted 11 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  34. arXiv:2502.13508  [pdf, other]

    cs.RO

    VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation

    Authors: Wei Zhao, Pengxiang Ding, Min Zhang, Zhefei Gong, Shuanghao Bai, Han Zhao, Donglin Wang

    Abstract: Vision-language-action models (VLAs) have become increasingly popular in robot manipulation for their end-to-end design and remarkable performance. However, existing VLAs rely heavily on vision-language models (VLMs) that only support text-based instructions, neglecting the more natural speech modality for human-robot interaction. Traditional speech integration methods usually involve a separate…

    Submitted 21 February, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted as a conference paper at ICLR 2025

  35. arXiv:2502.12604  [pdf, other]

    cs.CV

    S2C: Learning Noise-Resistant Differences for Unsupervised Change Detection in Multimodal Remote Sensing Images

    Authors: Lei Ding, Xibing Zuo, Danfeng Hong, Haitao Guo, Jun Lu, Zhihui Gong, Lorenzo Bruzzone

    Abstract: Unsupervised Change Detection (UCD) in multimodal Remote Sensing (RS) images remains a difficult challenge due to the inherent spatio-temporal complexity within data, and the heterogeneity arising from different imaging sensors. Inspired by recent advancements in Visual Foundation Models (VFMs) and Contrastive Learning (CL) methodologies, this research aims to develop CL methodologies to translate…

    Submitted 18 February, 2025; originally announced February 2025.

  36. arXiv:2502.12378  [pdf, ps, other]

    cs.CL

    Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges

    Authors: Bolei Ma, Yuting Li, Wei Zhou, Ziwei Gong, Yang Janet Liu, Katja Jasinskaja, Annemarie Friedrich, Julia Hirschberg, Frauke Kreuter, Barbara Plank

    Abstract: Understanding pragmatics, the use of language in context, is crucial for developing NLP systems capable of interpreting nuanced language use. Despite recent advances in language technologies, including large language models, evaluating their ability to handle pragmatic phenomena such as implicatures and references remains challenging. To advance pragmatic abilities in models, it is essential to unde…

    Submitted 12 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: ACL 2025

  37. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  38. arXiv:2502.10973  [pdf, ps, other]

    cs.CL

    Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues

    Authors: David Sasu, Zehui Wu, Ziwei Gong, Run Chen, Pengyuan Shi, Lin Ai, Julia Hirschberg, Natalie Schluter

    Abstract: In this paper, we introduce the Akan Conversation Emotion (ACE) dataset, the first multimodal emotion dialogue dataset for an African language, addressing the significant lack of resources for low-resource languages in emotion recognition research. ACE, developed for the Akan language, contains 385 emotion-labeled dialogues and 6,162 utterances across audio, visual, and textual modalities, along w…

    Submitted 2 June, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

    Comments: Accepted to Findings at ACL 2025

  39. arXiv:2502.08123  [pdf, other]

    cs.CR cs.DC cs.LG

    Provably Robust Federated Reinforcement Learning

    Authors: Minghong Fang, Xilong Wang, Neil Zhenqiang Gong

    Abstract: Federated reinforcement learning (FRL) allows agents to jointly learn a global decision-making policy under the guidance of a central server. While FRL has advantages, its decentralized design makes it prone to poisoning attacks. To mitigate this, Byzantine-robust aggregation techniques tailored for FRL have been introduced. Yet, in our work, we reveal that these current Byzantine-robust technique…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: To appear in The Web Conference 2025

  40. arXiv:2502.06805  [pdf, ps, other]

    cs.LG cs.GR

    Efficient Diffusion Models: A Survey

    Authors: Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang

    Abstract: Diffusion models have emerged as powerful generative models capable of producing high-quality contents such as images, videos, and audio, demonstrating their potential to revolutionize digital content creation. However, these capabilities come at the cost of their significant computational resources and lengthy generation time, underscoring the critical need to develop efficient techniques for pra…

    Submitted 6 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Published in Transactions on Machine Learning Research (TMLR-2025)

  41. arXiv:2502.05424  [pdf, other]

    cs.CL cs.AI

    SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation

    Authors: Xingtong Yu, Zechuan Gong, Chang Zhou, Yuan Fang, Hui Zhang

    Abstract: Graphs are able to model interconnected entities in many online services, supporting a wide range of applications on the Web. This raises an important question: How can we train a graph foundational model on multiple source domains and adapt to an unseen target domain? A major obstacle is that graphs from different domains often exhibit divergent characteristics. Some studies leverage large langua…

    Submitted 12 April, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW2025 Main Track

  42. arXiv:2502.04624  [pdf, other]

    cond-mat.mtrl-sci

    Pure momentum-shift bulk photovoltaic effect in ferroelectric flat-band Mott insulators

    Authors: Zhuocheng Lu, Zhihao Gong, Jingshan Qi, Hua Wang, Kai Chang

    Abstract: The shift current photovoltaic effect is conventionally understood as the real-space displacement of a wave packet induced by photoexcitation. However, this interpretation becomes insufficient in flat-band systems, where quasiparticles are too massive to accelerate in real space under the optical electric field. Here, we developed a physically consistent method to decompose the shift current into…

    Submitted 24 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

  43. arXiv:2502.04574  [pdf]

    q-bio.NC cs.IT stat.AP

    Dark Brain Energy: Toward an Integrative Model of Spontaneous Slow Oscillations

    Authors: ZhuQing Gong, XiNian Zuo

    Abstract: Neural oscillations facilitate the functioning of the human brain in spatial and temporal dimensions at various frequencies. These oscillations feature a universal frequency architecture that is governed by brain anatomy, ensuring frequency specificity remains invariant across different measurement techniques. Initial magnetic resonance imaging (MRI) methodology constrained functional MRI (fMRI) i…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: 38 pages, 6 figures

  44. arXiv:2502.03457  [pdf, other]

    astro-ph.CO

    Clustering of the extreme: A theoretical description of weak lensing critical points power spectra in the mildly nonlinear regime

    Authors: Zhengyangguang Gong, Alexandre Barthelemy, Sandrine Codis

    Abstract: In cosmic web analysis, complementary to traditional cosmological probes, the extrema (e.g. peaks and voids) two-point correlation functions (2PCFs) are of particular interest for the study of both astrophysical phenomena and cosmological structure formation. However most previous studies constructed those statistics via N-body simulations without a robust theoretical derivation from first princip…

    Submitted 6 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 26 pages, 11 figures. Comments are welcome

  45. arXiv:2501.17443  [pdf, ps, other]

    cs.LG

    Gradual Domain Adaptation for Graph Learning

    Authors: Pui Ieng Lei, Ximing Chen, Yijun Sheng, Yanyan Liu, Jingzhi Guo, Zhiguo Gong

    Abstract: Existing literature lacks a graph domain adaptation technique for handling large distribution shifts, primarily due to the difficulty in simulating an evolving path from source to target graph. To make a breakthrough, we present a graph gradual domain adaptation (GGDA) framework with the construction of a compact domain sequence that minimizes information loss in adaptations. Our approach starts w…

    Submitted 27 June, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  46. arXiv:2501.16741  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Quantum Geometric Origin of Strain-Tunable Giant Second-Harmonic Generation in Bi$_2$O$_2$X (X=S, Se, Te)

    Authors: Zhefeng Lou, Zhihao Gong, Ziye Zhu, Wenbin Li, Xiao Lin, Hua Wang

    Abstract: Two-dimensional (2D) materials with giant nonlinear optical (NLO) responses are essential for the development of advanced on-chip NLO devices. Using first-principles calculations, we predict a remarkable strain-induced enhancement of second-harmonic generation (SHG) in the high-performance 2D semiconductors Bi$_2$O$_2$X (X = S, Se, Te). The SHG susceptibilities of Bi$_2$O$_2$X under strain are on…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: 8 pages, 4 figures

  47. arXiv:2501.15638  [pdf, other]

    cs.LG cs.AI

    A Comprehensive Survey on Self-Interpretable Neural Networks

    Authors: Yang Ji, Ying Sun, Yuting Zhang, Zhigaoyuan Wang, Yuanxin Zhuang, Zheng Gong, Dazhong Shen, Chuan Qin, Hengshu Zhu, Hui Xiong

    Abstract: Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides explanations for pre-trained models, is often at risk of robustness and fidelity. This has inspired a rising interest in self-interpretable neural networks, which inher…

    Submitted 21 March, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

  48. arXiv:2501.15209  [pdf, other]

    quant-ph cond-mat.mes-hall cond-mat.quant-gas

    Optimal spectral transport of non-Hermitian systems

    Authors: Mingtao Xu, Zongping Gong, Wei Yi

    Abstract: The optimal transport problem seeks to minimize the total transportation cost between two distributions, thus providing a measure of distance between them. In this work, we study the optimal transport of the eigenspectrum of one-dimensional non-Hermitian models as the spectrum deforms on the complex plane under a varying imaginary gauge field. Notably, according to the non-Bloch band theory, the d…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: 12 pages, 5 figures

    Journal ref: Phys. Rev. B 111, 214305 (2025)

  49. arXiv:2501.01366  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding

    Authors: Austin T. Wang, ZeMing Gong, Angel X. Chang

    Abstract: 3D visual grounding (3DVG) involves localizing entities in a 3D scene referred to by natural language text. Such models are useful for embodied AI and scene retrieval applications, which involve searching for objects or patterns using natural language descriptions. While recent works have focused on LLM-based scaling of 3DVG datasets, these datasets do not capture the full range of potential promp…

    Submitted 7 July, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: 24 pages with 8 figures and 14 tables; updated for ACL 2025 camera-ready with additional discussion and figures

  50. arXiv:2501.00571  [pdf, other]

    cs.CL

    KnowRA: Knowledge Retrieval Augmented Method for Document-level Relation Extraction with Comprehensive Reasoning Abilities

    Authors: Chengcheng Mai, Yuxiang Wang, Ziyu Gong, Hanxiang Wang, Yihua Huang

    Abstract: Document-level relation extraction (Doc-RE) aims to extract relations between entities across multiple sentences. Therefore, Doc-RE requires more comprehensive reasoning abilities like humans, involving complex cross-sentence interactions between entities, contexts, and external general knowledge, compared to the sentence-level RE. However, most existing Doc-RE methods focus on optimizing single r…

    Submitted 1 May, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

    Comments: This work has been accepted by IJCAI 2025 (CCF A)