Search | arXiv e-print repository

DiffMark: Diffusion-based Robust Watermark Against Deepfakes

Authors: Chen Sun, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Liejun Wang, Dan Ma, Gaobo Yang, Keqin Li

Abstract: Deepfakes pose significant security and privacy threats through malicious facial manipulations. While robust watermarking can aid in authenticity verification and source tracking, existing methods often lack sufficient robustness against Deepfake manipulations. Diffusion models have demonstrated remarkable performance in image generation, enabling the seamless fusion of a watermark with an image during generation. In this study, we propose a novel robust watermarking framework based on a diffusion model, called DiffMark. By modifying the training and sampling scheme, we take the facial image and watermark as conditions to guide the diffusion model to progressively denoise and generate the corresponding watermarked image. In the construction of the facial condition, we weight the facial image by a timestep-dependent factor that gradually reduces the guidance intensity as the noise decreases, thus better adapting to the sampling process of the diffusion model. To achieve the fusion of the watermark condition, we introduce a cross information fusion (CIF) module that leverages a learnable embedding table to adaptively extract watermark features and integrates them with image features via cross-attention. To enhance the robustness of the watermark against Deepfake manipulations, we integrate a frozen autoencoder during the training phase to simulate Deepfake manipulations. Additionally, we introduce Deepfake-resistant guidance that employs a specific Deepfake model to adversarially guide the diffusion sampling process to generate more robust watermarked images.
Experimental results demonstrate the effectiveness of the proposed DiffMark on typical Deepfakes. Our code will be available at https://github.com/vpsg-research/DiffMark.
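The timestep-dependent facial condition can be pictured with a small sketch. The abstract does not give the weighting schedule, so the linear factor w(t) = t/T below, the step count T = 1000, and the function names `face_condition_weight` and `weighted_face_condition` are all hypothetical; the sketch only shows the stated property that guidance intensity shrinks as sampling moves toward lower noise. The CIF module and Deepfake-resistant guidance are not modeled.

```python
import numpy as np


def face_condition_weight(t: int, num_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 at t = num_steps (pure noise), 0.0 at t = 0 (clean)."""
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    return t / num_steps


def weighted_face_condition(face: np.ndarray, t: int, num_steps: int) -> np.ndarray:
    """Scale the facial image by the timestep-dependent factor before using it as a condition."""
    return face_condition_weight(t, num_steps) * face


T = 1000                    # hypothetical number of diffusion steps
face = np.ones((3, 4, 4))   # toy 3x4x4 stand-in for a facial image
for t in (1000, 500, 0):    # sampling runs from t = T (noisy) down to t = 0 (clean)
    print(t, float(weighted_face_condition(face, t, T).mean()))
```

Run as-is, the loop prints mean condition values of 1.0, 0.5, and 0.0 for t = 1000, 500, and 0. A cosine or learned schedule would slot into `face_condition_weight` without changing the rest.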

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2507.00861

SafeMap: Robust HD Map Construction from Incomplete Observations

Authors: Xiaoshuai Hao, Lingdong Kong, Rong Yin, Pengwei Wang, Jing Zhang, Yunfeng Diao, Shu Zhao

Abstract: Robust high-definition (HD) map construction is vital for autonomous driving, yet existing methods often struggle with incomplete multi-view camera data. This paper presents SafeMap, a novel framework specifically designed to maintain accuracy even when certain camera views are missing. SafeMap integrates two key components: the Gaussian-based Perspective View Reconstruction (G-PVR) module and the Distillation-based Bird's-Eye-View (BEV) Correction (D-BEVC) module. G-PVR leverages prior knowledge of view importance to dynamically prioritize the most informative regions based on the relationships among available camera views. Furthermore, D-BEVC utilizes panoramic BEV features to correct the BEV representations derived from incomplete observations. Together, these components enable end-to-end map reconstruction and robust HD map generation. SafeMap is easy to implement and integrates seamlessly into existing systems, offering a plug-and-play solution for enhanced robustness. Experimental results demonstrate that SafeMap significantly outperforms previous methods in both complete and incomplete scenarios, highlighting its superior performance and reliability.

Submitted 1 July, 2025; originally announced July 2025.

Comments: Accepted by ICML 2025

arXiv:2506.23292

DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios

Authors: Changtao Miao, Yi Zhang, Weize Gao, Man Luo, Weiwei Feng, Zhiya Tan, Jianshu Li, Ajian Liu, Yunfeng Diao, Qi Chu, Tao Gong, Zhe Li, Weibin Yao, Joey Tianyi Zhou

Abstract: Recent advances in AIGC have exacerbated the misuse of malicious deepfake content, making the development of reliable deepfake detection methods an essential means of addressing this challenge. Although existing deepfake detection models demonstrate outstanding performance on detection metrics, most methods only provide simple binary classification results and lack interpretability. In critical domains such as law, interpretability is crucial for enhancing the credibility and authority of decisions. Recent studies attempt to improve the interpretability of classification results by providing spatial manipulation masks or temporal forgery segments. However, the practical effectiveness of these methods remains suboptimal due to limitations of the forgery data. Most current deepfake datasets offer only binary labels, and only a few provide localization annotations. Moreover, these datasets suffer from restricted forgery scenarios, limited diversity in deepfake types, and insufficient data scale, making them inadequate for complex real-world scenarios. To address this predicament, we construct a novel large-scale deepfake detection and localization (DDL) dataset containing over 1.8M forged samples and encompassing up to 75 distinct deepfake methods. The DDL design incorporates four key innovations: (1) Diverse Forgery Scenarios, (2) Comprehensive Deepfake Methods, (3) Varied Manipulation Modes, and (4) Fine-grained Forgery Annotations.
Through these improvements, DDL not only provides a more challenging benchmark for complex real-world forgeries but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods. The DDL dataset project page is at https://deepfake-workshop-ijcai2025.github.io/main/index.html.

Submitted 29 June, 2025; originally announced June 2025.

Comments: This paper is a preliminary version, with an extended and comprehensive version currently under development

arXiv:2506.18065

Liouville function, von Mangoldt function and norm forms at random binary forms

Authors: Yijie Diao

Abstract: We analyze the average behavior of various arithmetic functions at the values of degree $d$ binary forms ordered by height, with probability $1$. This approach yields averaged versions of the Chowla conjecture and the Bateman-Horn conjecture for random binary forms. Furthermore, we show that the rational Hasse principle holds for almost all Châtelet varieties defined by a fixed norm form of degree $e$ and by varying binary forms of fixed degree $d$, provided $e$ divides $d$. This proves an average version of a conjecture of Colliot-Thélène.

Submitted 22 June, 2025; originally announced June 2025.

Comments: 39 pages

MSC Class: 11N32 (11N37; 11D57; 11G35)

arXiv:2506.12708

Serving Large Language Models on Huawei CloudMatrix384

Authors: Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, Shirui Lu, Zhao Qiu, Peiyang Li, Xianyu Chang, Zhengzhong Yu, Fangzheng Miao, Jia Zheng, Ying Li, Yuan Feng, Bei Wang, Zaijian Zong, Mosong Zhou, Wenli Zhou, Houjiang Chen, Xingyu Liao, Yipeng Li, et al. (21 additional authors not shown)

Abstract: The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory bandwidth, inter-chip communication, and latency, compounded by variable workloads and strict service-level objectives. Addressing these issues requires fundamentally redesigned hardware-software integration. This paper introduces Huawei CloudMatrix, a next-generation AI datacenter architecture, realized in the production-grade CloudMatrix384 supernode. It integrates 384 Ascend 910 NPUs and 192 Kunpeng CPUs interconnected via an ultra-high-bandwidth Unified Bus (UB) network, enabling direct all-to-all communication and dynamic pooling of resources. These features optimize performance for communication-intensive operations, such as large-scale MoE expert parallelism and distributed key-value cache access. To fully leverage CloudMatrix384, we propose CloudMatrix-Infer, an advanced LLM serving solution incorporating three core innovations: a peer-to-peer serving architecture that independently scales prefill, decode, and caching; a large-scale expert parallelism strategy supporting EP320 via efficient UB-based token dispatch; and hardware-aware optimizations including specialized operators, microbatch-based pipelining, and INT8 quantization.
Evaluation with the DeepSeek-R1 model shows CloudMatrix-Infer achieves state-of-the-art efficiency: prefill throughput of 6,688 tokens/s per NPU and decode throughput of 1,943 tokens/s per NPU (<50 ms TPOT). It effectively balances throughput and latency, sustaining 538 tokens/s per NPU even under stringent 15 ms latency constraints, while INT8 quantization maintains model accuracy across benchmarks.
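A rough reading of the reported decode figures: under an idealized model (an assumption, not a claim from the paper) in which every active sequence emits one token per TPOT, throughput equals concurrency divided by TPOT, so multiplying throughput by the TPOT bound gives an upper bound on the average number of concurrently decoding sequences per NPU. The helper name `implied_concurrency` is hypothetical, and the estimate ignores prefill, scheduling, and cross-NPU expert parallelism.

```python
def implied_concurrency(tokens_per_s: float, tpot_s: float) -> float:
    """Idealized Little's-law estimate: if every active sequence emits one token
    per TPOT, then throughput = concurrency / TPOT, so concurrency = throughput * TPOT."""
    return tokens_per_s * tpot_s


# Per-NPU figures quoted in the abstract; the TPOT values are the stated upper bounds,
# so the results are upper bounds on average concurrency under this idealized model.
print(round(implied_concurrency(1943, 0.050), 2))  # 97.15
print(round(implied_concurrency(538, 0.015), 2))   # 8.07
```

Under these assumptions, that is roughly 97 sequences per NPU at the <50 ms setting versus about 8 under the 15 ms constraint, one way to see why per-NPU throughput falls as the latency target tightens.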

Submitted 19 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

Comments: 59 pages, 24 figures

arXiv:2505.24586

All-sky search for individual Primordial Black Hole bursts with LHAASO

Authors: Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, S. H. Chen, et al. (293 additional authors not shown)

Abstract: Primordial Black Holes (PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$ g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for individual PBH burst events using data collected from March 2021 to July 2024 by the Water Cherenkov Detector Array of the Large High Altitude Air Shower Observatory (LHAASO). Three PBH burst durations, 10 s, 20 s, and 100 s, are searched, and no significant PBH bursts are observed. The upper limit on the local PBH burst rate density is set as low as 181 pc$^{-3}$ yr$^{-1}$ at the 99% confidence level, the most stringent limit achieved to date.
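For intuition about a limit set with no significant bursts, the textbook zero-event Poisson bound is sketched below. It is an assumption for illustration only: the LHAASO analysis handles backgrounds, burst durations, and distance-dependent sensitivity in ways the abstract does not describe, so this does not reproduce the 181 pc$^{-3}$ yr$^{-1}$ figure. `rate_density_upper_limit` and its effective-volume and live-time inputs are hypothetical.

```python
import math


def zero_event_upper_limit(cl: float) -> float:
    """Classical Poisson upper limit on the expected count when zero events are seen:
    solve P(N = 0 | mu) = exp(-mu) = 1 - cl for mu."""
    if not 0.0 < cl < 1.0:
        raise ValueError("cl must lie strictly between 0 and 1")
    return -math.log(1.0 - cl)


def rate_density_upper_limit(cl: float, effective_volume_pc3: float, live_time_yr: float) -> float:
    """Hypothetical conversion: expected-count limit divided by (effective volume x live time)."""
    return zero_event_upper_limit(cl) / (effective_volume_pc3 * live_time_yr)


print(round(zero_event_upper_limit(0.99), 3))  # 4.605 expected bursts at 99% CL
```

At the 99% confidence level, zero observed events bound the expected count at about 4.605 (that is, ln 100); a rate-density limit then scales inversely with the probed volume and observation time.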

Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

Comments: 8 pages, 2 figures

arXiv:2505.22604

Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective

Authors: Ruixuan Zhang, He Wang, Zhengyu Zhao, Zhiqing Guo, Xun Yang, Yunfeng Diao, Meng Wang

Abstract: Rapid advances in Artificial Intelligence Generated Images (AIGI) have facilitated malicious use, such as forgery and misinformation. Therefore, numerous methods have been proposed to detect fake images. Although such detectors have been proven to be universally vulnerable to adversarial attacks, defenses in this field are scarce. In this paper, we first identify that adversarial training (AT), widely regarded as the most effective defense, suffers from performance collapse in AIGI detection. Through an information-theoretic lens, we further attribute the cause of collapse to feature entanglement, which disrupts the preservation of feature-label mutual information. In contrast, standard detectors show clear feature separation. Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM), the first training-free adversarial defense for AIGI detection. TRIM builds on standard detectors and quantifies feature shifts using prediction entropy and KL divergence. Extensive experiments across multiple datasets and attacks validate the superiority of TRIM, e.g., outperforming the state-of-the-art defense by 33.88% (28.91%) on ProGAN (GenImage), while preserving the original accuracy.
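The two quantities TRIM is said to use, prediction entropy and KL divergence, are standard and easy to compute from detector logits. A minimal sketch follows; how TRIM obtains the second distribution it compares against and how it turns the scores into a decision are not given in the abstract, so `shift_scores` and its "reference view" argument are hypothetical placeholders, not the paper's method.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def prediction_entropy(p: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy H(p) = -sum_k p_k log p_k of one predictive distribution."""
    return float(-(p * np.log(p + eps)).sum())


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) = sum_k p_k (log p_k - log q_k)."""
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum())


def shift_scores(logits_input: np.ndarray, logits_reference: np.ndarray) -> tuple[float, float]:
    """Entropy of the detector's prediction on the input, and its KL divergence from the
    prediction on a reference view (how that view is built is a placeholder here)."""
    p, q = softmax(logits_input), softmax(logits_reference)
    return prediction_entropy(p), kl_divergence(p, q)


print(shift_scores(np.array([4.0, -4.0]), np.array([4.0, -4.0])))  # confident, no shift
print(shift_scores(np.array([0.1, 0.0]), np.array([4.0, -4.0])))   # uncertain, shifted
```

A confident, unshifted prediction yields low entropy and zero KL, while an uncertain prediction that disagrees with the reference raises both.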

Submitted 30 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.21874

MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network

Authors: Ruiguo Yu, Yiyang Zhang, Yuan Tian, Yujie Diao, Di Jin, Witold Pedrycz

Abstract: Medical image segmentation methods generally assume that the process from medical image to segmentation is unbiased, and use neural networks to establish conditional probability models to complete the segmentation task. This assumption does not consider confusion factors that can affect medical images, such as complex anatomical variations and imaging modality limitations. Confusion factors obfuscate the relevance and causality of medical image segmentation, leading to unsatisfactory segmentation results. To address this issue, we propose a multi-causal aware modeling backdoor-intervention optimization (MAMBO-NET) network for medical image segmentation. Drawing insights from causal inference, MAMBO-NET uses self-modeling with multi-Gaussian distributions to fit the confusion factors and introduces causal intervention into the segmentation process. Moreover, we design appropriate posterior probability constraints to effectively train the distributions of confusion factors. To let these distributions effectively guide the segmentation and mitigate and eliminate the impact of confusion factors on it, we introduce classical backdoor intervention techniques and analyze their feasibility in the segmentation task. To evaluate the effectiveness of our approach, we conducted extensive experiments on five medical image datasets. The results demonstrate that our method significantly reduces the influence of confusion factors, leading to enhanced segmentation accuracy.
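The classical backdoor adjustment that MAMBO-NET builds on replaces plain conditioning P(y | x) with the interventional P(y | do(x)) = Σ_z P(y | x, z) P(z). The sketch below shows it on a discrete toy confounder Z; the probabilities are invented for illustration, and MAMBO-NET itself models the confusion factors with learned multi-Gaussian distributions inside a segmentation network, which is not reproduced here.

```python
import numpy as np


def backdoor_adjust(p_y_given_xz: np.ndarray, p_z: np.ndarray) -> np.ndarray:
    """P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) P(Z=z).
    p_y_given_xz has shape (n_x, n_z, n_y); p_z has shape (n_z,)."""
    return np.einsum("xzy,z->xy", p_y_given_xz, p_z)


def observational(p_y_given_xz: np.ndarray, p_z_given_x: np.ndarray) -> np.ndarray:
    """Plain conditioning for contrast: P(Y=y | X=x) = sum_z P(y | x, z) P(z | x)."""
    return np.einsum("xzy,xz->xy", p_y_given_xz, p_z_given_x)


# Toy binary example (invented numbers): P(Y=1 | X=x, Z=z) indexed as [x, z].
p1 = np.array([[0.2, 0.6],
               [0.3, 0.7]])
p_y_given_xz = np.stack([1.0 - p1, p1], axis=-1)  # shape (2, 2, 2)
p_z_given_x = np.array([[0.9, 0.1],               # Z depends strongly on X: confounding
                        [0.1, 0.9]])
p_z = np.array([0.5, 0.5])                        # marginal of Z, assuming X is uniform

print(backdoor_adjust(p_y_given_xz, p_z)[:, 1])       # interventional P(Y=1 | do(X))
print(observational(p_y_given_xz, p_z_given_x)[:, 1])  # observational P(Y=1 | X)
```

With these numbers, the interventional probabilities are 0.4 and 0.5, whereas plain conditioning gives 0.24 and 0.66: the confounder inflates the apparent effect of X.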

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.19459

Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation

Authors: Kaichao Jiang, He Wang, Xiaoshuai Hao, Xiulong Yang, Ajian Liu, Qi Chu, Yunfeng Diao

Abstract: Joint Energy-based Models (JEMs), a class of hybrid generative-discriminative models, are well known for their ability to achieve both high classification accuracy and generative capability within a single model. However, their robustness still lags significantly behind that of classifiers based on adversarial training (AT). Conversely, while AT is currently the most effective approach to improving a classifier's robustness, it typically sacrifices accuracy on clean data and lacks generative capability. The triple trade-off between classification accuracy, generative capability, and robustness raises a natural question: can a single model simultaneously achieve high classification accuracy, adversarial robustness, and generative performance? This goal has rarely been explored. To address this question, we systematically analyze the energy distribution differences of clean, adversarial, and generated samples across various JEM variants and adversarially trained models. We observe that AT tends to reduce the energy gap between clean and adversarial samples, while JEMs reduce the gap between clean and synthetic ones. This observation suggests a key insight: if the energy distributions of all three data types can be aligned, we might unify the strengths of AT and JEMs, resolving their inherent trade-offs. Building on this idea, we propose Energy-based Joint Distribution Adversarial Training (EB-JDAT) to jointly model the clean data distribution, the adversarial distribution, and the classifier by maximizing their joint probability.
EB-JDAT is a general and flexible optimization method, compatible with various JEM variants. Extensive experimental results demonstrate that EB-JDAT not only maintains nearly the original accuracy and generative capability of JEMs, but also significantly enhances robustness, even surpassing state-of-the-art AT methods.
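The energy-gap observation can be made concrete with the energy function commonly used for JEMs, which reads a classifier's logits f(x) as E(x) = -log Σ_y exp(f(x)[y]). A minimal sketch, assuming that definition; the random logits are placeholders rather than outputs of any model in the paper, and `energy_gap` (a difference of mean energies) is a hypothetical summary statistic, not the EB-JDAT objective.

```python
import numpy as np


def energy(logits: np.ndarray) -> np.ndarray:
    """JEM-style energy E(x) = -log sum_y exp(f(x)[y]), computed stably per row."""
    m = logits.max(axis=-1, keepdims=True)
    lse = m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    return -lse.squeeze(-1)


def energy_gap(logits_a: np.ndarray, logits_b: np.ndarray) -> float:
    """Absolute difference of mean energies between two sample sets (e.g. clean vs. adversarial)."""
    return float(abs(energy(logits_a).mean() - energy(logits_b).mean()))


rng = np.random.default_rng(0)
clean = rng.normal(size=(128, 10)) + 2.0  # placeholder logits for clean inputs
adv = rng.normal(size=(128, 10))          # placeholder logits for adversarial inputs
print(energy_gap(clean, adv))
```

Adding a constant to every logit lowers the energy by exactly that constant, so with the +2 shift above the printed gap should come out near 2, up to sampling noise between the two placeholder batches.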

Submitted 25 May, 2025; originally announced May 2025.

arXiv:2505.14447

First Identification and Precise Spectral Measurement of the Proton Component in the Cosmic-Ray 'Knee'

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (292 additional authors not shown)

Abstract: We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and systematic accuracy comparable to satellite data at lower energies. The proton spectrum shows significant hardening relative to low-energy extrapolations, culminating at 3 PeV, followed by sharp softening. This distinct spectral structure, closely aligned with the knee in the all-particle spectrum, points to the emergence of a new CR component at PeV energies, likely linked to the dozens of PeVatrons recently discovered by LHAASO, and offers crucial clues to the origin of Galactic cosmic rays.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.12565

mCLM: A Function-Infused and Synthesis-Friendly Modular Chemical Language Model

Authors: Carl Edwards, Chi Han, Gawon Lee, Thao Nguyen, Bowen Jin, Chetan Kumar Prasad, Sara Szymkuć, Bartosz A. Grzybowski, Ying Diao, Jiawei Han, Ge Liu, Hao Peng, Martin D. Burke, Heng Ji

Abstract: Despite their ability to understand chemical knowledge and accurately generate sequential representations, large language models (LLMs) remain limited in their capacity to propose novel molecules with drug-like properties. In addition, the molecules that LLMs propose can often be challenging to make in the lab. To more effectively enable the discovery of functional small molecules, LLMs need to learn a molecular language. However, LLMs are currently limited by encoding molecules from atoms. In this paper, we argue that, just as texts are tokenized into (sub-)word tokens instead of characters, molecules should be decomposed and reassembled at the level of functional building blocks, i.e., parts of molecules that bring unique functions and serve as effective building blocks for real-world automated laboratory synthesis. This motivates us to propose mCLM, a modular Chemical-Language Model that tokenizes molecules into building blocks and learns a bilingual language model of both natural language descriptions of functions and molecule building blocks. By reasoning on such functional building blocks, mCLM guarantees the generation of efficiently synthesizable molecules thanks to recent progress in block-based chemistry, while also improving the functions of molecules in a principled manner. In experiments on 430 FDA-approved drugs, we find mCLM capable of significantly improving 5 out of 6 chemical functions critical to determining drug potential.
More importantly, mCLM can reason on multiple functions and iteratively improve FDA-rejected drugs ("fallen angels") to greatly mitigate their shortcomings.

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.08614

WaveGuard: Robust Deepfake Detection and Source Tracing via Dual-Tree Complex Wavelet and Graph Neural Networks

Authors: Ziyuan He, Zhiqing Guo, Liejun Wang, Gaobo Yang, Yunfeng Diao, Dan Ma

Abstract: Deepfake technology poses increasing risks such as privacy invasion and identity theft. To address these threats, we propose WaveGuard, a proactive watermarking framework that enhances robustness and imperceptibility via frequency-domain embedding and graph-based structural consistency. Specifically, we embed watermarks into high-frequency sub-bands using Dual-Tree Complex Wavelet Transform (DT-CWT) and employ a Structural Consistency Graph Neural Network (SC-GNN) to preserve visual quality. We also design an attention module to refine embedding precision. Experimental results on face swap and reenactment tasks demonstrate that WaveGuard outperforms state-of-the-art methods in both robustness and visual quality. Code is available at https://github.com/vpsg-research/WaveGuard.
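Frequency-domain embedding can be illustrated in a few lines, with one deliberate substitution: WaveGuard uses the Dual-Tree Complex Wavelet Transform, whereas this sketch swaps in a plain single-level 2D Haar DWT because it can be written exactly with NumPy slicing. It adds ±alpha to the diagonal high-frequency (HH) sub-band and recovers bits non-blindly by comparing against the original; `embed`, `extract`, and `alpha = 2.0` are hypothetical, and the SC-GNN and attention module are not modeled.

```python
import numpy as np


def haar_dwt2(img: np.ndarray):
    """Single-level orthonormal 2D Haar DWT of an even-sized image -> (LL, LH, HL, HH)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Exact inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def embed(img: np.ndarray, bits: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Add +alpha (bit 1) or -alpha (bit 0) to each HH coefficient; bits has HH's shape."""
    ll, lh, hl, hh = haar_dwt2(img.astype(float))
    return haar_idwt2(ll, lh, hl, hh + alpha * (2.0 * bits - 1.0))


def extract(marked: np.ndarray, original: np.ndarray) -> np.ndarray:
    """Non-blind extraction: the sign of the HH difference recovers each bit."""
    diff = haar_dwt2(marked.astype(float))[3] - haar_dwt2(original.astype(float))[3]
    return (diff > 0).astype(int)


rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(8, 8))  # toy grayscale image
bits = rng.integers(0, 2, size=(4, 4))  # one bit per HH coefficient
marked = embed(img, bits)
print((extract(marked, img) == bits).all())  # True
```

The Haar pair is orthonormal, so the round trip is exact up to floating-point error. A practical scheme would also need blind extraction and robustness to the manipulations evaluated in the paper.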

Submitted 25 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

Comments: 12 pages, 6 figures, 5 tables

arXiv:2504.11259

The Cambridge Report on Database Research

Authors: Anastasia Ailamaki, Samuel Madden, Daniel Abadi, Gustavo Alonso, Sihem Amer-Yahia, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Michael Cafarella, Surajit Chaudhuri, Susan Davidson, David DeWitt, Yanlei Diao, Xin Luna Dong, Michael Franklin, Juliana Freire, Johannes Gehrke, Alon Halevy, Joseph M. Hellerstein, Mark D. Hill, Stratos Idreos, Yannis Ioannidis, Christoph Koch, Donald Kossmann, Tim Kraska, et al. (21 additional authors not shown)

Abstract: On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community engagement. This gathering continues a long-standing tradition in the database community, dating back to the late 1980s, in which researchers meet roughly every five years to produce a forward-looking report. This report summarizes the key takeaways from our discussions. We begin with a retrospective on the academic, open source, and commercial successes of the community over the past five years. We then turn to future opportunities, with a focus on core data systems, particularly in the context of cloud computing and emerging hardware, as well as on the growing impact of data science, data governance, and generative AI. This document is not intended as an exhaustive survey of all technical challenges or industry innovations in the field. Rather, it reflects the perspectives of senior community members on the most pressing challenges and promising opportunities ahead.

Submitted 15 April, 2025; originally announced April 2025.

arXiv:2504.04818

SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement

Authors: Zuying Xie, Changtao Miao, Ajian Liu, Jiabao Guo, Feng Li, Dan Guo, Yunfeng Diao

Abstract: Face recognition systems are vulnerable to physical attacks (e.g., printed photos) and digital threats (e.g., DeepFake), which are currently studied as independent visual tasks, such as Face Anti-Spoofing and Forgery Detection. The inherent differences among attack types make it hard to identify a common feature space, and hence difficult to develop a unified framework that detects data from both attack modalities simultaneously. Inspired by the efficacy of Mixture-of-Experts (MoE) in learning across diverse domains, we explore using multiple experts to learn the distinct features of various attack types. However, the feature distributions of physical and digital attacks both overlap and differ. This suggests that relying solely on distinct experts to learn the unique features of each attack type may overlook the knowledge shared between them. To address these issues, we propose SUEDE, the Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement. SUEDE combines a shared expert (always activated) that captures features common to both attack types with multiple routed experts (selectively activated) for specific attack types. Further, we integrate CLIP as the base network so that the shared expert benefits from prior visual knowledge and visual-text representations are aligned in a unified space. Extensive results demonstrate that SUEDE achieves superior performance compared to state-of-the-art unified detection methods.
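The shared-plus-routed expert layout described above can be sketched as a toy layer: a shared expert is always applied, a router scores the routed experts, and only the top-k contribute, weighted by a softmax over the selected scores. The linear experts, the top-k softmax gating, the random initialization, and the class name `SharedRoutedMoE` are assumptions for illustration; the abstract does not specify SUEDE's expert architecture or routing rule, and the CLIP backbone is omitted.

```python
import numpy as np


class SharedRoutedMoE:
    """Toy layer: one always-on shared expert plus top-k routed experts (linear maps)."""

    def __init__(self, dim: int, num_routed: int, top_k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = dim ** -0.5
        self.shared = rng.normal(scale=scale, size=(dim, dim))
        self.routed = rng.normal(scale=scale, size=(num_routed, dim, dim))
        self.router = rng.normal(scale=scale, size=(dim, num_routed))
        self.top_k = top_k

    def __call__(self, x: np.ndarray):
        out = x @ self.shared                       # shared expert: always active
        scores = x @ self.router                    # one routing score per routed expert
        chosen = np.argsort(scores)[-self.top_k:]   # indices of the k highest scores
        gates = np.exp(scores[chosen] - scores[chosen].max())
        gates /= gates.sum()                        # softmax over the selected experts only
        for g, i in zip(gates, chosen):
            out = out + g * (x @ self.routed[i])    # routed experts: selectively active
        return out, chosen


moe = SharedRoutedMoE(dim=8, num_routed=4, top_k=2)
y, chosen = moe(np.ones(8))
print(y.shape, sorted(chosen.tolist()))
```

Keeping the shared expert outside the routing step means common features are always represented, while the router decides which attack-specific experts add to them for a given input.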

Submitted 18 June, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

Comments: Accepted in ICME 2025 (Oral)

arXiv:2504.04470

Domain Generalization for Face Anti-spoofing via Content-aware Composite Prompt Engineering

Authors: Jiabao Guo, Ajian Liu, Yunfeng Diao, Jin Zhang, Hui Ma, Bo Zhao, Richang Hong, Meng Wang

Abstract: The challenge of Domain Generalization (DG) in Face Anti-Spoofing (FAS) is the significant interference of domain-specific signals with subtle spoofing clues. Recently, some CLIP-based algorithms have been developed to alleviate this interference by adjusting the weights of visual classifiers. However, our analysis shows that this class-wise prompt engineering suffers from two shortcomings for DG FAS: (1) facial category labels, such as real or spoof, carry no semantics for the CLIP model, making it difficult to learn accurate category descriptions; (2) a single form of prompt cannot portray the various types of spoofing. In this work, instead of class-wise prompts, we propose a novel Content-aware Composite Prompt Engineering (CCPE) that generates instance-wise composite prompts, including both fixed template and learnable prompts. Specifically, CCPE constructs content-aware prompts from two branches: (1) the inherent content prompt explicitly benefits from abundant knowledge transferred from an instruction-based Large Language Model (LLM); (2) learnable content prompts implicitly extract the most informative visual content via Q-Former. Moreover, we design a Cross-Modal Guidance Module (CGM) that dynamically adjusts unimodal features for fusion to achieve better-generalized FAS. Finally, CCPE has been validated in multiple cross-domain experiments and achieves state-of-the-art (SOTA) results.

Submitted 6 April, 2025; originally announced April 2025.

arXiv:2503.23060

Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains

Authors: Vincent Jacob, Yanlei Diao

Abstract: The widespread adoption of digital services, along with the scale and complexity at which they operate, has made incidents in IT operations increasingly likely, diverse, and impactful. This has led to the rapid development of a central aspect of "Artificial Intelligence for IT Operations" (AIOps), focusing on detecting anomalies in vast amounts of multivariate time series data generated by service entities. In this paper, we begin by introducing a unifying framework for benchmarking unsupervised anomaly detection (AD) methods, and highlight the problem of shifts in normal behaviors that can occur in practical AIOps scenarios. To tackle anomaly detection under domain shift, we then cast the problem in the framework of domain generalization and propose a novel approach, Domain-Invariant VAE for Anomaly Detection (DIVAD), to learn domain-invariant representations for unsupervised anomaly detection. Our evaluation results using the Exathlon benchmark show that the two main DIVAD variants significantly outperform the best unsupervised AD method in maximum performance, with 20% and 15% improvements in maximum peak F1-scores, respectively. Evaluation using the Application Server Dataset further demonstrates the broader applicability of our domain generalization methods.

Submitted 29 March, 2025; originally announced March 2025.

arXiv:2503.08661

Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems

Authors: Yufeng Diao, Yichi Zhang, Daniele De Martini, Philip Guodong Zhao, Emma Liying Li

Abstract: This paper proposes a task-oriented co-design framework that integrates communication, computing, and control to address the key challenges of bandwidth limitations, noise interference, and latency in mission-critical industrial Cyber-Physical Systems (CPS). To improve communication efficiency and robustness, we design a task-oriented Joint Source-Channel Coding (JSCC) scheme based on the Information Bottleneck (IB), which enhances data transmission efficiency by prioritizing task-specific information. To mitigate the perceived End-to-End (E2E) delays, we develop a Delay-Aware Trajectory-Guided Control Prediction (DTCP) strategy that integrates trajectory planning with control prediction, predicting commands based on the E2E delay. Moreover, DTCP is co-designed with the task-oriented JSCC, focusing on transmitting task-specific information for timely and reliable autonomous driving. Experimental results in the CARLA simulator demonstrate that, under an E2E delay of 1 second (20 time slots), the proposed framework achieves a driving score of 48.12, which is 31.59 points higher than when using Better Portable Graphics (BPG), while reducing bandwidth usage by 99.19%.

Submitted 11 March, 2025; originally announced March 2025.

Comments: This paper has been accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC), with publication expected in 2025

arXiv:2503.04728

doi 10.5121/ijci.2024.130601

Leveraging Large Language Models For Optimized Item Categorization using UNSPSC Taxonomy

Authors: Anmolika Singh, Yuhang Diao

Abstract: Effective item categorization is vital for businesses, enabling the transformation of unstructured datasets into organized categories that streamline inventory management. Despite its importance, item categorization remains highly subjective and lacks a uniform standard across industries and businesses. The United Nations Standard Products and Services Code (UNSPSC) provides a standardized system for cataloguing inventory, yet employing UNSPSC categorizations often demands significant manual effort. This paper investigates the deployment of Large Language Models (LLMs) to automate the classification of inventory data into UNSPSC codes based on item descriptions. We evaluate the accuracy and efficiency of LLMs in categorizing diverse datasets, exploring their language processing capabilities and their potential as a tool for standardizing inventory classification. Our findings reveal that LLMs can substantially reduce the manual labor involved in item categorization while maintaining high accuracy, offering a scalable solution for businesses striving to enhance their inventory management practices.

Submitted 27 December, 2024; originally announced March 2025.

Comments: 10 pages, International Conference on NLP, AI, Computer Science & Engineering (NLAICSE 2024), December 2024, ISBN: 978-1-923107-45-8

Journal ref: International Journal on Cybernetics & Informatics. 13. (2024)

arXiv:2502.15472

Aligning Task- and Reconstruction-Oriented Communications for Edge Intelligence

Authors: Yufeng Diao, Yichi Zhang, Changyang She, Philip Guodong Zhao, Emma Liying Li

Abstract: Existing communication systems, known as reconstruction-oriented communications, aim to reconstruct the information at the receiver side. This approach often falls short of meeting the real-time, task-specific demands of modern AI-driven applications such as autonomous driving and semantic segmentation. Task-oriented communications have been developed as a new design principle. However, they typically require joint optimization of the encoder, decoder, and modified inference neural networks, resulting in extensive cross-system redesigns and compatibility issues. This paper proposes a novel communication framework that aligns reconstruction-oriented and task-oriented communications for edge intelligence. The idea is to extend Information Bottleneck (IB) theory to optimize data transmission by minimizing a task-relevant loss function while preserving the structure of the original data through an information reshaper. This approach integrates task-oriented communications with reconstruction-oriented communications, and a variational approach is designed to handle the intractability of mutual information in high-dimensional neural network features. We also introduce a joint source-channel coding (JSCC) modulation scheme compatible with classical modulation techniques, enabling the deployment of AI technologies within existing digital infrastructures. The proposed framework is particularly effective in edge-based autonomous driving scenarios. Our evaluation in the Car Learning to Act (CARLA) simulator demonstrates that the proposed framework reduces bits per service by 99.19% compared to existing methods such as JPEG, JPEG2000, and BPG, without compromising the effectiveness of task execution.
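For reference, the classical Information Bottleneck objective that the abstract builds on is shown below, with X the source data, Z the transmitted representation, and Y the task variable, together with the standard variational upper bound used when the mutual-information terms are intractable. The encoder p(z|x), decoder q(y|z), and prior r(z) are generic placeholders assumed for illustration; the paper's information reshaper and JSCC modulation are not modeled, and its exact objective may differ.

```latex
% Classical IB trade-off (beta > 0 weights task relevance against compression)
\min_{p(z\mid x)} \; I(X;Z) - \beta\, I(Z;Y)

% Variational upper bound, valid for any decoder q(y|z) and prior r(z)
I(X;Z) - \beta\, I(Z;Y) \;\le\;
  \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(p(z\mid x)\,\|\,r(z)\big)\right]
  + \beta\, \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p(z\mid x)}\!\left[-\log q(y\mid z)\right]
  - \beta\, H(Y)
```

Because H(Y) does not depend on the encoder or decoder, minimizing the first two terms on the right-hand side is a tractable surrogate for the IB objective.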

Submitted 21 February, 2025; originally announced February 2025.

Comments: Accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC)

arXiv:2502.15447

doi 10.1016/j.xinn.2025.100802

Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula

Authors: Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, S. H. Chen, S. Z. Chen, et al. (274 additional authors not shown)

Abstract: In this study, we present a comprehensive analysis of an unidentified point-like ultra-high-energy (UHE) $γ$-ray source, designated 1LHAASO J1740+0948u, situated in the vicinity of the middle-aged pulsar PSR J1740+1000. The detection significance reached 17.1$σ$ (9.4$σ$) above 25$\,$TeV (100$\,$TeV). The source energy spectrum extended up to 300$\,$TeV and was well fitted by a log-parabola function with $N_0 = (1.93\pm0.23) \times 10^{-16}\,\mathrm{TeV^{-1}\,cm^{-2}\,s^{-1}}$, $α = 2.14\pm0.27$, and $β = 1.20\pm0.41$ at $E_0 = 30\,$TeV. The associated pulsar, PSR J1740+1000, resides at a high Galactic latitude and powers a bow-shock pulsar wind nebula (BSPWN) with an extended X-ray tail. The best-fit position of the gamma-ray source appeared to be offset by $0.2^{\circ}$ from the pulsar position. Since (i) currently identified pulsar halos do not show such offsets and (ii) the centroid of the gamma-ray emission lies approximately along the extension of the X-ray tail, we speculate that the UHE $γ$-ray emission may originate from re-accelerated electron/positron pairs that are advected away in the bow-shock tail.
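To make the log-parabola fit concrete, the short sketch below evaluates the spectral shape and its local photon index using the central best-fit values quoted above. It assumes the common natural-logarithm convention dN/dE = N0 (E/E0)^(-α - β ln(E/E0)); the abstract does not state the log base, and analyses that use log10 give β a different meaning. Uncertainties are ignored, so this is for intuition only, not a reproduction of the LHAASO fit. Function names are hypothetical.

```python
import math

def log_parabola(E: float, N0: float, alpha: float, beta: float, E0: float) -> float:
    """Differential flux dN/dE of a log-parabola (natural-log convention assumed)."""
    x = E / E0
    return N0 * x ** (-alpha - beta * math.log(x))

def local_index(E: float, alpha: float, beta: float, E0: float) -> float:
    """Local photon index -dln(dN/dE)/dln(E); it grows linearly in ln(E/E0)."""
    return alpha + 2.0 * beta * math.log(E / E0)

# Central best-fit values from the abstract (energies in TeV).
N0, ALPHA, BETA, E0 = 1.93e-16, 2.14, 1.20, 30.0

for E in (25.0, 30.0, 100.0, 300.0):
    flux = log_parabola(E, N0, ALPHA, BETA, E0)
    gamma = local_index(E, ALPHA, BETA, E0)
    print(f"E = {E:5.0f} TeV  dN/dE = {flux:.3e}  local index = {gamma:.2f}")
```

At the pivot energy E0 the flux equals N0 and the local index equals α; the positive β makes the spectrum steepen with energy.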

Submitted 24 February, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

Comments: Corrected spelling errors in several author names

Journal ref: The Innovation (2025), 100802

arXiv:2502.04848

Broadband $γ$-ray spectrum of supernova remnant Cassiopeia A

Authors: Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, S. H. Chen, S. Z. Chen, et al. (293 additional authors not shown)

Abstract: The core-collapse supernova remnant (SNR) Cassiopeia A (Cas A) is one of the brightest Galactic radio sources, with an angular radius of $\sim 2.5'$. Although no extension of this source has been detected in the $γ$-ray band, using more than 1000 days of LHAASO data above $\sim 0.8$ TeV we find that its spectrum is significantly softer than those obtained with Imaging Air Cherenkov Telescopes (IACTs) and that its flux near 1 TeV is about two times higher. In combination with analyses of more than 16 years of Fermi-LAT data covering $0.1 \, \mathrm{GeV} - 1 \, \mathrm{TeV}$, we find that the spectrum above 30 GeV deviates significantly from a single power law and is best described by a smoothly broken power law with a spectral index of $1.90 \pm 0.15_\mathrm{stat}$ ($3.41 \pm 0.19_\mathrm{stat}$) below (above) a break energy of $0.63 \pm 0.21_\mathrm{stat} \, \mathrm{TeV}$. Given the differences in angular resolution between LHAASO-WCDA and IACTs, the TeV $γ$-ray emission detected with LHAASO may have a significant contribution from regions surrounding the SNR that are illuminated by particles accelerated earlier, which, however, are treated as background by IACTs. Detailed modelling can be used to constrain the acceleration processes of TeV particles in the early stage of SNR evolution.
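A smoothly broken power law interpolates between two spectral indices around a break energy. The sketch below uses one common parameterization, dN/dE = N0 (E/Eb)^(-Γ1) [1 + (E/Eb)^((Γ2 - Γ1)/s)]^(-s), which is an assumption since the abstract does not give the functional form; the normalization N0 and the smoothness parameter s are placeholders not reported there. It shows how the local index moves from 1.90 to 3.41 across the 0.63 TeV break.

```python
import math

def sbpl(E, N0, gamma1, gamma2, E_break, s=1.0):
    """Smoothly broken power law (one common form, assumed here).

    dN/dE = N0 * (E/Eb)^-gamma1 * [1 + (E/Eb)^((gamma2 - gamma1)/s)]^-s
    The index tends to gamma1 well below Eb and to gamma2 well above it;
    s controls how sharp the transition is.
    """
    x = E / E_break
    return N0 * x ** (-gamma1) * (1.0 + x ** ((gamma2 - gamma1) / s)) ** (-s)

def sbpl_local_index(E, gamma1, gamma2, E_break, s=1.0):
    """Analytic local index -dln(dN/dE)/dln(E) for the form above."""
    y = (E / E_break) ** ((gamma2 - gamma1) / s)
    return gamma1 + (gamma2 - gamma1) * y / (1.0 + y)

# Central values from the abstract (energies in TeV); s = 1 is a placeholder.
G1, G2, EB = 1.90, 3.41, 0.63
for E in (0.03, 0.3, 0.63, 3.0, 30.0):
    print(f"E = {E:6.2f} TeV  local index = {sbpl_local_index(E, G1, G2, EB):.2f}")
```

At the break itself the local index is the average of the two indices, independent of s; s only changes how quickly the index approaches its asymptotes.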

Submitted 7 February, 2025; originally announced February 2025.

arXiv:2502.04377

MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction

Authors: Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, Hui Zhang, Weiming Li, Shu Zhao, Yu Liu

Abstract: The map construction task plays a vital role in providing the precise and comprehensive static environmental information essential for autonomous driving systems. The primary sensors are cameras and LiDAR, with configurations that are camera-only, LiDAR-only, or camera-LiDAR fusion, depending on cost-performance considerations. While fusion-based methods typically perform best, existing approaches often neglect modality interaction and rely on simple fusion strategies, which suffer from misalignment and information loss. To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction. Specifically, to solve the semantic misalignment problem between camera and LiDAR BEV features, we introduce the Cross-modal Interaction Transform (CIT) module, which enables interaction between the two BEV feature spaces and enhances feature representation through a self-attention mechanism. Additionally, we propose an effective Dual Dynamic Fusion (DDF) module that adaptively selects valuable information from the different modalities, taking full advantage of the information they share. Moreover, MapFusion is designed to be simple and plug-and-play, so it can easily be integrated into existing pipelines. We evaluate MapFusion on two map construction tasks, high-definition (HD) map construction and BEV map segmentation, to show its versatility and effectiveness. Compared with state-of-the-art methods, MapFusion achieves 3.6% and 6.2% absolute improvements on the HD map construction and BEV map segmentation tasks on the nuScenes dataset, respectively, demonstrating the superiority of our approach.

Submitted 5 February, 2025; originally announced February 2025.

arXiv:2412.20833

Inclusion 2024 Global Multimedia Deepfake Detection Challenge: Towards Multi-dimensional Face Forgery Detection

Authors: Yi Zhang, Weize Gao, Changtao Miao, Man Luo, Jianshu Li, Wenzhong Deng, Zhe Li, Bingyu Hu, Weibin Yao, Yunfeng Diao, Wenbo Zhou, Tao Gong, Qi Chu

Abstract: In this paper, we present the Global Multimedia Deepfake Detection Challenge, held concurrently with Inclusion 2024. The challenge aims to detect automatic image and audio-video manipulations, including but not limited to editing, synthesis, generation, Photoshop, etc. It attracted 1500 teams from all over the world, with about 5000 valid result submissions. We invite the top 20 teams to present their solutions to the challenge, and the top 3 teams are awarded prizes in the grand finale. In this paper, we present the solutions from the top 3 teams of the two tracks to boost research in the field of image and audio-video forgery detection. The methodologies developed through the challenge will contribute to the development of next-generation deepfake detection systems, and we encourage participants to open-source their methods.

Submitted 3 June, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

Comments: Inclusion 2024 Global Multimedia Deepfake Detection Competition Top Team Technical Report

arXiv:2412.16483

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights

Authors: Jingjing Hu, Dan Guo, Zhan Si, Deguang Liu, Yunfeng Diao, Jing Zhang, Jinxing Zhou, Meng Wang

Abstract: Molecular representation learning plays a crucial role in various downstream tasks, such as molecular property prediction and drug design. To accurately represent molecules, Graph Neural Networks (GNNs) and Graph Transformers (GTs) have shown potential in the realm of self-supervised pretraining. However, existing approaches often overlook the relationship between molecular structure and electronic information, as well as the internal semantic reasoning within molecules. This omission of fundamental chemical knowledge in graph semantics leads to incomplete molecular representations, missing the integration of structural and electronic data. To address these issues, we introduce MOL-Mamba, a framework that enhances molecular representation by combining structural and electronic insights. MOL-Mamba consists of an Atom & Fragment Mamba-Graph (MG) for hierarchical structural reasoning and a Mamba-Transformer (MT) fuser for integrating molecular structure and electronic correlation learning. Additionally, we propose a Structural Distribution Collaborative Training and E-semantic Fusion Training framework to further enhance molecular representation learning. Extensive experiments demonstrate that MOL-Mamba outperforms state-of-the-art baselines across eleven chemical-biological molecular datasets.

Submitted 5 February, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

Comments: Accepted by AAAI 2025

arXiv:2412.07229

Moderating the Generalization of Score-based Generative Model

Authors: Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong

Abstract: Score-based Generative Models (SGMs) have demonstrated remarkable generalization abilities, e.g., generating unseen but natural data. However, the greater the generalization power, the more likely unintended generalization becomes, and the more dangerous the potential abuse. Research on moderated generalization in SGMs remains limited. To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find that it does not work in SGMs. Further analysis of score functions reveals that the MU 'gold standard' does not alter the original score function, which explains its ineffectiveness. Based on this insight, we propose the first Moderated Score-based Generative Model (MSGM), which introduces a novel score adjustment strategy that redirects the score function away from undesirable data during the continuous-time stochastic differential equation process. Extensive experimental results demonstrate that MSGM significantly reduces the likelihood of generating undesirable content while preserving high visual quality for normal image generation. Although designed for SGMs, MSGM is a general and flexible MU framework that is compatible with diverse diffusion architectures (SGM and DDPM) and training strategies (re-training and fine-tuning), and it enables zero-shot transfer of the pre-trained models to downstream tasks, e.g., image inpainting and reconstruction. The code will be shared upon acceptance.

Submitted 26 June, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

arXiv:2412.06103

The Enumeration of Alternating Pretzel Links

Authors: Charlotte Aspinwall, Tobias Clark, Yuanan Diao

Abstract: In this paper, we tabulate the set of alternating pretzel links. Specifically, for any given crossing number $c$, we derive a closed formula that allows us to compute $\mathcal{P}(c)$, the total number of alternating pretzel links with crossing number $c$. Numerical computation suggests that $\mathcal{P}(c)\approx 0.155e^{0.588c}$. That is, the number of alternating pretzel links with a given crossing number $c$ grows exponentially in $c$.
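A quick way to read the exponential fit quoted above is through the ratio between consecutive crossing numbers, which is constant. This arithmetic applies only to the numerical approximation $\mathcal{P}(c)\approx 0.155e^{0.588c}$, not to the paper's exact closed formula, which is not reproduced here.

```latex
\frac{\mathcal{P}(c+1)}{\mathcal{P}(c)} \approx e^{0.588} \approx 1.80,
\qquad
\mathcal{P}(c+\Delta c) \approx 2\,\mathcal{P}(c)
\iff \Delta c \approx \frac{\ln 2}{0.588} \approx 1.18 .
```

So, under the fit, each additional crossing multiplies the count by roughly 1.8, and the count roughly doubles for every 1.18 extra crossings.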

Submitted 15 February, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

Comments: 22 pages, 6 figures

MSC Class: (2020): Primary: 57K10; Secondary: 57K14

arXiv:2410.17986

Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data

Authors: Zhaomin Wu, Junyi Hou, Yiqun Diao, Bingsheng He

Abstract: Federated Learning (FL) is an evolving paradigm that enables multiple parties to collaboratively train models without sharing raw data. Among its variants, Vertical Federated Learning (VFL) is particularly relevant in real-world, cross-organizational collaborations, where distinct features of a shared instance group are contributed by different parties. In these scenarios, parties are often linked using fuzzy identifiers, leading to a common practice termed multi-party fuzzy VFL. Existing models generally address either multi-party VFL or fuzzy VFL between two parties. Extending these models to practical multi-party fuzzy VFL typically results in significant performance degradation and increased costs for maintaining privacy. To overcome these limitations, we introduce the Federated Transformer (FeT), a novel framework that supports multi-party VFL with fuzzy identifiers. FeT encodes these identifiers into data representations and employs a transformer architecture distributed across different parties, incorporating three new techniques to enhance performance. Furthermore, we have developed a multi-party privacy framework for VFL that integrates differential privacy with secure multi-party computation, effectively protecting local representations while minimizing the associated utility costs. Our experiments demonstrate that FeT surpasses the baseline models by up to 46% in accuracy when scaled to 50 parties. Additionally, in two-party fuzzy VFL settings, FeT also shows improved performance and privacy over cutting-edge VFL models.

Submitted 23 October, 2024; originally announced October 2024.

Journal ref: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

arXiv:2410.02082

FARM: Functional Group-Aware Representations for Small Molecules

Authors: Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji

Abstract: We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which directly incorporates functional group information into the representations. This strategic reduction in tokenization granularity is intentionally aligned with key drivers of functional properties (i.e., functional groups), enhancing the model's understanding of chemical language. By expanding the chemical lexicon, FARM more effectively bridges SMILES and natural language, ultimately advancing the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: by using masked language modeling to capture atom-level features and by employing graph neural networks to encode the whole molecule topology. By leveraging contrastive learning, FARM aligns these two views of representations into a unified molecular embedding. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.
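The abstract states that contrastive learning aligns the sequence (masked language modeling) view and the graph view of each molecule, but it does not name the loss. The sketch below assumes a symmetric InfoNCE objective of the kind popularized by CLIP: matched views of the same molecule are positives and every other molecule in the batch is a negative. Function names and the temperature value are hypothetical, and plain lists stand in for real encoder outputs.

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits: list[list[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(seq_view, graph_view, temperature=0.1):
    """Contrastive loss pulling matched (sequence, graph) embeddings together.

    seq_view[i] and graph_view[i] are two views of molecule i; all other
    molecules in the batch act as negatives, in both directions.
    """
    a = [l2_normalize(v) for v in seq_view]
    b = [l2_normalize(v) for v in graph_view]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy embeddings: aligned views should give a lower loss than mismatched ones.
seq = [[1.0, 0.0], [0.0, 1.0]]
print(symmetric_info_nce(seq, [[0.9, 0.1], [0.1, 0.9]]))
print(symmetric_info_nce(seq, [[0.1, 0.9], [0.9, 0.1]]))
```

The symmetric form averages the sequence-to-graph and graph-to-sequence directions so that neither encoder is favored; whether FARM uses this exact variant is not stated in the abstract.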

Submitted 6 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

Comments: Preprint

arXiv:2409.06712

A Meta-analysis of College Students' Intention to Use Generative Artificial Intelligence

Authors: Yifei Diao, Ziyi Li, Jiateng Zhou, Wei Gao, Xin Gong

Abstract: Analysing the factors that influence college students' intention to use generative artificial intelligence (GenAI) is of critical importance for understanding and predicting learners' learning behaviours and academic outcomes. However, existing research results have been inconsistent. This study therefore conducted a meta-analysis of 27 empirical studies under an integrated theoretical framework, comprising 87 independent effect sizes and a combined sample of 33,833. The results revealed that the main variables are strongly correlated with students' behavioural intention to use GenAI. Among them, performance expectancy (r = 0.389) and attitudes (r = 0.576) play particularly critical roles, and the effects of effort expectancy and habit are moderated by locational factors. Notably, gender moderated only the effect of attitudes on students' behavioural intention to use GenAI. This study provides valuable insights for resolving the debate in existing research regarding students' intention to use GenAI and for improving educational technology, and it offers support for school decision-makers and educators applying GenAI in school settings.
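Pooled correlations such as r = 0.389 for performance expectancy are the typical output of a meta-analysis of correlation coefficients. As an illustration only, the sketch below uses the textbook fixed-effect approach: transform each r with Fisher's z = atanh(r), weight by n - 3 (the inverse of the variance of z), average, and back-transform. The abstract does not say which model the authors used (random-effects models are common in this setting), and the (r, n) pairs below are hypothetical, not data from the study.

```python
import math

def pool_correlations(studies):
    """Fixed-effect pooled correlation via Fisher's z-transform.

    studies: iterable of (r, n) pairs, one per independent effect size, with n > 3.
    Returns (pooled_r, ci_low, ci_high) for an approximate 95% confidence interval.
    """
    studies = list(studies)
    weights = [n - 3 for _, n in studies]        # inverse of var(z) = 1/(n - 3)
    zs = [math.atanh(r) for r, _ in studies]     # Fisher z = atanh(r)
    total_w = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / total_w
    se = 1.0 / math.sqrt(total_w)
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    return math.tanh(z_bar), math.tanh(lo), math.tanh(hi)  # back-transform to r

# Hypothetical effect sizes for illustration; not data from the meta-analysis.
example = [(0.35, 420), (0.42, 610), (0.30, 250)]
r, lo, hi = pool_correlations(example)
print(f"pooled r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Averaging in z-space rather than r-space is the usual choice because the sampling distribution of z is approximately normal with a variance that depends only on n.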

Submitted 25 August, 2024; originally announced September 2024.

arXiv:2409.02483

TASAR: Transfer-based Attack on Skeletal Action Recognition

Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xiaoshuai Hao, Xingxing Wei, Meng Wang, He Wang

Abstract: Skeletal sequence data, as a widely employed representation of human actions, are crucial in Human Activity Recognition (HAR). Recently, adversarial attacks have been proposed in this area; they expose potential security concerns and, more importantly, provide a good tool for testing model robustness. Within this line of research, transfer-based attacks are an important tool because they mimic the real-world scenario in which an attacker has no knowledge of the target model, yet they remain under-explored in Skeleton-based HAR (S-HAR). Consequently, existing S-HAR attacks exhibit weak adversarial transferability, and the reason remains largely unknown. In this paper, we investigate this phenomenon by characterizing the loss function. We find that one prominent indicator of poor transferability is low smoothness of the loss function. Guided by this observation, we improve transferability by properly smoothing the loss when computing the adversarial examples. This leads to the first Transfer-based Attack on Skeletal Action Recognition, TASAR. TASAR explores the smoothed model posterior of pre-trained surrogates, which is achieved by a new post-train Dual Bayesian optimization strategy. Furthermore, unlike existing transfer-based methods, which overlook the temporal coherence within sequences, TASAR incorporates motion dynamics into the Bayesian attack, effectively disrupting the spatial-temporal coherence of S-HARs. For exhaustive evaluation, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets, and 2 defense models. Extensive results demonstrate the superiority of TASAR. Our benchmark enables easy comparisons for future studies, with the code available at https://github.com/yunfengdiao/Skeleton-Robustness-Benchmark.

Submitted 12 February, 2025; v1 submitted 4 September, 2024; originally announced September 2024.

Comments: Accepted in ICLR 2025

arXiv:2408.03774

Class numbers and integer points on some Pellian surfaces

Authors: Yijie Diao

Abstract: We provide an estimate for the number of nontrivial integer points on the Pellian surface $t^2 - du^2 = 1$ in a bounded region. We give a lower bound on the size of fundamental solutions for almost all $d$ in a certain class, based on a recent conjecture of Browning and Wilsch about integer points on log K3 surfaces. We also obtain an upper bound on the average class number in this class, assuming the same conjecture.
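The abstract concerns fundamental solutions of $t^2 - du^2 = 1$ and integer points in a bounded region. To make these objects concrete, the sketch below computes the fundamental solution with the classical continued-fraction method and counts positive solutions with $t$ up to a bound by taking powers of the fundamental unit $t_1 + u_1\sqrt{d}$. It illustrates the objects involved only, not the paper's estimates, which rest on the Browning-Wilsch conjecture; the function names are hypothetical.

```python
from math import isqrt

def fundamental_solution(d: int) -> tuple[int, int]:
    """Smallest positive (t, u) with t^2 - d*u^2 = 1, via the continued fraction of sqrt(d)."""
    if d <= 0:
        raise ValueError("d must be a positive non-square integer")
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must be a positive non-square integer")
    m, q, a = 0, 1, a0          # state of the continued-fraction expansion of sqrt(d)
    h_prev, h = 1, a0           # convergent numerators h_{n-1}, h_n
    k_prev, k = 0, 1            # convergent denominators k_{n-1}, k_n
    while h * h - d * k * k != 1:
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

def count_positive_solutions(d: int, t_max: int) -> int:
    """Count solutions with t, u >= 1 and t <= t_max (powers of the fundamental unit)."""
    t1, u1 = fundamental_solution(d)
    t, u, count = t1, u1, 0
    while t <= t_max:
        count += 1
        # (t + u*sqrt(d)) * (t1 + u1*sqrt(d)) gives the next solution.
        t, u = t * t1 + d * u * u1, t * u1 + u * t1
    return count

print(fundamental_solution(61))
print(count_positive_solutions(2, 100))
```

Every nontrivial integer point is $(\pm t, \pm u)$ for one of these positive solutions, so the full count in a symmetric box is four times the positive count. The case $d = 61$ shows how large fundamental solutions can be relative to $d$, which is why lower bounds on their size are of interest.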

Submitted 7 August, 2024; originally announced August 2024.

Comments: 14 pages

MSC Class: 11D25 (11N56)

arXiv:2407.20836

Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

Authors: Yunfeng Diao, Naixin Zhai, Changtao Miao, Zitong Yu, Xingxing Wei, Xun Yang, Meng Wang

Abstract: Recent advancements in image synthesis, particularly with the advent of GAN and Diffusion models, have amplified public concerns regarding the dissemination of disinformation. To address such concerns, numerous AI-generated Image (AIGI) Detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, which has rarely been investigated so far. To this end, we propose a new method to attack AIGI detectors. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to further narrow the gap between heterogeneous AIGI detectors, e.g. transferring adversarial examples across CNNs and ViTs. This is achieved by introducing a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models using one pre-trained surrogate, without the need for re-training. We name our method Frequency-based Post-train Bayesian Attack, or FPBA.
Through FPBA, we show that adversarial attacks are a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, and defense methods, and can even evade cross-generator detection, which is a crucial real-world detection scenario. The code will be shared upon acceptance.
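To illustrate what a frequency-domain perturbation means, the sketch below edits selected DFT bins of a 1-D row of pixel values, mirrors each edit so the signal stays real, and projects the result back onto an L-infinity pixel budget. It assumes a naive DFT, one real offset per bin and the helper names shown (all hypothetical); it is not FPBA's optimization, which the abstract does not detail, and it omits the Bayesian surrogate step.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def perturb_frequency(x, offsets):
    """Add real offsets to selected DFT bins of the real signal x.

    offsets maps a bin index k (1 <= k <= len(x) // 2) to a real number. The same
    offset is added to the mirrored bin n - k, so the spectrum stays conjugate-symmetric
    and the inverse transform is real up to rounding error."""
    n = len(x)
    spectrum = dft(x)
    for k, delta in offsets.items():
        if not 1 <= k <= n // 2:
            raise ValueError("bin index out of range")
        spectrum[k] += delta
        if n - k != k:
            spectrum[n - k] += delta
    return [v.real for v in idft(spectrum)]


def project(x_adv, x, eps):
    """Clip each value into [x - eps, x + eps] intersected with the valid range [0, 1]."""
    return [min(max(a, xi - eps, 0.0), xi + eps, 1.0) for a, xi in zip(x_adv, x)]


row = [0.2, 0.5, 0.8, 0.4, 0.6, 0.1, 0.9, 0.3]  # hypothetical row of pixel intensities
perturbed = project(perturb_frequency(row, {3: 0.4}), row, eps=0.08)
```

Adding a real offset $\delta$ to bins $k$ and $n-k$ changes each sample by $(2\delta/n)\cos(2\pi k t/n)$, so the edit is a pure sinusoid at that frequency before clipping.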

Submitted 10 March, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.08572

Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space

Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

Abstract: Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR (S-HAR). However, research on adversarial transferability in S-HAR is largely missing. More importantly, all existing attacks struggle to transfer across unknown S-HAR models. We observed that the key reason is that the loss landscape of the action recognizers is rugged and sharp. Given the correlation between loss landscape and adversarial transferability established in prior studies [qin2022boosting, wu2020towards], we hypothesize, and empirically validate, that smoothing the loss landscape could improve adversarial transferability on S-HAR. This is achieved by proposing a new post-train Dual Bayesian strategy, which can effectively explore the model posterior space for a collection of surrogates without the need for re-training. Furthermore, to craft adversarial examples along the motion manifold, we incorporate the attack gradient with information of the motion dynamics in a Bayesian manner. Evaluated on benchmark datasets, e.g. HDM05 and NTU 60, the average transfer success rate can reach as high as 35.9% and 45.5%, respectively. In comparison, current state-of-the-art skeletal attacks achieve only 3.6% and 9.8%. The high adversarial transferability remains consistent across various surrogate, victim, and even defense models.
Through a comprehensive analysis of the results, we provide insights on which surrogates are more likely to exhibit transferability, to shed light on future research.
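One simple way to express motion dynamics, assumed here for illustration rather than taken from the paper, is through finite-difference velocities and accelerations of the joint coordinates; penalizing changes to them keeps a perturbed sequence closer to the original motion. The sequence layout (a list of frames of flattened coordinates) and the weights `w_vel`, `w_acc` are hypothetical.

```python
def temporal_diff(seq):
    """Frame-to-frame difference of a sequence of frames (each a list of coordinates)."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(seq, seq[1:])]


def squared_gap(seq_a, seq_b):
    """Sum of squared coordinate differences between two equally shaped sequences."""
    return sum((a - b) ** 2 for fa, fb in zip(seq_a, seq_b) for a, b in zip(fa, fb))


def dynamics_penalty(clean, adv, w_vel=1.0, w_acc=1.0):
    """Penalize changes to velocity (1st difference) and acceleration (2nd difference)."""
    vel_c, vel_a = temporal_diff(clean), temporal_diff(adv)
    acc_c, acc_a = temporal_diff(vel_c), temporal_diff(vel_a)
    return w_vel * squared_gap(vel_c, vel_a) + w_acc * squared_gap(acc_c, acc_a)


# Hypothetical single joint in 2-D, moving at constant speed along x.
clean = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
shifted = [[px + 0.5, py] for px, py in clean]                # global translation
jittered = [[0.0, 0.0], [1.1, 0.0], [2.0, 0.0], [3.0, 0.0]]   # one-frame spike
```

A global translation leaves velocities and accelerations unchanged, so its penalty is zero, whereas a single-frame spike disturbs both and is penalized.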

Submitted 5 September, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: We have submitted a new version of our work at arXiv:2409.02483. This version, arXiv:2407.08572, is no longer valid. Any updates to this work will be made in arXiv:2409.02483

arXiv:2407.00238

The Braid Indices of Pretzel Links: A Comprehensive Study, Part II

Authors: Yuanan Diao, Claus Ernst, Gabor Hetyei

Abstract: This paper is the second part of our comprehensive study on the braid index problem of pretzel links. Our ultimate goal is to completely determine the braid indices of all pretzel links, alternating or non-alternating. In our approach, we divide the pretzel links into three types as follows. If $D$ is a standard diagram of an oriented pretzel link $\mathcal{L}$, $S(D)$ is the Seifert circle decomposition of $D$, and $C_1$, $C_2$ are the Seifert circles in $S(D)$ containing the top and bottom long strands of $D$ respectively, then $\mathcal{L}$ is classified as a Type 1 (Type 2) pretzel link if $C_1\not=C_2$ and $C_1$, $C_2$ have different (identical) orientations. If $C_1=C_2$, then $\mathcal{L}$ is classified as a Type 3 pretzel link. In our previous paper, we reached our goal for all Type 1 and Type 2 pretzel links; that is, we derived precise braid index formulas for all Type 1 and Type 2 pretzel links. In this paper, we present the results of our study on Type 3 pretzel links. In this case, we are very close to reaching our goal. More precisely, we are able to determine the precise braid indices of all but a small percentage of Type 3 pretzel links. Even for the exceptional ones, we are able to narrow their braid indices down to two consecutive integers. Based on some numerical evidence, we conjecture that in such a case, the braid index of the Type 3 pretzel link is the larger of the two consecutive integers given by our formulas.
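The Type 1/2/3 rule above is a simple decision on $C_1$ and $C_2$. The sketch below encodes only that rule; computing $S(D)$ from an actual diagram is out of scope, so the circles, their labels and the sign convention for orientation are assumed inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeifertCircle:
    """A Seifert circle of S(D), identified by a label, with orientation +1 or -1
    (which sign means clockwise is an arbitrary convention here)."""
    label: str
    orientation: int


def pretzel_type(c1, c2):
    """Type of a pretzel link from the Seifert circles C1 and C2 that contain the top
    and bottom long strands of its standard diagram D."""
    if c1.label == c2.label:
        return 3  # C1 = C2
    # C1 != C2: different orientations give Type 1, identical orientations give Type 2.
    return 1 if c1.orientation != c2.orientation else 2
```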

Submitted 28 June, 2024; originally announced July 2024.

Comments: 29 pages, 13 figures

MSC Class: Primary: 5725; Secondary: 5727

arXiv:2405.14203

GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices

Authors: Thao Nguyen, Tiara Torres-Flores, Changhyun Hwang, Carl Edwards, Ying Diao, Heng Ji

Abstract: This paper presents a novel approach for predicting the Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we use as the training data for our predictive model. In this low-data regime, GLaD leverages properties learned from large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, allowing for a multimodal representation of molecules. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD showcases versatility, as it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox, and SIDER), not limited to those concerning OPV materials. In particular, GLaD proves valuable for tasks in low-data regimes within the chemical space, as it enriches molecular representations by incorporating molecular property descriptions learned from large-scale pretraining. This capability is significant in real-world scientific endeavors like drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.
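The idea of enriching a structural representation with language-derived descriptors can be illustrated by feature-level fusion. In this sketch the embeddings and targets are made-up numbers, fusion is plain concatenation, and a k-nearest-neighbour regressor stands in for the predictor; none of these choices are stated to be GLaD's.

```python
import math


def fuse(graph_embedding, text_embedding):
    """Multimodal feature: concatenate a structure embedding with a text-derived one."""
    return list(graph_embedding) + list(text_embedding)


def knn_predict(train, query, k=3):
    """Predict a target (e.g. PCE) as the mean over the k nearest training vectors."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical 2-D graph embeddings, 1-D text embeddings and illustrative targets.
train = [
    (fuse([0.0, 0.0], [0.0]), 5.0),
    (fuse([1.0, 0.0], [0.0]), 7.0),
    (fuse([0.0, 1.0], [1.0]), 9.0),
    (fuse([5.0, 5.0], [5.0]), 15.0),
]
```

A query at `fuse([0.1, 0.0], [0.0])` with `k=2` averages the targets of its two nearest neighbours, 5.0 and 7.0.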

Submitted 23 May, 2024; originally announced May 2024.

Comments: In progress

arXiv:2403.14094

The Braid Indices of Pretzel Links: A Comprehensive Study, Part I

Authors: Yuanan Diao, Claus Ernst, Gabor Hetyei

Abstract: The determination of the braid index of an oriented link is generally a hard problem. In the case of alternating links, significant progress has been made in recent years, making explicit and precise braid index computations possible for links from various families of alternating links, including the family of all alternating Montesinos links. However, much less is known for non-alternating links. For example, even for the non-alternating pretzel links, which are special (and simpler) Montesinos links, the braid index is known only in a very limited number of special cases. In this paper and its sequel, we study the braid indices of all non-alternating pretzel links by a systematic approach. We classify the pretzel links into three different types according to the Seifert circle decompositions of their standard link diagrams. More specifically, if $D$ is a standard diagram of an oriented pretzel link $\mathcal{L}$, $S(D)$ is the Seifert circle decomposition of $D$, and $C_1$, $C_2$ are the Seifert circles in $S(D)$ containing the top and bottom long strands of $D$ respectively, then $\mathcal{L}$ is classified as a Type 1 (Type 2) pretzel link if $C_1\not=C_2$ and $C_1$, $C_2$ have different (identical) orientations. If $C_1=C_2$, then $\mathcal{L}$ is classified as a Type 3 pretzel link. In this paper, we present the results of our study on Type 1 and Type 2 pretzel links. Our results allow us to determine the precise braid index of any non-alternating Type 1 or Type 2 pretzel link.
Since the braid indices of all alternating pretzel links are already known from our previous work, we have now completely determined the braid indices of all Type 1 and Type 2 pretzel links.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 25 pages, 18 figures

MSC Class: 2010. Primary: 5725; Secondary: 5727

arXiv:2403.00995

doi 10.14778/3681954.3682021

A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning

Authors: Chenghao Lyu, Qi Fan, Philippe Guyard, Yanlei Diao

Abstract: As Spark becomes a common big data analytics platform, its growing complexity makes automatic tuning of its numerous parameters critical for performance. Our work on Spark parameter tuning is particularly motivated by two recent trends: Spark's Adaptive Query Execution (AQE) based on runtime statistics, and the increasingly popular Spark cloud deployments that make cost-performance reasoning crucial for the end user. This paper presents our design of a Spark optimizer that controls all tunable parameters of each query in the new AQE architecture to explore its performance benefits and, at the same time, casts the tuning problem in the theoretically sound multi-objective optimization (MOO) setting to better adapt to user cost-performance preferences. To this end, we propose a novel hybrid compile-time/runtime approach to multi-granularity tuning of diverse, correlated Spark parameters, as well as a suite of modeling and optimization techniques to solve the tuning problem in the MOO setting while meeting the stringent time constraint of 1-2 seconds for cloud use. Evaluation results using TPC-H and TPC-DS benchmarks demonstrate the superior performance of our approach: (i) When prioritizing latency, it reduces latency by 63% and 65% for TPC-H and TPC-DS, respectively, with an average solving time of 0.7-0.8 sec, outperforming the most competitive MOO method, which reduces latency by only 18-25% with 2.6-15 sec solving time.
(ii) When shifting preferences between latency and cost, our approach dominates the solutions of alternative methods, exhibiting superior adaptability to varying preferences.
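In the MOO setting, tuning yields a set of Pareto-optimal configurations and the user's preference selects one of them. A minimal sketch, assuming two objectives (latency and cost, both minimized), made-up candidate values, and a weighted sum over min-max normalized objectives as the preference rule (the paper's exact selection method is not given in the abstract):

```python
def dominates(p, q):
    """True if candidate p is no worse than q in latency and cost and better in one."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def pareto_front(candidates):
    """Keep the (latency, cost, config) tuples not dominated by any other candidate."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick(front, w_latency):
    """Choose the front point minimizing w * latency + (1 - w) * cost after min-max
    normalizing each objective over the front."""
    lats = [c[0] for c in front]
    costs = [c[1] for c in front]

    def norm(v, lo, hi):
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    return min(front, key=lambda c: w_latency * norm(c[0], min(lats), max(lats))
               + (1.0 - w_latency) * norm(c[1], min(costs), max(costs)))


# Hypothetical (latency, cost, configuration) candidates.
candidates = [(10, 5, "a"), (8, 7, "b"), (12, 6, "c"), (6, 9, "d")]
front = pareto_front(candidates)
```

Here "c" is dominated by "a" and dropped; shifting `w_latency` from 1.0 to 0.0 moves the choice from the fastest point "d" to the cheapest point "a".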

Submitted 18 July, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Journal ref: PVLDB, 15(11): 3098-3111, 2022

arXiv:2312.06290

Exploiting Label Skews in Federated Learning with Model Concatenation

Authors: Yiqun Diao, Qinbin Li, Bingsheng He

Abstract: Federated Learning (FL) has emerged as a promising solution to perform deep learning across different data owners without exchanging raw data. However, non-IID data has been a key challenge in FL, as it can significantly degrade the accuracy of the final model. Among different non-IID types, label skews are both challenging and common in image classification and other tasks. Instead of averaging the local models, as most previous studies do, we propose FedConcat, a simple and effective approach that concatenates these local models as the base of the global model to effectively aggregate the local knowledge. To reduce the size of the global model, we adopt a clustering technique to group the clients by their label distributions and collaboratively train a model inside each cluster. We theoretically analyze the advantage of concatenation over averaging by analyzing the information bottleneck of deep neural networks. Experimental results demonstrate that FedConcat achieves significantly higher accuracy than previous state-of-the-art FL methods in various heterogeneous label skew distribution settings while also having lower communication costs. Our code is publicly available at https://github.com/sjtudyq/FedConcat.
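The two ingredients named above, grouping clients by label distribution and concatenating per-cluster models, can be sketched as follows. Lloyd's k-means with first-k seeding, the toy clients and the encoder callables are assumptions for illustration; local training, communication and the classifier on top of the concatenated features are omitted.

```python
def label_distribution(labels, num_classes):
    """Normalized histogram of a client's labels."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    return [c / len(labels) for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means (Lloyd's algorithm) seeded with the first k vectors; returns cluster ids."""
    centers = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def concat_features(encoders, x):
    """Global representation: outputs of all per-cluster encoders, concatenated."""
    features = []
    for encode in encoders:
        features.extend(encode(x))
    return features


# Hypothetical clients: two skewed toward class 0, two toward class 2.
clients = [[0] * 4, [0, 0, 0, 1], [2] * 4, [2, 2, 2, 1]]
clusters = kmeans([label_distribution(c, 3) for c in clients], k=2)
```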

Submitted 16 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2310.11791

STW-MD: A Novel Spatio-Temporal Weighting and Multi-Step Decision Tree Method for Considering Spatial Heterogeneity in Brain Gene Expression Data

Authors: Shanjun Mao, Xiao Huang, Runjiu Chen, Chenyang Zhang, Yizhu Diao, Zongjin Li, Qingzhe Wang, Shan Tang, Shuixia Guo

Abstract: Motivation: Gene expression during normal or abnormal brain development is a biological process that is highly dynamic in space and time. Due to the lack of comprehensive integration of the spatial and temporal dimensions of brain gene expression data, previous studies have mainly focused on individual brain regions or a single developmental stage. Our motivation is to address this gap by incorporating spatio-temporal information to gain a more complete understanding of the mechanisms underlying brain development or disorders associated with abnormal brain development, such as Alzheimer's disease (AD), and to identify potential determinants of response. Results: In this study, we propose a novel two-step framework based on spatio-temporal information weighting and multi-step decision trees. This framework can effectively exploit the spatial similarity and temporal dependence between different stages and different brain regions, and facilitates differential gene analysis in brain regions with high heterogeneity. We focus on two datasets: the AD dataset, which includes gene expression data from early, middle, and late stages, and the brain development dataset, spanning fetal development to adulthood. Our findings highlight the advantages of the proposed framework in discovering gene classes and elucidating their impact on brain development and AD progression across diverse brain regions and stages. These findings align with existing studies and provide insights into the processes of normal and abnormal brain development.
Availability: The code of STW-MD is available at https://github.com/tsnm1/STW-MD.
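A common way to weight observations by spatial similarity and temporal proximity is a product of Gaussian kernels. The sketch below assumes that form, coordinate positions for brain regions, numeric stage indices and bandwidths `sigma_s`, `sigma_t`; the abstract does not specify STW-MD's actual weighting function.

```python
import math


def st_weight(d_space, d_time, sigma_s=1.0, sigma_t=1.0):
    """Product of Gaussian kernels in spatial distance and temporal (stage) lag."""
    return (math.exp(-d_space ** 2 / (2 * sigma_s ** 2))
            * math.exp(-d_time ** 2 / (2 * sigma_t ** 2)))


def weighted_expression(samples, target, sigma_s=1.0, sigma_t=1.0):
    """Kernel-weighted mean expression at target = (position, stage).

    samples: iterable of (position, stage, expression), where position is a coordinate
    tuple for a brain region and stage is a numeric developmental stage."""
    target_pos, target_stage = target
    num = den = 0.0
    for pos, stage, expr in samples:
        w = st_weight(math.dist(pos, target_pos), stage - target_stage, sigma_s, sigma_t)
        num += w * expr
        den += w
    return num / den


# Hypothetical regions at x = 0 and x = 2, both measured at stage 1.
samples = [((0.0, 0.0), 1, 2.0), ((2.0, 0.0), 1, 4.0)]
```

A target midway between the two regions weights them equally; a target at one region is dominated by that region's value.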

Submitted 18 October, 2023; originally announced October 2023.

Comments: 11 pages, 6 figures

arXiv:2309.05622

Task-Oriented Cross-System Design for Timely and Accurate Modeling in the Metaverse

Authors: Zhen Meng, Kan Chen, Yufeng Diao, Changyang She, Guodong Zhao, Muhammad Ali Imran, Branka Vucetic

Abstract: In this paper, we establish a task-oriented cross-system design framework to minimize the required packet rate for timely and accurate modeling of a real-world robotic arm in the Metaverse, where sensing, communication, prediction, control, and rendering are considered. To optimize a scheduling policy and prediction horizons, we design a Constraint Proximal Policy Optimization (C-PPO) algorithm by integrating domain knowledge from relevant systems into an advanced reinforcement learning algorithm, Proximal Policy Optimization (PPO). Specifically, the Jacobian matrix for analyzing the motion of the robotic arm is included in the state of the C-PPO algorithm, and the Conditional Value-at-Risk (CVaR) of the state-value function characterizing the long-term modeling error is adopted in the constraint. In addition, the policy is represented by a two-branch neural network that determines the scheduling policy and the prediction horizons, respectively. To evaluate our algorithm, we build a prototype including a real-world robotic arm and its digital model in the Metaverse. The experimental results indicate that domain knowledge helps reduce the convergence time and the required packet rate by up to 50%, and that the cross-system design framework outperforms a baseline framework in terms of the required packet rate and the tail distribution of the modeling error.
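CVaR at level $\alpha$ is the expected loss within its worst $(1-\alpha)$ tail. The sketch computes an empirical estimate from sampled modeling errors and checks a threshold constraint. The discrete convention (mean of the largest $\lceil(1-\alpha)n\rceil$ samples), the threshold and the sample values are assumptions; in C-PPO the quantity is applied to a state-value function rather than to raw samples.

```python
import math


def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha of sampled losses (higher = worse): the mean of
    the worst ceil((1 - alpha) * n) samples."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(samples, reverse=True)
    # Subtract a tiny tolerance so that e.g. (1 - 0.7) * 10 = 3.0000000000000004 gives 3.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k


def satisfies_constraint(errors, alpha, threshold):
    """Hypothetical tail constraint: the CVaR of the modeling error must not exceed threshold."""
    return empirical_cvar(errors, alpha) <= threshold


errors = [float(e) for e in range(1, 11)]  # made-up modeling errors 1.0 .. 10.0
```

With these ten values, `alpha=0.8` averages the two worst errors (10.0 and 9.0), while `alpha=0.0` reduces to the plain mean.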

Submitted 11 September, 2023; originally announced September 2023.

Comments: This paper is accepted by IEEE Journal on Selected Areas in Communications, JSAC-SI-HCM 2024

arXiv:2308.15059

OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

Authors: Yiqun Diao, Yutong Yang, Qinbin Li, Bingsheng He, Mian Lu

Abstract: How to get insights from relational data streams in a timely manner is a hot research topic. Data streams can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as open environment challenges for machine learning. While incremental learning for data streams has been studied, existing evaluations are mostly conducted with synthetic datasets. Thus, a natural question is what those open environment challenges look like and how existing incremental learning algorithms perform on real-world relational data streams. To fill this gap, we develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in real-world relational data streams. Specifically, we investigate 55 real-world relational data streams and establish that open environment scenarios are indeed widespread, which presents significant challenges for stream learning algorithms. Through benchmarks with existing incremental learning algorithms, we find that increased data quantity may not consistently enhance model accuracy in open environment scenarios, where machine learning models can be significantly compromised by missing values, distribution drifts, or anomalies in real-world data streams. Current techniques are insufficient to effectively mitigate these challenges brought by open environments. More research is needed to address real-world open environment challenges.
All datasets and code are open-sourced at https://github.com/sjtudyq/OEBench.
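Two of the challenges listed, missing values and distribution drift, can be quantified per window of a stream column. As a sketch, assuming `None` marks a missing entry, that each window keeps at least one non-missing value, and using the two-sample Kolmogorov-Smirnov statistic with an arbitrary threshold of 0.3 as the drift test (not necessarily what OEBench uses):

```python
from bisect import bisect_right


def missing_ratio(window):
    """Fraction of missing (None) entries in a window of a stream column."""
    return sum(v is None for v in window) / len(window)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two
    empirical CDFs. Missing values (None) are dropped first."""
    a = sorted(v for v in a if v is not None)
    b = sorted(v for v in b if v is not None)
    gap = 0.0
    for x in a + b:  # the supremum is attained at an observed value
        gap = max(gap, abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)))
    return gap


def drift_report(reference, current, threshold=0.3):
    """Summarize a window: missing ratio, KS statistic, and a hypothetical drift flag."""
    ks = ks_statistic(reference, current)
    return {"missing": missing_ratio(current), "ks": ks, "drift": ks > threshold}
```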

Submitted 15 December, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

arXiv:2306.16979

Post-train Black-box Defense via Bayesian Boundary Correction

Authors: He Wang, Yunfeng Diao

Abstract: Classifiers based on deep neural networks are susceptible to adversarial attack, and this widespread vulnerability has motivated research on defending them from potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions/training regimes. While the model/data/training specifics of the victim are usually unavailable to the user, re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new post-train black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by new joint Bayesian treatments on the clean data, the adversarial examples and the classifier, for maximizing their joint probability. It is further equipped with a new post-train strategy that keeps the victim intact, avoiding re-training. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, for both static and dynamic data. Exhaustive evaluation shows that BBC has superior robustness and can enhance robustness without severely hurting clean accuracy, compared with existing defense methods.
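The post-train idea of leaving the victim untouched while reasoning over a distribution of added parameters can be illustrated with Monte Carlo averaging. In this sketch the victim is a stand-in function, the added layer is a hypothetical bias on its logits, and the Gaussian over that bias is fixed rather than inferred from clean and adversarial data, so it shows only the averaging step, not BBC's joint Bayesian treatment.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bayesian_predict(frozen_model, x, n_samples=64, sigma=0.5, seed=0):
    """Monte Carlo predictive distribution for a frozen classifier followed by a
    hypothetical appended bias layer with parameters drawn from N(0, sigma^2).
    The frozen model is only queried, never retrained."""
    rng = random.Random(seed)
    logits = frozen_model(x)
    avg = [0.0] * len(logits)
    for _ in range(n_samples):
        probs = softmax([z + rng.gauss(0.0, sigma) for z in logits])
        avg = [a + p for a, p in zip(avg, probs)]
    return [a / n_samples for a in avg]


def victim(x):
    """Stand-in black-box classifier: 3 classes with logits linear in a scalar input."""
    return [x, 0.5 * x, -x]
```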

Submitted 11 June, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.04713

arXiv:2305.09241

Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples

Authors: Wan Jiang, Yunfeng Diao, He Wang, Jianxin Sun, Meng Wang, Richang Hong

Abstract: Safeguarding data from unauthorized exploitation is vital for privacy and security, especially given the recent surge of research on security breaches such as adversarial/membership attacks. To this end, unlearnable examples (UEs) have recently been proposed as a compelling protection, by adding imperceptible perturbations to data so that models trained on them cannot classify them accurately on the original clean distribution. Unfortunately, we find that UEs provide a false sense of security, because they cannot stop unauthorized users from utilizing other unprotected data to remove the protection, turning unlearnable data into learnable data again. Motivated by this observation, we formally define a new threat by introducing learnable unauthorized examples (LEs), which are UEs with their protection removed. The core of this approach is a novel purification process that projects UEs onto the manifold of LEs. This is realized by a new joint-conditional diffusion model which denoises UEs conditioned on the pixel and perceptual similarity between UEs and LEs. Extensive experiments demonstrate that LE delivers state-of-the-art countering performance against both supervised UEs and unsupervised UEs in various scenarios, making it the first generalizable countermeasure to UEs across supervised learning and unsupervised learning. Our code is available at https://github.com/jiangw-0/LE_JCDP.
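Diffusion-based purification typically starts from the standard forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ before a learned model denoises. The sketch implements only that forward step, with the linear beta schedule from DDPM ($10^{-4}$ to $0.02$ over 1000 steps); whether the authors' joint-conditional model uses this schedule is not stated in the abstract, and the denoiser itself is omitted.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = linear_alpha_bars()
x0 = [0.3, -0.2, 0.9]  # hypothetical flattened image, e.g. an unlearnable example
x_t = q_sample(x0, t=100, alpha_bars=alpha_bars, rng=random.Random(0))
```

Choosing a moderate `t` adds enough noise to drown small crafted perturbations while keeping most of the image content for the denoiser to recover.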

Submitted 3 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted in MM 2023

arXiv:2302.05579

doi 10.2140/agt.2024.24.2957

The Braid Indices of the Reverse Parallel Links of Alternating Knots

Authors: Yuanan Diao, Hugh Morton

Abstract: The braid indices of most links remain unknown as there is no known universal method that can be used to determine the braid index of an arbitrary knot. This is also the case for alternating knots. In this paper, we show that if $K$ is an alternating knot, then the braid index of any reverse parallel link of $K$ can be precisely determined. More precisely, if $D$ is a reduced diagram of $K$, $v_+(D)$ ($v_-(D)$) is the number of regions in the checkerboard shading of $D$ for which all crossings are positive (negative), and $w(D)$ is the writhe of $D$, then the braid index of a reverse parallel link of $K$ with framing $f$, denoted by $\mathbb{K}_f$, is given by the following precise formula $$\textbf{b}(\mathbb{K}_f)=\left\{ \begin{array}{ll} c(D)+2+a(D)-f, &\ {\rm if}\ f < a(D),\\ c(D)+2, &\ {\rm if}\ a(D)\le f \le b(D),\\ c(D)+2-b(D)+f, &\ {\rm if}\ f > b(D),\\ \end{array} \right. $$ where $a(D)=-v_-(D)+w(D)$ and $b(D)=v_+(D)+w(D)$.
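The piecewise formula translates directly into code. In this sketch $c(D)$, which the abstract uses without defining, is taken to be the number of crossings of $D$ (an assumption), and the sample inputs are arbitrary integers rather than values computed from a real diagram. Since $v_+(D) + v_-(D) \ge 0$, $a(D) \le b(D)$ always holds, so the three cases cover every framing $f$.

```python
def braid_index_reverse_parallel(c, v_plus, v_minus, writhe, f):
    """Braid index b(K_f) of the reverse parallel link with framing f of an
    alternating knot, following the piecewise formula in the abstract.

    c: c(D), assumed here to be the number of crossings of the reduced diagram D
    v_plus: v_+(D), checkerboard regions whose crossings are all positive
    v_minus: v_-(D), checkerboard regions whose crossings are all negative
    writhe: w(D)
    """
    a = -v_minus + writhe
    b = v_plus + writhe
    if f < a:
        return c + 2 + a - f
    if f <= b:
        return c + 2
    return c + 2 - b + f
```

With the arbitrary inputs c = 3, v_plus = 2, v_minus = 0, writhe = 3 (so a = 3 and b = 5), the value is flat at 5 for framings 3 through 5 and grows by one per unit of framing outside that window.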

Submitted 10 February, 2023; originally announced February 2023.

Comments: 15 pages, 2 figures

MSC Class: 57K10; 57K31

Journal ref: Algebr. Geom. Topol. 24 (2024) 2957-2970

arXiv:2211.11312

Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack

Authors: Yunfeng Diao, He Wang, Tianjia Shao, Yong-Liang Yang, Kun Zhou, David Hogg, Meng Wang

Abstract: Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars, where safety and lives are at stake. Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks. However, the proposed attacks require full knowledge of the attacked classifier, which is overly restrictive. In this paper, we show that such threats indeed exist, even when the attacker only has access to the input/output of the model. To this end, we propose the very first black-box adversarial attack approach in skeleton-based HAR, called BASAR. BASAR explores the interplay between the classification boundary and the natural motion manifold. To the best of our knowledge, this is the first time the data manifold has been introduced in adversarial attacks on time series. Via BASAR, we find that on-manifold adversarial samples are extremely deceitful and rather common in skeletal motions, in contrast to the common belief that adversarial samples only exist off-manifold. Through exhaustive evaluation, we show that BASAR can deliver successful attacks across classifiers, datasets, and attack modes. Through these attacks, BASAR helps identify the potential causes of the model vulnerability and provides insights on possible improvements. Finally, to mitigate the newly identified threat, we propose a new adversarial training approach that leverages the sophisticated distributions of on/off-manifold adversarial samples, called mixed manifold-based adversarial training (MMAT).
MMAT can successfully help defend against adversarial attacks without compromising classification accuracy.

Submitted 6 May, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Accepted in Pattern Recognition. arXiv admin note: substantial text overlap with arXiv:2103.05266

arXiv:2208.00123

doi 10.1017/S0305004124000288

The ropelength conjecture of alternating knots

Authors: Yuanan Diao

Abstract: A long-standing conjecture states that the ropelength of any alternating knot is at least proportional to its crossing number. In this paper we prove that this conjecture is true. That is, there exists a constant $b_0>0$ such that $R(K)\ge b_0Cr(K)$ for any alternating knot $K$, where $R(K)$ is the ropelength of $K$ and $Cr(K)$ is the crossing number of $K$.

Submitted 29 July, 2022; originally announced August 2022.

Comments: 4 pages, 1 figure

MSC Class: Primary: 57K10; 57K31; 57K99

Journal ref: Math. Proc. Camb. Phil. Soc. 177 (2024) 367-369

arXiv:2207.02026 [pdf, other]

doi 10.14778/3551793.3551855

Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing

Authors: Chenghao Lyu, Qi Fan, Fei Song, Arnab Sinha, Yanlei Diao, Wei Chen, Li Ma, Yihui Feng, Yaliang Li, Kai Zeng, Jingren Zhou

Abstract: Big data processing at the production scale presents a highly complex environment for resource optimization (RO), a problem crucial for meeting the performance goals and budgetary constraints of analytical users. The RO problem is challenging because it involves a set of decisions (the partition count, placement of parallel instances on machines, and resource allocation to each instance), requires multi-objective optimization (MOO), and is compounded by the scale and complexity of big data systems while having to meet stringent time constraints for scheduling. This paper presents a MaxCompute-based integrated system that supports multi-objective resource optimization via fine-grained instance-level modeling and optimization. We propose a new architecture that breaks RO into a series of simpler problems, new fine-grained predictive models, and novel optimization methods that exploit these models to make effective instance-level recommendations in a hierarchical MOO framework. Evaluation using production workloads shows that, compared with the current optimizer and scheduler, our new RO system reduces latency by 37-72% and cost by 43-78% simultaneously, while running in 0.02-0.23s.

Submitted 9 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

Journal ref: PVLDB, 17(11): 3565-3579, 2024

arXiv:2204.12538 [pdf, other]

The average genus of oriented rational links with a given crossing number

Authors: Dawn Ray, Yuanan Diao

Abstract: In this paper, we enumerate the number of oriented rational knots and the number of oriented rational links with any given crossing number and minimum genus. This allows us to obtain a precise formula for the average minimal genus of oriented rational knots and links with any given crossing number.

Submitted 26 April, 2022; originally announced April 2022.

Comments: 11 pages, 9 figures, 3 tables

MSC Class: Primary: 57K10; Secondary: 57K31

arXiv:2203.04713 [pdf, other]

Defending Black-box Skeleton-based Human Activity Classifiers

Authors: He Wang, Yunfeng Diao, Zichang Tan, Guodong Guo

Abstract: Skeletal motions have been heavily relied upon for human activity recognition (HAR). Recently, a universal vulnerability of skeleton-based HAR has been identified across a variety of classifiers and data, calling for mitigation. To this end, we propose, to the best of our knowledge, the first black-box defense method for skeleton-based HAR. Our method features full Bayesian treatment of the clean data, the adversaries and the classifier, leading to (1) a new Bayesian Energy-based formulation of robust discriminative classifiers, (2) a new adversary sampling scheme based on natural motion manifolds, and (3) a new post-train Bayesian strategy for black-box defense. We name our framework Bayesian Energy-based Adversarial Training, or BEAT. BEAT is straightforward but elegant, turning vulnerable black-box classifiers into robust ones without sacrificing accuracy. It demonstrates surprising and universal effectiveness across a wide range of skeletal HAR classifiers and datasets, under various attacks. Code is available at https://github.com/realcrane/RobustActionRecogniser.

Submitted 2 December, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted in AAAI 2023

arXiv:2203.00595 [pdf]

doi 10.1002/mrm.29495

Parameter estimation for WMTI-Watson model of white matter using encoder-decoder recurrent neural network

Authors: Yujian Diao, Ileana Ozana Jelescu

Abstract: Biophysical modelling of the diffusion MRI signal provides estimates of specific microstructural tissue properties. Although nonlinear optimization such as non-linear least squares (NLLS) is the most widespread method for model estimation, it suffers from local minima and high computational cost. Deep Learning approaches are steadily replacing nonlinear fitting, but come with the limitation that the model needs to be retrained for each acquisition protocol and noise level. The White Matter Tract Integrity (WMTI)-Watson model was proposed as an implementation of the Standard Model of diffusion in white matter that estimates model parameters from the diffusion and kurtosis tensors (DKI). Here we propose a deep learning approach based on an encoder-decoder recurrent neural network (RNN) to increase the robustness and accelerate the parameter estimation of WMTI-Watson. We use an embedding approach to render the model insensitive to potential differences in distributions between training data and experimental data. This RNN-based solver thus has the advantage of being highly efficient in computation and more readily translatable to other datasets, irrespective of acquisition protocol and underlying parameter distributions, provided DKI has been pre-computed from the data. In this study, we evaluated the performance of NLLS, the RNN-based method and a multilayer perceptron (MLP) on synthetic and in vivo datasets of rat and human brain. We showed that the proposed RNN-based fitting approach greatly reduced computation time compared with NLLS (from hours to seconds), with similar accuracy and precision but improved robustness, and offered better translatability to new datasets than the MLP.

Submitted 2 March, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Journal ref: Magn Reson Med. 2022;1-14

Showing 1–50 of 92 results for author: Diao, Y