Search | arXiv e-print repository

HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba

Authors: Yinuo Wang, Yuanyang Qi, Jinzhao Zhou, Gavin Tao

Abstract: End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.

Submitted 22 September, 2025; originally announced September 2025.

Comments: 10 pages

arXiv:2509.16950 [pdf, ps, other]

Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving

Authors: Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang

Abstract: Assessing the safety of autonomous driving (AD) systems against security threats, particularly backdoor attacks, is a stepping stone for real-world deployment. However, existing works mainly focus on pixel-level triggers that are impractical to deploy in the real world. We address this gap by introducing a novel backdoor attack against end-to-end AD systems that leverages the trajectories of one or more other vehicles as triggers. To generate precise trigger trajectories, we first use temporal logic (TL) specifications to define the behaviors of attacker vehicles. Configurable behavior models are then used to generate these trajectories, which are quantitatively evaluated and iteratively refined based on the TL specifications. We further develop a negative training strategy by incorporating patch trajectories that are similar to triggers but are designated not to activate the backdoor. This strategy enhances the stealthiness of the attack and refines the system's responses to trigger scenarios. Through extensive experiments on 5 offline reinforcement learning (RL) driving agents with 6 trigger patterns and target action combinations, we demonstrate the flexibility and effectiveness of our proposed attack, showing the under-exploration of existing end-to-end AD systems' vulnerabilities to such trajectory-based backdoor attacks.

Submitted 11 October, 2025; v1 submitted 21 September, 2025; originally announced September 2025.

arXiv:2509.11752 [pdf, ps, other]

A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications

Authors: Hongyuan Zhang, Yuheng Wu, Mingyang Zhao, Zhiwei Chen, Rebecca Li, Fei Zhu, Haohan Zhao, Xiaohua Yuan, Meng Yang, Chunli Qiu, Xiang Cong, Haiyan Chen, Lina Luan, Randolph H. L. Wong, Huai Liao, Colin A Graham, Shi Chang, Guowei Tao, Dong Yi, Zhen Lei, Nassir Navab, Sebastien Ourselin, Jiebo Luo, Hongbin Liu, Gaofeng Meng

Abstract: Artificial intelligence (AI) that can effectively learn ultrasound representations by integrating multi-source data holds significant promise for advancing clinical care. However, the scarcity of large labeled datasets in real-world clinical environments and the limited generalizability of task-specific models have hindered the development of generalizable clinical AI models for ultrasound applications. In this study, we present EchoCare, a novel ultrasound foundation model for generalist clinical use, developed via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. EchoCareData comprises 4.5 million ultrasound images, sourced from over 23 countries across 5 continents and acquired via a diverse range of distinct imaging devices, thus encompassing global cohorts that are multi-center, multi-device, and multi-ethnic. Unlike prior studies that adopt off-the-shelf vision foundation model architectures, we introduce a hierarchical classifier into EchoCare to enable joint learning of pixel-level and representation-level features, capturing both global anatomical contexts and local ultrasound characteristics. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks of varying diagnostic difficulties, spanning disease diagnosis, lesion segmentation, organ detection, landmark prediction, quantitative regression, imaging enhancement and report generation. The code and pretrained model are publicly released, rendering EchoCare accessible for fine-tuning and local adaptation, supporting extensibility to additional applications. EchoCare provides a fully open and generalizable foundation model to boost the development of AI technologies for diverse clinical ultrasound applications.

Submitted 15 September, 2025; originally announced September 2025.

arXiv:2509.07593 [pdf, ps, other]

Can SSD-Mamba2 Unlock Reinforcement Learning for End-to-End Motion Control?

Authors: Gavin Tao, Yinuo Wang, Jinzhao Zhou

Abstract: End-to-end reinforcement learning for motion control promises unified perception-action policies that scale across embodiments and tasks, yet most deployed controllers are either blind (proprioception-only) or rely on fusion backbones with unfavorable compute-memory trade-offs. Recurrent controllers struggle with long-horizon credit assignment, and Transformer-based fusion incurs quadratic cost in token length, limiting temporal and spatial context. We present a vision-driven cross-modal RL framework built on SSD-Mamba2, a selective state-space backbone that applies state-space duality (SSD) to enable both recurrent and convolutional scanning with hardware-aware streaming and near-linear scaling. Proprioceptive states and exteroceptive observations (e.g., depth tokens) are encoded into compact tokens and fused by stacked SSD-Mamba2 layers. The selective state-space updates retain long-range dependencies with markedly lower latency and memory use than quadratic self-attention, enabling longer look-ahead, higher token resolution, and stable training under limited compute. Policies are trained end-to-end under curricula that randomize terrain and appearance and progressively increase scene complexity. A compact, state-centric reward balances task progress, energy efficiency, and safety. Across diverse motion-control scenarios, our approach consistently surpasses strong state-of-the-art baselines in return, safety (collisions and falls), and sample efficiency, while converging faster at the same compute budget. These results suggest that SSD-Mamba2 provides a practical fusion backbone for scalable, foresightful, and efficient end-to-end motion control.

Submitted 9 September, 2025; originally announced September 2025.

Comments: 4 figures and 6 tables

arXiv:2509.03976 [pdf]

Harnessing modal fields retrieved from speckle for multi-dimensional metrology

Authors: Qingbo Liu, Zhongyang Xu, Guangkui Tao, Xiuyuan Sun, Min Xue, Weihao Yuan, Shilong Pan

Abstract: Although speckle is a powerful tool for high-precision metrology, large datasets and cumbersome training are always required to learn from the encoded speckle patterns, which is unfavorable for rapid deployment and multi-dimensional metrology. To enable high accuracy and fast training, physics-informed machine learning enforces physical laws to address high-dimensional problems. Here, we harness the modal fields in a few-mode fiber, which follow the law of beam propagation, to enable high-accuracy and fast-training parameter estimation. Anti-noise fast mode decomposition is implemented to retrieve the modal fields from the speckles. The accuracy is enhanced since the modal fields enable parameter estimation at random points in the continuous space-time domain. Artificial tactile perception and multi-dimensional metrology are achieved with high accuracy because the modal fields respond diversely to different parameters. Meanwhile, the number of specklegrams needed for training is reduced by a factor of around 5. The training time of machine learning is reduced by a factor of about 800, from 9 hours and 45 minutes to 40 seconds. Therefore, harnessing the modal fields paves a new way for speckle-based metrology to develop efficient, low-cost, multi-dimensional sensors, making it suitable for intelligent wearable devices, industrial robots and healthcare applications.

Submitted 4 September, 2025; originally announced September 2025.

arXiv:2508.19153 [pdf, ps, other]

QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning

Authors: Yinuo Wang, Gavin Tao

Abstract: We address vision-guided quadruped motion control with reinforcement learning (RL) and highlight the necessity of combining proprioception with vision for robust control. We propose QuadKAN, a spline-parameterized cross-modal policy instantiated with Kolmogorov-Arnold Networks (KANs). The framework incorporates a spline encoder for proprioception and a spline fusion head for proprioception-vision inputs. This structured function class aligns the state-to-action mapping with the piecewise-smooth nature of gait, improving sample efficiency, reducing action jitter and energy consumption, and providing interpretable posture-action sensitivities. We adopt Multi-Modal Delay Randomization (MMDR) and perform end-to-end training with Proximal Policy Optimization (PPO). Evaluations across diverse terrains, including both even and uneven surfaces and scenarios with static or dynamic obstacles, demonstrate that QuadKAN achieves consistently higher returns, greater distances, and fewer collisions than state-of-the-art (SOTA) baselines. These results show that spline-parameterized policies offer a simple, effective, and interpretable alternative for robust vision-guided locomotion. A repository will be made available upon acceptance.

Submitted 6 September, 2025; v1 submitted 26 August, 2025; originally announced August 2025.

Comments: 14 pages, 9 figures, Journal paper

arXiv:2508.11849 [pdf, ps, other]

LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba

Authors: Yinuo Wang, Gavin Tao

Abstract: We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.

Submitted 28 August, 2025; v1 submitted 15 August, 2025; originally announced August 2025.

Comments: 13 pages

arXiv:2507.03619 [pdf, ps, other]

Blackbox Dataset Inference for LLM

Authors: Ruikai Zhou, Kang Yang, Xun Chen, Wendy Hui Wang, Guanhong Tao, Jun Xu

Abstract: Today, the training of large language models (LLMs) can involve personally identifiable information and copyrighted material, incurring dataset misuse. To mitigate the problem of dataset misuse, this paper explores dataset inference, which aims to detect whether a suspect model $\mathcal{M}$ used a victim dataset $\mathcal{D}$ in training. Previous research tackles dataset inference by aggregating the results of membership inference attacks (MIAs), methods that determine whether individual samples are part of the training dataset. However, restricted by the low accuracy of MIAs, previous research mandates grey-box access to $\mathcal{M}$ to get intermediate outputs (probabilities, loss, perplexity, etc.) for obtaining satisfactory results. This leads to reduced practicality, as LLMs, especially those deployed for profit, have limited incentives to return the intermediate outputs. In this paper, we propose a new method of dataset inference with only black-box access to the target model (i.e., assuming only the text-based responses of the target model are available). Our method is enabled by two sets of locally built reference models, one set involving $\mathcal{D}$ in training and the other not. By measuring which set of reference models $\mathcal{M}$ is closer to, we determine whether $\mathcal{M}$ used $\mathcal{D}$ for training. Evaluations of real-world LLMs in the wild show that our method offers high accuracy in all settings and is robust against bypassing attempts.

Submitted 18 July, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

arXiv:2507.01401 [pdf, ps, other]

Medical-Knowledge Driven Multiple Instance Learning for Classifying Severe Abdominal Anomalies on Prenatal Ultrasound

Authors: Huanwen Liang, Jingxian Xu, Yuanji Zhang, Yuhao Huang, Yuhan Zhang, Xin Yang, Ran Li, Xuedong Deng, Yanjun Liu, Guowei Tao, Yun Wu, Sheng Zhao, Xinru Gao, Dong Ni

Abstract: Fetal abdominal malformations are serious congenital anomalies that require accurate diagnosis to guide pregnancy management and reduce mortality. Although AI has demonstrated significant potential in medical diagnosis, its application to prenatal abdominal anomalies remains limited. Most existing studies focus on image-level classification and rely on standard plane localization, placing less emphasis on case-level diagnosis. In this paper, we develop a case-level multiple instance learning (MIL)-based method, free of standard plane localization, for classifying fetal abdominal anomalies in prenatal ultrasound. Our contribution is three-fold. First, we adopt a mixture-of-attention-experts module (MoAE) to weight different attention heads for various planes. Second, we propose a medical-knowledge-driven feature selection module (MFS) to align image features with medical knowledge, performing self-supervised image token selection at the case level. Finally, we propose a prompt-based prototype learning (PPL) to enhance the MFS. Extensively validated on a large prenatal abdominal ultrasound dataset containing 2,419 cases, with a total of 24,748 images and 6 categories, our proposed method outperforms the state-of-the-art competitors. Code is available at: https://github.com/LL-AC/AAcls.

Submitted 2 July, 2025; originally announced July 2025.

Comments: Accepted by MICCAI 2025

arXiv:2506.08211 [pdf, ps, other]

Standard LS Parameter Estimators Ensure Finite Convergence Time for Linear Regression Equations Under an Interval Excitation Assumption

Authors: Romeo Ortega, Jose Guadalupe Romero, Stanislav Aranovskiy, Gang Tao

Abstract: In this brief note we recall the little-known fact that, for linear regression equations (LRE) with intervally excited (IE) regressors, standard least-squares (LS) parameter estimators ensure finite convergence time (FCT) of the estimated parameters, with the convergence time equal to the length of time needed to comply with the IE assumption. As is well known, IE is necessary and sufficient for the identifiability of the LRE; hence, it is the weakest assumption for the on- or off-line solution of the parameter estimation problem.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.07214 [pdf, other]

Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation

Authors: Zhiyuan Zhong, Zhen Sun, Yepang Liu, Xinlei He, Guanhong Tao

Abstract: Vision Language Models (VLMs) have shown remarkable performance, but are also vulnerable to backdoor attacks whereby the adversary can manipulate the model's outputs through hidden triggers. Prior attacks primarily rely on single-modality triggers, leaving the crucial cross-modal fusion nature of VLMs largely unexplored. Unlike prior work, we identify a novel attack surface that leverages cross-modal semantic mismatches as implicit triggers. Based on this insight, we propose BadSem (Backdoor Attack with Semantic Manipulation), a data poisoning attack that injects stealthy backdoors by deliberately misaligning image-text pairs during training. To perform the attack, we construct SIMBad, a dataset tailored for semantic manipulation involving color and object attributes. Extensive experiments across four widely used VLMs show that BadSem achieves over 98% average ASR, generalizes well to out-of-distribution datasets, and can transfer across poisoning modalities. Our detailed analysis using attention visualization shows that backdoored models focus on semantically sensitive regions under mismatched conditions while maintaining normal behavior on clean inputs. To mitigate the attack, we try two defense strategies based on system prompt and supervised fine-tuning but find that both of them fail to mitigate the semantic backdoor. Our findings highlight the urgent need to address semantic vulnerabilities in VLMs for their safer deployment.

Submitted 8 June, 2025; originally announced June 2025.

arXiv:2505.12360 [pdf, ps, other]

LaPON: A Lagrange's-mean-value-theorem-inspired operator network for solving PDEs and its application on NSE

Authors: Siwen Zhang, Xizeng Zhao, Zhengzhi Deng, Zhaoyuan Huang, Gang Tao, Nuo Xu, Zhouteng Ye

Abstract: Accelerating the solution of nonlinear partial differential equations (PDEs) while maintaining accuracy at coarse spatiotemporal resolution remains a key challenge in scientific computing. Physics-informed machine learning (ML) methods such as Physics-Informed Neural Networks (PINNs) introduce prior knowledge through loss functions to ensure physical consistency, but their "soft constraints" are usually not strictly satisfied. Here, we propose LaPON, an operator network inspired by Lagrange's mean value theorem, which embeds prior knowledge directly into the neural network architecture instead of the loss function, making the neural network naturally satisfy the given constraints. This is a hybrid framework that combines neural operators with traditional numerical methods, where neural operators are used to compensate for the effect of discretization errors on the analytical scale in under-resolution simulations. As evaluated on a turbulence problem modeled by the Navier-Stokes equations (NSE), the multiple time step extrapolation accuracy and stability of LaPON exceed the direct numerical simulation baseline at 8x coarser grids and 8x larger time steps, while achieving a vorticity correlation of more than 0.98 with the ground truth. It is worth noting that the model generalizes well to unseen flow states, such as turbulence with different forcing, without retraining. In addition, with the same training data, LaPON's comprehensive metrics on the out-of-distribution test set are at least approximately twice as good as two popular ML baseline methods. By combining numerical computing with machine learning, LaPON provides a scalable and reliable solution for high-fidelity fluid dynamics simulation, showing the potential for wide application in fields such as weather forecasting and engineering design.

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.10464 [pdf, ps, other]

HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation

Authors: Jiaming Liang, Lihuan Dai, Xiaoqi Sheng, Xiangguang Chen, Chun Yao, Guihua Tao, Qibin Leng, Hongmin Cai, Xi Zhong

Abstract: Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis. This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities. As a result, algorithms are constrained to train on approximate data and depend on application migration, leading to substantial resource expenditure and a potential decline in analysis accuracy. To address these challenges, we make two major contributions. First, we publicly disseminate the GCM 2025 dataset, which serves as the first large-scale, open-source collection of gastric cancer multimodal MRI scans, featuring professionally annotated FS-T2W, CE-T1W, and ADC images from 500 patients. Second, we introduce HWA-UNETR, a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers to establish dynamic feature correspondences between different modalities' anatomical structures, and leverages the innovative tri-orientated fusion mamba mechanism for context modeling and capturing long-range spatial dependencies. Extensive experiments on our GCM 2025 dataset and the public BraTS 2021 dataset validate the performance of our framework, demonstrating that the new approach surpasses existing methods by up to 1.68% in the Dice score while maintaining solid robustness. The dataset and code are public via https://github.com/JeMing-creater/HWA-UNETR.

Submitted 26 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

Comments: This work has been provisionally accepted for MICCAI 2025

arXiv:2504.09757 [pdf, other]

Alleviating the Fear of Losing Alignment in LLM Fine-tuning

Authors: Kang Yang, Guanhong Tao, Xun Chen, Jun Xu

Abstract: Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are unethical or harmful, raising concerns about their applications. To regulate LLMs' responses to such questions, a training strategy called alignment can help. Yet, alignment can be unexpectedly compromised when fine-tuning an LLM for downstream tasks. This paper focuses on recovering the alignment lost during fine-tuning. We observe that there are two distinct directions inherent in an aligned LLM: the aligned direction and the harmful direction. An LLM is inclined to answer questions in the aligned direction while refusing queries in the harmful direction. Therefore, we propose to recover the harmful direction of the fine-tuned model that has been compromised. Specifically, we restore a small subset of the fine-tuned model's weight parameters from the original aligned model using gradient descent. We also introduce a rollback mechanism to avoid aggressive recovery and maintain downstream task performance. Our evaluation on 125 fine-tuned LLMs demonstrates that our method can reduce their harmful rate (percentage of answering harmful questions) from 33.25% to 1.74%, without sacrificing task performance much. In contrast, the existing methods either only reduce the harmful rate to a limited extent or significantly impact the normal functionality. Our code is available at https://github.com/kangyangWHU/LLMAlignment

Submitted 13 April, 2025; originally announced April 2025.

arXiv:2503.15554 [pdf, other]

A Comprehensive Study of LLM Secure Code Generation

Authors: Shih-Chieh Dai, Jun Xu, Guanhong Tao

Abstract: LLMs are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current evaluation schemes leave several concerns unaddressed. Specifically, most existing studies evaluate security and functional correctness separately, using different datasets. That is, they assess vulnerabilities using security-related code datasets while validating functionality with general code datasets. In addition, prior research primarily relies on a single static analyzer, CodeQL, to detect vulnerabilities in generated code, which limits the scope of security evaluation. In this work, we conduct a comprehensive study to systematically assess the improvements introduced by four state-of-the-art secure code generation techniques. Specifically, we apply both security inspection and functionality validation to the same generated code and evaluate these two aspects together. We also employ three popular static analyzers and two LLMs to identify potential vulnerabilities in the generated code. Our study reveals that existing techniques often compromise the functionality of generated code to enhance security. Their overall performance remains limited when evaluating security and functionality together. In fact, many techniques even degrade the performance of the base LLM. Our further inspection reveals that these techniques often either remove vulnerable lines of code entirely or generate "garbage code" that is unrelated to the intended task. Moreover, the commonly used static analyzer CodeQL fails to detect several vulnerabilities, further obscuring the actual security improvements achieved by existing techniques. Our study serves as a guideline for a more rigorous and comprehensive evaluation of secure code generation performance in future work.

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2503.14906 [pdf, other]

doi 10.1016/j.media.2025.103725

FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis

Authors: Yaofei Duan, Tao Tan, Zhiyuan Zhu, Yuhao Huang, Yuanji Zhang, Rui Gao, Patrick Cheong-Iao Pang, Xinru Gao, Guowei Tao, Xiang Cong, Zhou Li, Lianying Liang, Guangzhi He, Linliang Yin, Xuedong Deng, Xin Yang, Dong Ni

Abstract: Fetal ultrasound (US) examinations require the acquisition of multiple planes, each providing unique diagnostic information to evaluate fetal development and screen for congenital anomalies. However, obtaining a comprehensive, multi-plane annotated fetal US dataset remains challenging, particularly for rare or complex anomalies owing to their low incidence and numerous subtypes. This poses difficulties in training novice radiologists and developing robust AI models, especially for detecting abnormal fetuses. In this study, we introduce a Flexible Fetal US image generation framework (FetalFlex) to address these challenges, which leverages anatomical structures and multimodal information to enable controllable synthesis of fetal US images across diverse planes. Specifically, FetalFlex incorporates a pre-alignment module to enhance controllability and introduces a repaint strategy to ensure consistent texture and appearance. Moreover, a two-stage adaptive sampling strategy is developed to progressively refine image quality from coarse to fine levels. We believe that FetalFlex is the first method capable of generating both in-distribution normal and out-of-distribution abnormal fetal US images, without requiring any abnormal data. Experiments on multi-center datasets demonstrate that FetalFlex achieved state-of-the-art performance across multiple image quality metrics. A reader study further confirms the close alignment of the generated results with expert visual assessments. Furthermore, synthetic images by FetalFlex significantly improve the performance of six typical deep models in downstream classification and anomaly detection tasks. Lastly, FetalFlex's anatomy-level controllable generation offers a unique advantage for anomaly simulation and creating paired or counterfactual data at the pixel level. The demo is available at: https://dyf1023.github.io/FetalFlex/.

Submitted 19 March, 2025; originally announced March 2025.

Comments: 18 pages, 10 figures

arXiv:2502.03698 [pdf, ps, other]

How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies

Authors: Akansha Kalra, Basavasagar Patil, Guanhong Tao, Daniel S. Brown

Abstract: Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to offline universal perturbation attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantized Behavior Transformer (VQ-BET). We study the vulnerability of these methods to universal adversarial perturbations. Our experiments on several simulated robotic manipulation tasks reveal that most of the current methods are highly vulnerable to adversarial perturbations. We also show that these attacks are often transferable across algorithms, architectures, and tasks, raising security concerns about black-box attacks. To the best of our knowledge, we are the first to present a systematic study of the vulnerabilities of different LfD algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern BC algorithms, paving the way for future work in addressing such limitations.

Submitted 13 October, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2501.03544 [pdf, ps, other]

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

Authors: Lingzhi Yuan, Xinfeng Li, Chejian Xu, Guanhong Tao, Xiaojun Jia, Yihao Huang, Wei Dong, Yang Liu, Bo Li

Abstract: Recent text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. However, these models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content, such as sexually explicit, violent, political, and disturbing images, raising serious ethical concerns. In this work, we present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models (LLMs) for safety alignment. Unlike LLMs, T2I models lack a direct interface for enforcing behavioral guidelines. Our key idea is to optimize a safety soft prompt that functions as an implicit system prompt within the T2I model's textual embedding space. This universal soft prompt (P*) directly moderates NSFW inputs, enabling safe yet realistic image generation without altering the inference efficiency or requiring proxy models. We further enhance its reliability and helpfulness through a divide-and-conquer strategy, which optimizes category-specific soft prompts and combines them into holistic safety guidance. Extensive experiments across five datasets demonstrate that PromptGuard effectively mitigates NSFW content generation while preserving high-quality benign outputs. PromptGuard is 3.8 times faster than prior content moderation methods, surpassing eight state-of-the-art defenses with an optimal unsafe ratio down to 5.84%.

Submitted 5 September, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

Comments: 15 pages, 8 figures, 14 tables

arXiv:2412.11454 [pdf, ps, other]

Adaptive Output Tracking Control with Reference Model System Uncertainties: Extensions

Authors: Gang Tao

Abstract: This paper develops some extensions to the work of [1] which studied the continuous-time adaptive output tracking control schemes with the reference output signal generated from an unknown reference model system. The presented extensions include adaptive control schemes with reference model system uncertainties for single-input single-output (SISO) discrete-time systems and multi-input multi-output (MIMO) discrete-time, continuous-time and feedback linearizable systems as well. To deal with such reference model system uncertainties, the adaptive controller structures are expanded to include a parametrized estimator of the equivalent reference input signal, to ensure a completely parametrized error system with a known regressor vector, suitable for stable adaptive controller parameter update law design.

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2411.15367 [pdf, other]

Exploiting Watermark-Based Defense Mechanisms in Text-to-Image Diffusion Models for Unauthorized Data Usage

Authors: Soumil Datta, Shih-Chieh Dai, Leo Yu, Guanhong Tao

Abstract: Text-to-image diffusion models, such as Stable Diffusion, have shown exceptional potential in generating high-quality images. However, recent studies highlight concerns over the use of unauthorized data in training these models, which may lead to intellectual property infringement or privacy violations. A promising approach to mitigate these issues is to apply a watermark to images and subsequently check if generative models reproduce similar watermark features. In this paper, we examine the robustness of various watermark-based protection methods applied to text-to-image models. We observe that common image transformations are ineffective at removing the watermark effect. Therefore, we propose RATTAN, which leverages the diffusion process to conduct controlled image generation on the protected input, preserving the high-level features of the input while ignoring the low-level details utilized by watermarks. A small number of generated images are then used to fine-tune protected models. Our experiments on three datasets and 140 text-to-image diffusion models reveal that existing state-of-the-art protections are not robust against RATTAN.

Submitted 26 November, 2024; v1 submitted 22 November, 2024; originally announced November 2024.

arXiv:2407.11372 [pdf, other]

UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

Authors: Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang

Abstract: Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique UNIT that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.

Submitted 16 July, 2024; originally announced July 2024.

Comments: The 18th European Conference on Computer Vision ECCV 2024
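
The per-neuron bounding idea in the UNIT abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says the boundaries are approximated, and here a simple per-neuron quantile over clean activations is swapped in as a stand-in. The function names `estimate_bounds` and `clip_activations` and the `quantile` parameter are hypothetical.

```python
def estimate_bounds(clean_activations, quantile=0.95):
    """Return one upper bound per neuron from clean-sample activations.

    clean_activations: list of samples, each a list of per-neuron values.
    Assumption: a plain quantile stands in for the paper's approximation.
    """
    num_neurons = len(clean_activations[0])
    bounds = []
    for j in range(num_neurons):
        column = sorted(sample[j] for sample in clean_activations)
        index = min(len(column) - 1, int(quantile * len(column)))
        bounds.append(column[index])
    return bounds


def clip_activations(activations, bounds):
    """Cap each neuron's activation at its own bound."""
    return [min(a, b) for a, b in zip(activations, bounds)]


clean = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]
bounds = estimate_bounds(clean, quantile=0.75)
clipped = clip_activations([5.0, 1.5], bounds)
```

In this toy run the first neuron's unusually large activation is capped at its bound, while the second neuron's in-range value passes through unchanged.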

arXiv:2406.05580 [pdf, ps, other]

Adaptive Output Tracking Control with Reference Model System Uncertainties

Authors: Gang Tao

Abstract: This paper develops adaptive output tracking control schemes with the reference output signal generated from an unknown reference system whose output derivatives are also unknown. To deal with such reference system uncertainties, an expanded adaptive controller structure is developed to include a parametrized estimator of the equivalent reference input signal. Without using knowledge of the reference system transfer function and equivalent input, both of which are critical components of a traditional model reference adaptive control (MRAC) scheme, the new MRAC schemes, designed for various cases of plant and reference model uncertainties, ensure completely parametrized error systems and stable parameter adaptation, leading to the desired closed-loop system stability and asymptotic output tracking.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.07804 [pdf]

Multiple stochastic resonances and inverse stochastic resonances in asymmetric bistable system under the ultra-high frequency excitation

Authors: Cong Wang, Zhongqiu Wang, Jianhua Yang, Miguel A. F. Sanjuán, Gong Tao, Zhen Shan, Mengen Shen

Abstract: The ultra-high frequency linear frequency modulation (UHF-LFM) signal, a typical non-stationary signal, has been widely used in microwave radar and other fields, with advantages such as long transmission distance, strong anti-interference ability, and wide bandwidth. Utilizing optimal dynamics response has unique advantages in weak feature identification under strong background noise. We propose a new stochastic resonance method in an asymmetric bistable system with the time-varying parameter to handle this special non-stationary signal. Interestingly, the nonlinear response exhibits multiple stochastic resonances (MSR) and inverse stochastic resonances (ISR) under UHF-LFM signal excitation, and some resonance regions may deviate or collapse due to the influence of system asymmetry. In addition, we analyze the responses of each resonance region and the mechanism and evolution law of each resonance region in detail. Finally, we significantly expand the resonance region within the parameter range by optimizing the time scale, which verifies the effectiveness of the proposed time-varying scale method. The mechanism and evolution law of MSR and ISR will provide references for researchers in related fields.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 23 pages, 13 figures

arXiv:2404.10944 [pdf, other]

Threat Behavior Textual Search by Attention Graph Isomorphism

Authors: Chanwoo Bae, Guanhong Tao, Zhuo Zhang, Xiangyu Zhang

Abstract: Cyber attacks cause over $1 trillion in losses every year. An important task for cyber security analysts is attack forensics. It entails understanding malware behaviors and attack origins. However, existing automated or manual malware analysis can only disclose a subset of behaviors due to inherent difficulties (e.g., malware cloaking and obfuscation). As such, analysts often resort to text search techniques to identify existing malware reports based on the symptoms they observe, exploiting the fact that malware samples share a lot of similarity, especially those from the same origin. In this paper, we propose a novel malware behavior search technique that is based on graph isomorphism at the attention layers of Transformer models. We also compose a large dataset collected from various agencies to facilitate such research. Our technique outperforms state-of-the-art methods, such as those based on sentence embeddings and keywords, by 6-14%. In the case study of 10 real-world malware samples, our technique can correctly attribute 8 of them to their ground truth origins, while using Google only works for 3 cases.

Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Journal ref: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2024

arXiv:2403.17235 [pdf, ps, other]

A Discrete-Time Least-Squares Adaptive State Tracking Control Scheme with A Mobile-Robot System Study

Authors: Qianhong Zhao, Gang Tao

Abstract: This paper develops an adaptive state tracking control scheme for discrete-time systems, using the least-squares algorithm, as the new solution to the long-standing discrete-time adaptive state tracking control problem to which the Lyapunov method (well-developed for the continuous-time adaptive state tracking problem) is not applicable. The new adaptive state tracking scheme is based on a recently-developed new discrete-time error model which has been used for gradient algorithm based state tracking control schemes, and uses the least-squares algorithm for parameter adaptation. The new least-squares algorithm is derived to minimize an accumulative estimation error, to ensure certain optimality for parameter estimation. The system stability and output tracking properties are studied. Technical results are presented in terms of plant-model matching, error model, adaptive law, optimality formulation, and stability and tracking analysis. The developed adaptive control scheme is applied to a discrete-time multiple mobile robot system to meet an adaptive state tracking objective. In addition, a collision avoidance mechanism is proposed to prevent collisions in the whole tracking process. Simulation results are presented, which verify the desired system state tracking properties under the developed least-squares algorithm based adaptive control scheme.

Submitted 1 February, 2025; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.17188 [pdf, other]

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Authors: Siyuan Cheng, Guanhong Tao, Yingqi Liu, Guangyu Shen, Shengwei An, Shiwei Feng, Xiangzhe Xu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

Abstract: Backdoor attacks pose a significant security threat to Deep Learning applications. Existing attacks are often not evasive to established backdoor detection techniques. This susceptibility primarily stems from the fact that these attacks typically leverage a universal trigger pattern or transformation function, such that the trigger can cause misclassification for any input. In response to this, recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions. While these approaches manage to evade detection to some extent, they reveal vulnerability to existing backdoor mitigation techniques. To address and enhance both evasiveness and resilience, we introduce a novel backdoor attack LOTUS. Specifically, it leverages a secret function to separate samples in the victim class into a set of partitions and applies unique triggers to different partitions. Furthermore, LOTUS incorporates an effective trigger focusing mechanism, ensuring only the trigger corresponding to the partition can induce the backdoor behavior. Extensive experimental results show that LOTUS achieves a high attack success rate across 4 datasets and 7 model structures, and effectively evades 13 backdoor detection and mitigation techniques. The code is available at https://github.com/Megum1/LOTUS.

Submitted 25 March, 2024; originally announced March 2024.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

arXiv:2403.04303 [pdf, other]

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

Authors: Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang

Abstract: Deep learning models, particularly those based on transformers, often employ numerous stacked structures, which possess identical architectures and perform similar functions. While effective, this stacking paradigm leads to a substantial increase in the number of parameters, posing challenges for practical applications. In today's landscape of increasingly large models, stacking depth can even reach dozens, further exacerbating this issue. To mitigate this problem, we introduce LORS (LOw-rank Residual Structure). LORS allows stacked modules to share the majority of parameters, requiring a much smaller number of unique ones per module to match or even surpass the performance of using entirely distinct ones, thereby significantly reducing parameter usage. We validate our method by applying it to the stacked decoders of a query-based object detector, and conduct extensive experiments on the widely used MS COCO dataset. Experimental results demonstrate the effectiveness of our method, as even with a 70% reduction in the parameters of the decoder, our method still enables the model to achieve comparable or

Submitted 7 March, 2024; originally announced March 2024.

Comments: 9 pages, 5 figures, 11 tables, CVPR2024 accepted
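
The parameter-sharing idea in the LORS abstract above can be sketched as a shared weight matrix plus a small low-rank residual per stacked module. This is a minimal illustration under assumptions, not the paper's exact formulation: the decomposition `shared + A @ B`, the helper names, and the toy dimensions are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lors_weight(shared, a, b):
    """Effective weight of one module: shared matrix plus low-rank residual a @ b."""
    delta = matmul(a, b)
    return [[s + d for s, d in zip(s_row, d_row)] for s_row, d_row in zip(shared, delta)]


def param_counts(d, rank, num_modules):
    """Compare fully distinct d x d weights with one shared weight plus rank-r residuals."""
    full = num_modules * d * d
    shared_plus_residuals = d * d + num_modules * 2 * d * rank
    return full, shared_plus_residuals


shared = [[1, 0], [0, 1]]
weight = lors_weight(shared, a=[[1], [2]], b=[[3, 4]])  # rank-1 residual
full, reduced = param_counts(d=4, rank=1, num_modules=6)
```

The saving grows with the number of stacked modules and with the gap between the layer width and the residual rank, since only the two thin factors are unique to each module.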

arXiv:2402.10930 [pdf, other]

ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters

Authors: Shiwei Liu, Guanchen Tao, Yifei Zou, Derek Chow, Zichen Fan, Kauna Lei, Bangfei Pan, Dennis Sylvester, Gregory Kielian, Mehdi Saligane

Abstract: The self-attention mechanism sets transformer-based large language models (LLMs) apart from convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon remains challenging due to the extensive use of Softmax in self-attention. In addition to the non-linearity, the low arithmetic intensity significantly limits processing parallelism, especially when working with longer contexts. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design that serves as an efficient alternative to Softmax. ConSmax utilizes differentiable normalization parameters to eliminate the need for maximum searching and denominator summation in Softmax. This approach enables extensive parallelization while still executing the essential functions of Softmax. Moreover, a scalable ConSmax hardware design with a bitwidth-split look-up table (LUT) can achieve lossless non-linear operations and support mixed-precision computing. Experimental results show that ConSmax achieves a minuscule power consumption of 0.2mW and an area of 0.0008mm^2 at 1250MHz working frequency in 16nm FinFET technology. For open-source contribution, we further implement our design with the OpenROAD toolchain under SkyWater's 130nm CMOS technology. The corresponding power is 2.69mW and the area is 0.007mm^2. ConSmax achieves 3.35x power savings and 2.75x area savings in 16nm technology, and 3.15x power savings and 4.14x area savings with the open-source EDA toolchain. In the meantime, it also maintains comparable accuracy on the GPT-2 model and the WikiText103 dataset. The project is available at https://github.com/ReaLLMASIC/ConSmax

Submitted 14 November, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Journal ref: International Conference on Computer-Aided Design 2024
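
The ConSmax abstract above describes replacing Softmax's maximum search and denominator summation with normalization parameters. A minimal sketch of that idea follows, assuming the form exp(x - beta) / gamma with two scalars; this functional form and the names `beta` and `gamma` are assumptions of the sketch, not taken from the abstract, and in a real model the two values would be learned rather than set by hand.

```python
import math


def softmax(scores):
    """Standard softmax: needs a max search and a sum over all scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def consmax(scores, beta, gamma):
    """Assumed ConSmax-style form: each element is computed independently.

    beta replaces the running maximum and gamma replaces the denominator,
    so no reduction over the score vector is required at inference time.
    """
    return [math.exp(s - beta) / gamma for s in scores]


scores = [1.0, 2.0, 3.0]
# For illustration only: pick beta and gamma so that the two functions agree
# on this particular input.
beta = max(scores)
gamma = sum(math.exp(s - beta) for s in scores)
reference = softmax(scores)
approx = consmax(scores, beta, gamma)
```

With fixed `beta` and `gamma` the outputs generally no longer sum to one for other inputs, which is the trade-off that makes element-wise parallel evaluation possible.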

arXiv:2402.05467 [pdf, other]

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia

Authors: Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang

Abstract: Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities. As they find increased use in sensitive tasks, safety concerns have gained widespread attention. Extensive efforts have been dedicated to aligning LLMs with human moral principles to ensure their safe deployment. Despite their potential, recent research indicates aligned LLMs are prone to specialized jailbreaking prompts that bypass safety measures to elicit violent and harmful content. The intrinsic discrete nature and substantial scale of contemporary LLMs pose significant challenges in automatically generating diverse, efficient, and potent jailbreaking prompts, representing a continuous obstacle. In this paper, we introduce RIPPLE (Rapid Optimization via Subconscious Exploitation and Echopraxia), a novel optimization-based method inspired by two psychological concepts: subconsciousness and echopraxia, which describe the processes of the mind that occur without conscious awareness and the involuntary mimicry of actions, respectively. Evaluations across 6 open-source LLMs and 4 commercial LLM APIs show RIPPLE achieves an average Attack Success Rate of 91.5%, outperforming five current methods by up to 47.0% with an 8x reduction in overhead. Furthermore, it displays significant transferability and stealth, successfully evading established detection mechanisms. The code of our work is available at https://github.com/SolidShen/RIPPLE_official/tree/official

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.00905 [pdf, other]

Opening A Pandora's Box: Things You Should Know in the Era of Custom GPTs

Authors: Guanhong Tao, Siyuan Cheng, Zhuo Zhang, Junmin Zhu, Guangyu Shen, Xiangyu Zhang

Abstract: The emergence of large language models (LLMs) has significantly accelerated the development of a wide range of applications across various fields. There is a growing trend in the construction of specialized platforms based on LLMs, such as the newly introduced custom GPTs by OpenAI. While custom GPTs provide various functionalities like web browsing and code execution, they also introduce significant security threats. In this paper, we conduct a comprehensive analysis of the security and privacy issues arising from the custom GPT platform. Our systematic examination categorizes potential attack scenarios into three threat models based on the role of the malicious actor, and identifies critical data exchange channels in custom GPTs. Utilizing the STRIDE threat modeling framework, we identify 26 potential attack vectors, with 19 being partially or fully validated in real-world settings. Our findings emphasize the urgent need for robust security and privacy measures in the custom GPT ecosystem, especially in light of the forthcoming launch of the official GPT store by OpenAI.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.10479 [pdf, other]

A Soft Contrastive Learning-based Prompt Model for Few-shot Sentiment Analysis

Authors: Jingyi Zhou, Jie Zhou, Jiabao Zhao, Siyin Wang, Haijun Shan, Gui Tao, Qi Zhang, Xuanjing Huang

Abstract: Few-shot text classification has attracted great interest in both academia and industry due to the lack of labeled data in many fields. Different from general text classification (e.g., topic classification), few-shot sentiment classification is more challenging because the semantic distances among the classes are more subtle. For instance, the semantic distances between the sentiment labels in a positive or negative polarity (e.g., "love" and "joy", "remorse" and "sadness") are close, while the distances are large for the sentiment labels in two opposite polarities (e.g., "love" and "sadness"). To address this problem, we propose a Soft Contrastive learning-based Prompt (SCP) model for few-shot sentiment analysis. First, we design a sentiment-aware chain of thought prompt module to guide the model to predict the sentiment from coarse grain to fine grain via a series of intermediate reasoning steps. Then, we propose a soft contrastive learning algorithm to take the correlation of the labels into account. A series of experiments on several sentiment analysis datasets show the great advantages of SCP by comparing it with SOTA baselines (e.g., ChatGPT).

Submitted 16 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP

arXiv:2312.04782 [pdf, other]

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

Authors: Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

Abstract: Large Language Models (LLMs) are now widely used in various applications, making it crucial to align their ethical standards with human values. However, recent jail-breaking methods demonstrate that this alignment can be undermined using carefully constructed prompts. In our study, we reveal a new threat to LLM alignment when a bad actor has access to the model's output logits, a common feature in both open-source LLMs and many commercial LLM APIs (e.g., certain GPT models). It does not rely on crafting specific prompts. Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits. By forcefully selecting lower-ranked output tokens during the auto-regressive generation process at a few critical output positions, we can compel the model to reveal these hidden responses. We term this process model interrogation. This approach differs from and outperforms jail-breaking methods, achieving 92% effectiveness compared to 62%, and is 10 to 20 times faster. The harmful content uncovered through our method is more relevant, complete, and clear. Additionally, it can complement jail-breaking strategies, further boosting attack performance. Our findings indicate that interrogation can extract toxic knowledge even from models specifically designed for coding tasks.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.00050 [pdf, other]

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Authors: Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

Abstract: Diffusion models (DM) have become state-of-the-art generative models because of their capability to generate high-quality images from noises without adversarial training. However, they are vulnerable to backdoor attacks as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors from DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.

Submitted 4 February, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2308.15449 [pdf, other]

PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Authors: Xiangzhe Xu, Zhou Xuan, Shiwei Feng, Siyuan Cheng, Yapeng Ye, Qingkai Shi, Guanhong Tao, Le Yu, Zhuo Zhang, Xiangyu Zhang

Abstract: Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions, and comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.

Submitted 29 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.06605 [pdf, other]

Towards Exascale Computation for Turbomachinery Flows

Authors: Yuhang Fu, Weiqi Shen, Jiahuan Cui, Yao Zheng, Guangwen Yang, Zhao Liu, Jifa Zhang, Tingwei Ji, Fangfang Xie, Xiaojing Lv, Hanyue Liu, Xu Liu, Xiyang Liu, Xiaoyu Song, Guocheng Tao, Yan Yan, Paul Tucker, Steven A. E. Miller, Shirui Luo, Seid Koric, Weimin Zheng

Abstract: A state-of-the-art large eddy simulation code has been developed to solve compressible flows in turbomachinery. The code has been engineered with a high degree of scalability, enabling it to effectively leverage the many-core architecture of the new Sunway system. A consistent performance of 115.8 DP-PFLOPs has been achieved on a high-pressure turbine cascade consisting of over 1.69 billion mesh elements and 865 billion degrees of freedom (DOFs). By leveraging a high-order unstructured solver and its portability to large heterogeneous parallel systems, we have progressed towards solving the grand challenge problem outlined by NASA, which involves a time-dependent simulation of a complete engine, incorporating all the aerodynamic and heat transfer components.

Submitted 29 December, 2023; v1 submitted 12 August, 2023; originally announced August 2023.

Comments: SC23, November, 2023, Denver, CO., USA

arXiv:2308.02484 [pdf, ps, other]

Discrete-Time Adaptive State Tracking Control Schemes Using Gradient Algorithms

Authors: Gang Tao

Abstract: This paper conducts a comprehensive study of a classical adaptive control problem: adaptive control of a state-space plant model, $\dot{x}(t) = A x(t) + B u(t)$ in continuous time or $x(t+1) = A x(t) + B u(t)$ in discrete time, for state tracking of a chosen stable reference model system, $\dot{x}_m(t) = A_m x_m(t) + B_m r(t)$ in continuous time or $x_m(t+1) = A_m x_m(t) + B_m r(t)$ in discrete time. Adaptive state tracking control schemes for continuous-time systems have been reported in the literature, using a Lyapunov design and analysis method which has not been successfully applied to discrete-time systems, so that the discrete-time adaptive state tracking problem has remained open. In this paper, new adaptive state tracking control schemes are developed for discrete-time systems, using a gradient method for the design of adaptive laws for updating the controller parameters. Both direct and indirect adaptive designs are presented, which have the standard and desired adaptive law properties. Such a new gradient algorithm based framework is also developed for adaptive state tracking control of continuous-time systems, as compared with the Lyapunov method based framework.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2308.02122 [pdf, other]

ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Authors: Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang

Abstract: Backdoor attacks have emerged as a prominent threat to natural language processing (NLP) models, where the presence of specific triggers in the input can lead poisoned models to misclassify these inputs to predetermined target classes. Current detection mechanisms are limited by their inability to address more covert backdoor strategies, such as style-based attacks. In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples as they want to stay stealthy. Based on this observation, we hypothesize that while the model's predictions for paraphrased clean samples should remain stable, predictions for poisoned samples should revert to their true labels upon the mutations applied to triggers during the paraphrasing process. We employ ChatGPT, a state-of-the-art large language model, as our paraphraser and formulate the trigger-removal task as a prompt engineering problem. We adopt fuzzing, a technique commonly used for unearthing software vulnerabilities, to discover optimal paraphrase prompts that can effectively eliminate triggers while concurrently maintaining input semantics. Experiments on 4 types of backdoor attacks, including the subtle style backdoors, and 4 distinct datasets demonstrate that our approach surpasses baseline methods, including STRIP, RAP, and ONION, in precision and recall.

Submitted 27 October, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

arXiv:2305.17506 [pdf, other]

Backdooring Neural Code Search

Authors: Weisong Sun, Yuchen Chen, Guanhong Tao, Chunrong Fang, Xiangyu Zhang, Quanjun Zhang, Bin Luo

Abstract: Reusing off-the-shelf code snippets from online repositories is a common practice, which significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural language queries. Neural code search models are hence behind many such engines. These models are based on deep learning and gain substantial attention due to their impressive performance. However, the security aspect of these models is rarely studied. Particularly, an adversary can inject a backdoor in neural code search models, which return buggy or even vulnerable code with security/privacy issues. This may impact the downstream software (e.g., stock trading systems and autonomous driving) and cause financial loss and/or life-threatening incidents. In this paper, we demonstrate such attacks are feasible and can be quite stealthy. By simply modifying one variable/function name, the attacker can make buggy/vulnerable code rank in the top 11%. Our attack BADCODE features a special trigger generation and injection procedure, making the attack more effective and stealthy. The evaluation is conducted on two neural code search models and the results show our attack outperforms baselines by 60%. Our user study demonstrates that our attack is more stealthy than the baseline by two times based on the F1 score.

Submitted 12 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: Accepted to the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)

MSC Class: 68T01

ACM Class: I.2.2; D.2.13

arXiv:2305.03778 [pdf, ps, other]

Koopman System Approximation Based Optimal Control of Multiple Robots -- Part II: Simulations and Evaluations

Authors: Qianhong Zhao, Gang Tao

Abstract: This report presents the results of a simulation study of the linear model and bilinear model approximations of the Koopman system model of the nonlinear utility functions in optimal control of a 3-robot system. In such a control problem, the nonlinear utility functions are maximized to achieve the control objective of moving the robots to their target positions and avoiding collisions. With the linear and bilinear model approximations of the utility functions, the optimal control problem is solved, based on the approximate model state variables rather than the original nonlinear utility functions. This transforms the original nonlinear game theory problem to a linear optimization problem. This report studies both the centralized and decentralized implementations of the approximation model based control signals for the 3-robot system control problem. The simulation results show that the maximum value of the a posteriori estimation error of the bilinear approximation model is several thousand times smaller than that of the linear approximation model. This indicates that the bilinear model has more capacity to approximate the nonlinear utility functions. Both the centralized and decentralized bilinear approximation model based control signals can achieve the control objective of moving the robots to their target positions. Based on the analysis of the simulation time, the bilinear model based optimal control solution is fast enough for real-time control implementation.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.03777 [pdf, ps, other]

Koopman System Approximation Based Optimal Control of Multiple Robots -- Part I: Concepts and Formulations

Authors: Gang Tao, Qianhong Zhao

Abstract: This paper presents a study of the Koopman operator theory and its application to optimal control of a multi-robot system. The Koopman operator, while operating on a set of observation functions of the state vector of a nonlinear system, produces a set of dynamic equations which, through a dynamic transformation, form a new dynamic system. As an operator, it has a rich spectrum of mathematical properties, and as a tool for dynamic system analysis and control design, it has a unique collection of practical meanings. The Koopman system technique is then applied to the development of a linear or bilinear model approximation of nonlinear utility functions for optimal control of a system of multiple (mobile) robots, by selecting the utility functions as the Koopman system state variables and expressing the set of Koopman variables as the state variables of a linear or bilinear system whose parameters are determined through optimization. An iterative (online) algorithm is developed for adaptively estimating the parameters of the approximation model of the robot system with nonlinear utility functions. Finally, the control problems based on a linear or bilinear model for the nonlinear utility functions are formulated for optimal control of the multi-robot system, by transforming the nonlinear programming problem to a linear programming problem.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.14614 [pdf, other]

Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection

Authors: Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang

Abstract: Multi-sensor fusion (MSF) is widely used in autonomous vehicles (AVs) for perception, particularly for 3D object detection with camera and LiDAR sensors. The purpose of fusion is to capitalize on the advantages of each modality while minimizing its weaknesses. Advanced deep neural network (DNN)-based fusion techniques have demonstrated exceptional, industry-leading performance. Due to the redundant information in multiple modalities, MSF is also recognized as a general defence strategy against adversarial attacks. In this paper, we attack fusion models from the camera modality that is considered to be of lesser importance in fusion but is more affordable for attackers. We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks. Our approach employs a two-stage optimization-based strategy that first thoroughly evaluates vulnerable image areas under adversarial attacks, and then applies dedicated attack strategies for different fusion models to generate deployable patches. The evaluations with six advanced camera-LiDAR fusion models and one camera-only model indicate that our attacks successfully compromise all of them. Our approach can either decrease the mean average precision (mAP) of detection performance from 0.824 to 0.353, or degrade the detection score of a target object from 0.728 to 0.156, demonstrating the efficacy of our proposed attack framework. Code is available.

Submitted 2 March, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: Accepted at ICLR'2024

arXiv:2303.15180 [pdf, other]

Detecting Backdoors in Pre-trained Encoders

Authors: Shiwei Feng, Guanhong Tao, Siyuan Cheng, Guangyu Shen, Xiangzhe Xu, Yingqi Liu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

Abstract: Self-supervised learning in computer vision trains on unlabeled data, such as images or (image, text) pairs, to obtain an image encoder that learns high-quality embeddings for input data. Emerging backdoor attacks towards encoders expose crucial vulnerabilities of self-supervised learning, since downstream classifiers (even further trained on clean data) may inherit backdoor behaviors from encoders. Existing backdoor detection methods mainly focus on supervised learning settings and cannot handle pre-trained encoders especially when input labels are not available. In this paper, we propose DECREE, the first backdoor detection approach for pre-trained encoders, requiring neither classifier headers nor input labels. We evaluate DECREE on over 400 encoders trojaned under 3 paradigms. We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs. Our method consistently has a high detection accuracy even if we have only limited or no access to the pre-training dataset.

Submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR 2023. Code is available at https://github.com/GiantSeaweed/DECREE

arXiv:2301.13487 [pdf, other]

Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks

Authors: Zhiyuan Cheng, James Liang, Guanhong Tao, Dongfang Liu, Xiangyu Zhang

Abstract: Monocular Depth Estimation (MDE) is a critical component in applications such as autonomous driving. There are various attacks against MDE networks. These attacks, especially the physical ones, pose a great threat to the security of such systems. Traditional adversarial training methods require ground-truth labels and hence cannot be directly applied to self-supervised MDE that does not have ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) ignore the domain knowledge of MDE and can hardly achieve optimal performance. In this work, we propose a novel adversarial training method for self-supervised MDE models based on view synthesis without using ground-truth depth. We improve adversarial robustness against physical-world attacks using L0-norm-bounded perturbation in training. We compare our method with supervised learning based and contrastive learning based methods that are tailored for MDE. Results on two representative MDE networks show that we achieve better robustness against various adversarial attacks with nearly no benign performance degradation.

Submitted 2 April, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: Initially accepted at ICLR2023 (Spotlight)

arXiv:2301.12318 [pdf, other]

doi 10.14722/ndss.2024.24450

Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering

Authors: Rui Zhu, Di Tang, Siyuan Tang, Guanhong Tao, Shiqing Ma, Xiaofeng Wang, Haixu Tang

Abstract: Most existing methods to detect backdoored machine learning (ML) models take one of two approaches: trigger inversion (a.k.a. reverse engineering) and weight analysis (a.k.a. model diagnosis). In particular, the gradient-based trigger inversion is considered to be among the most effective backdoor detection techniques, as evidenced by the TrojAI competition, Trojan Detection Challenge and backdoorBench. However, little has been done to understand why this technique works so well and, more importantly, whether it raises the bar to the backdoor attack. In this paper, we report the first attempt to answer this question by analyzing the change rate of the backdoored model around its trigger-carrying inputs. Our study shows that existing attacks tend to inject the backdoor characterized by a low change rate around trigger-carrying inputs, which are easy to capture by gradient-based trigger inversion. In the meantime, we found that the low change rate is not necessary for a backdoor attack to succeed: we design a new attack enhancement called Gradient Shaping (GRASP), which follows the opposite direction of adversarial training to reduce the change rate of a backdoored model with regard to the trigger, without undermining its backdoor effect. Also, we provide a theoretic analysis to explain the effectiveness of this new technique and the fundamental weakness of gradient-based trigger inversion. Finally, we perform both theoretical and experimental analysis, showing that the GRASP enhancement does not reduce the effectiveness of the stealthy attacks against the backdoor detection methods based on weight analysis, as well as other backdoor mitigation methods without using detection.

Submitted 2 March, 2024; v1 submitted 28 January, 2023; originally announced January 2023.

Journal ref: NDSS Symposium 2024

arXiv:2301.06241 [pdf, other]

BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense

Authors: Siyuan Cheng, Guanhong Tao, Yingqi Liu, Shengwei An, Xiangzhe Xu, Shiwei Feng, Guangyu Shen, Kaiyuan Zhang, Qiuling Xu, Shiqing Ma, Xiangyu Zhang

Abstract: Deep Learning backdoor attacks have a threat model similar to traditional cyber attacks. Attack forensics, a critical counter-measure for traditional cyber attacks, is hence of importance for defending model backdoor attacks. In this paper, we propose a novel model backdoor forensics technique. Given a few attack samples such as inputs with backdoor triggers, which may represent different types of backdoors, our technique automatically decomposes them to clean inputs and the corresponding triggers. It then clusters the triggers based on their properties to allow automatic attack categorization and summarization. Backdoor scanners can then be automatically synthesized to find other instances of the same type of backdoor in other models. Our evaluation on 2,532 pre-trained models, 10 popular attacks, and comparison with 9 baselines show that our technique is highly effective. The decomposed clean inputs and triggers closely resemble the ground truth. The synthesized scanners substantially outperform the vanilla versions of existing scanners that can hardly generalize to different kinds of attacks.

Submitted 15 January, 2023; originally announced January 2023.

arXiv:2212.11473 [pdf, other]

Restoring Vision in Hazy Weather with Hierarchical Contrastive Learning

Authors: Tao Wang, Guangpin Tao, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Xiaoqin Zhang, Tong Lu

Abstract: Image restoration under hazy weather conditions, which is called single image dehazing, has been of significant interest for various computer vision applications. In recent years, deep learning-based methods have achieved success. However, existing image dehazing methods typically neglect the hierarchy of features in the neural network and fail to exploit their relationships fully. To this end, we propose an effective image dehazing method named Hierarchical Contrastive Dehazing (HCD), which is based on feature fusion and contrastive learning strategies. HCD consists of a hierarchical dehazing network (HDN) and a novel hierarchical contrastive loss (HCL). Specifically, the core design in the HDN is a hierarchical interaction module, which utilizes multi-scale activation to revise the feature responses hierarchically. To cooperate with the training of HDN, we propose HCL which performs contrastive learning on hierarchically paired exemplars, facilitating haze removal. Extensive experiments on public datasets, RESIDE, HazeRD, and DENSE-HAZE, demonstrate that HCD quantitatively outperforms the state-of-the-art methods in terms of PSNR and SSIM and achieves better visual quality.

Submitted 23 September, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: 30 pages, 10 figures

Journal ref: Pattern Recognition, 2023

arXiv:2211.15929 [pdf, other]

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

Authors: Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang

Abstract: We conduct a systematic study of backdoor vulnerabilities in normally trained Deep Learning models. They are as dangerous as backdoors injected by data poisoning because both can be equally exploited. We leverage 20 different types of injected backdoor attacks in the literature as the guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities. We find that natural backdoors are widely existing, with most injected backdoor attacks having natural correspondences. We categorize these natural backdoors and propose a general detection framework. It finds 315 natural backdoors in the 56 normally trained models downloaded from the Internet, covering all the different categories, while existing scanners designed for injected backdoors can at most detect 65 backdoors. We also study the root causes and defense of natural backdoors.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2210.12873 [pdf, other]

FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning

Authors: Kaiyuan Zhang, Guanhong Tao, Qiuling Xu, Siyuan Cheng, Shengwei An, Yingqi Liu, Shiwei Feng, Guangyu Shen, Pin-Yu Chen, Shiqing Ma, Xiangyu Zhang

Abstract: Federated Learning (FL) is a distributed learning paradigm that enables different parties to train a model together for high quality and strong privacy protection. In this scenario, individual participants may get compromised and perform backdoor attacks by poisoning the data (or gradients). Existing work on robust aggregation and certified FL robustness does not study how hardening benign clients can affect the global model (and the malicious clients). In this work, we theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting. Moreover, we propose a trigger reverse engineering based defense and show that our method can achieve robustness improvement with guarantee (i.e., reducing the attack success rate) without affecting benign accuracy. We conduct comprehensive experiments across different datasets and attack settings. Our results on eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks. Code is available at https://github.com/KaiyuanZh/FLIP.

Submitted 27 February, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: Accepted by ICLR 2023. Code is available at https://github.com/KaiyuanZh/FLIP

arXiv:2207.04718 [pdf, other]

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

Authors: Zhiyuan Cheng, James Liang, Hongjun Choi, Guanhong Tao, Zhiwen Cao, Dongfang Liu, Xiangyu Zhang

Abstract: Deep learning has substantially boosted the performance of Monocular Depth Estimation (MDE), a critical component in fully vision-based autonomous driving (AD) systems (e.g., Tesla and Toyota). In this work, we develop an attack against learning-based MDE. In particular, we use an optimization-based method to systematically generate stealthy physical-object-oriented adversarial patches to attack depth estimation. We balance the stealth and effectiveness of our attack with object-oriented adversarial design, sensitive region localization, and natural style camouflage. Using real-world driving scenarios, we evaluate our attack on concurrent MDE models and a representative downstream task for AD (i.e., 3D object detection). Experimental results show that our method can generate stealthy, effective, and robust adversarial patches for different target objects and models and achieves more than 6 meters mean depth estimation error and 93% attack success rate (ASR) in object detection with a patch of 1/9 of the vehicle's rear area. Field tests on three different driving routes with a real vehicle indicate that we cause over 6 meters mean depth estimation error and reduce the object detection rate from 90.70% to 5.16% in continuous video frames.

Submitted 11 July, 2022; originally announced July 2022.

Comments: ECCV2022

arXiv:2206.09272 [pdf, other]

DECK: Model Hardening for Defending Pervasive Backdoors

Authors: Guanhong Tao, Yingqi Liu, Siyuan Cheng, Shengwei An, Zhuo Zhang, Qiuling Xu, Guangyu Shen, Xiangyu Zhang

Abstract: Pervasive backdoors are triggered by dynamic and pervasive input perturbations. They can be intentionally injected by attackers or naturally exist in normally trained models. They have a different nature from the traditional static and localized backdoors that can be triggered by perturbing a small input area with some fixed pattern, e.g., a patch with solid color. Existing defense techniques are highly effective for traditional backdoors. However, they may not work well for pervasive backdoors, especially regarding backdoor removal and model hardening. In this paper, we propose a novel model hardening technique against pervasive backdoors, including both natural and injected backdoors. We develop a general pervasive attack based on an encoder-decoder architecture enhanced with a special transformation layer. The attack can model a wide range of existing pervasive backdoor attacks and quantify them by class distances. As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities. Our evaluation on 9 datasets with 15 model structures shows that our technique can enlarge class distances by 59.65% on average with less than 1% accuracy degradation and no robustness loss, outperforming five hardening techniques such as adversarial training, universal adversarial training, MOTH, etc. It can reduce the attack success rate of six pervasive backdoor attacks from 99.06% to 1.94%, surpassing seven state-of-the-art backdoor removal techniques.

Submitted 18 June, 2022; originally announced June 2022.
