-
Representation Bending for Large Language Model Safety
Authors:
Ashkan Yousefpour,
Taeheon Kim,
Ryan S. Kwon,
Seungbeen Lee,
Wonje Jeung,
Seungju Han,
Alvin Wan,
Harrison Ngan,
Youngjae Yu,
Jonghyun Choi
Abstract:
Large Language Models (LLMs) have emerged as powerful tools, but their inherent safety risks - ranging from harmful content generation to broader societal harms - pose significant challenges. These risks can be amplified by recent adversarial attacks, fine-tuning vulnerabilities, and the increasing deployment of LLMs in high-stakes environments. Existing safety-enhancing techniques, such as fine-tuning with human feedback or adversarial training, are still vulnerable as they address specific threats and often fail to generalize across unseen attacks, or require manual system-level defenses. This paper introduces RepBend, a novel approach that fundamentally disrupts the representations underlying harmful behaviors in LLMs, offering a scalable solution to enhance (potentially inherent) safety. RepBend brings the idea of activation steering - simple vector arithmetic for steering a model's behavior during inference - to loss-based fine-tuning. Through extensive evaluation, RepBend achieves state-of-the-art performance, outperforming prior methods such as Circuit Breaker, RMU, and NPO, with up to 95% reduction in attack success rates across diverse jailbreak benchmarks, all with negligible reduction in model usability and general capabilities.
Submitted 9 June, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.
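The abstract above describes activation steering as simple vector arithmetic on model activations. The following is a minimal sketch of that generic idea only, on synthetic NumPy arrays; it is not RepBend's loss or training procedure, and the function names `steering_vector` and `steer`, the dimension `d`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def steering_vector(harmful_acts, benign_acts):
    """Difference of mean activations; inputs are (n, d), output is (d,)."""
    return harmful_acts.mean(axis=0) - benign_acts.mean(axis=0)

def steer(hidden, direction, alpha=1.0):
    """Shift hidden states away from `direction`, scaled by alpha."""
    return hidden - alpha * direction

# Synthetic stand-ins for hidden states (assumption: real activations
# would come from a chosen layer of an actual model).
rng = np.random.default_rng(0)
d = 8
offset = np.zeros(d)
offset[0] = 3.0                      # "harmful" states differ along axis 0
benign = rng.normal(size=(64, d))
harmful = rng.normal(size=(64, d)) + offset

v = steering_vector(harmful, benign)
steered = steer(harmful, v, alpha=1.0)
```

With `alpha=1.0` the mean of the steered states coincides with the benign mean, which is the sense in which the arithmetic "steers" the representation.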
-
sudo rm -rf agentic_security
Authors:
Sejin Lee,
Jian Kim,
Haon Park,
Ashkan Yousefpour,
Sangyoon Yu,
Min Song
Abstract:
Large Language Models (LLMs) are increasingly deployed as computer-use agents, autonomously performing tasks within real desktop or web environments. While this evolution greatly expands practical use cases for humans, it also creates serious security exposures. We present SUDO (Screen-based Universal Detox2Tox Offense), a novel attack framework that systematically bypasses refusal-trained safeguards in commercial computer-use agents, such as Claude for Computer Use. The core mechanism, Detox2Tox, transforms harmful requests (that agents initially reject) into seemingly benign requests via detoxification, secures detailed instructions from advanced vision language models (VLMs), and then reintroduces malicious content via toxification just before execution. Unlike conventional jailbreaks, SUDO iteratively refines its attacks based on built-in refusal feedback, making it increasingly effective against robust policy filters. In extensive tests spanning 50 real-world tasks and multiple state-of-the-art VLMs, SUDO achieves a stark attack success rate of 24.41% (with no refinement), and up to 41.33% (by its iterative refinement) in Claude for Computer Use. By revealing these vulnerabilities and demonstrating the ease with which they can be exploited in real-world computing environments, this paper highlights an immediate need for robust, context-aware safeguards. WARNING: This paper includes harmful or offensive model outputs.
Submitted 8 June, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Localized Physics-informed Gaussian Processes with Curriculum Training for Topology Optimization
Authors:
Amin Yousefpour,
Shirin Hosseinmardi,
Xiangyu Sun,
Ramin Bostanabad
Abstract:
We introduce a simultaneous and meshfree topology optimization (TO) framework based on physics-informed Gaussian processes (GPs). Our framework endows all design and state variables with GP priors which have a shared, multi-output mean function that is parametrized via a customized deep neural network (DNN). The parameters of this mean function are estimated by minimizing a multi-component loss function that depends on the performance metric, design constraints, and the residuals on the state equations. Our TO approach yields well-defined material interfaces and has a built-in continuation nature that promotes global optimality. Other unique features of our approach include (1) its customized DNN which, unlike fully connected feed-forward DNNs, has a localized learning capacity that enables capturing intricate topologies and reducing residuals in high gradient fields, (2) its loss function that leverages localized weights to promote solution accuracy around interfaces, and (3) its use of curriculum training to avoid local optimality. To demonstrate the power of our framework, we validate it against the commercial TO package COMSOL on three problems involving dissipated power minimization in Stokes flow.
Submitted 18 March, 2025;
originally announced March 2025.
-
One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs
Authors:
Junwoo Ha,
Hyunjun Kim,
Sangyoon Yu,
Haon Park,
Ashkan Yousefpour,
Yuna Park,
Suhyun Kim
Abstract:
We introduce a novel framework for consolidating multi-turn adversarial "jailbreak" prompts into single-turn queries, significantly reducing the manual overhead required for adversarial testing of large language models (LLMs). While multi-turn human jailbreaks have been shown to yield high attack success rates, they demand considerable human effort and time. Our multi-turn-to-single-turn (M2S) methods -- Hyphenize, Numberize, and Pythonize -- systematically reformat multi-turn dialogues into structured single-turn prompts. Despite removing iterative back-and-forth interactions, these prompts preserve and often enhance adversarial potency: in extensive evaluations on the Multi-turn Human Jailbreak (MHJ) dataset, M2S methods achieve attack success rates from 70.6 percent to 95.9 percent across several state-of-the-art LLMs. Remarkably, the single-turn prompts outperform the original multi-turn attacks by as much as 17.5 percentage points while cutting token usage by more than half on average. Further analysis shows that embedding malicious requests in enumerated or code-like structures exploits "contextual blindness", bypassing both native guardrails and external input-output filters. By converting multi-turn conversations into concise single-turn prompts, the M2S framework provides a scalable tool for large-scale red teaming and reveals critical weaknesses in contemporary LLM defenses.
Submitted 25 May, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
ELITE: Enhanced Language-Image Toxicity Evaluation for Safety
Authors:
Wonjun Lee,
Doehyeon Lee,
Eugene Choi,
Sangyoon Yu,
Ashkan Yousefpour,
Haon Park,
Bumsub Ham,
Suhyun Kim
Abstract:
Current Vision Language Models (VLMs) remain vulnerable to malicious prompts that induce harmful outputs. Existing safety benchmarks for VLMs primarily rely on automated evaluation methods, but these methods struggle to detect implicit harmful content and can produce inaccurate evaluations. As a result, we find that existing benchmarks have low levels of harmfulness, ambiguous data, and limited diversity in image-text pair combinations. To address these issues, we propose the ELITE benchmark, a high-quality safety evaluation benchmark for VLMs, underpinned by our enhanced evaluation method, the ELITE evaluator. The ELITE evaluator explicitly incorporates a toxicity score to accurately assess harmfulness in multimodal contexts, where VLMs often provide specific, convincing, but harmless descriptions of images. We filter out ambiguous and low-quality image-text pairs from existing benchmarks using the ELITE evaluator and generate diverse combinations of safe and unsafe image-text pairs. Our experiments demonstrate that the ELITE evaluator achieves superior alignment with human evaluations compared to prior automated methods, and the ELITE benchmark offers enhanced benchmark quality and diversity. By introducing ELITE, we pave the way for safer, more robust VLMs, contributing essential tools for evaluating and mitigating safety risks in real-world applications.
Submitted 9 February, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
Large Language Models Still Exhibit Bias in Long Text
Authors:
Wonje Jeung,
Dongjae Jeon,
Ashkan Yousefpour,
Jonghyun Choi
Abstract:
Existing fairness benchmarks for large language models (LLMs) primarily focus on simple tasks, such as multiple-choice questions, overlooking biases that may arise in more complex scenarios like long-text generation. To address this gap, we introduce the Long Text Fairness Test (LTF-TEST), a framework that evaluates biases in LLMs through essay-style prompts. LTF-TEST covers 14 topics and 10 demographic axes, including gender and race, resulting in 11,948 samples. By assessing both model responses and the reasoning behind them, LTF-TEST uncovers subtle biases that are difficult to detect in simple responses. In our evaluation of five recent LLMs, including GPT-4o and LLaMa3, we identify two key patterns of bias. First, these models frequently favor certain demographic groups in their responses. Second, they show excessive sensitivity toward traditionally disadvantaged groups, often providing overly protective responses while neglecting others. To mitigate these biases, we propose FT-REGARD, a finetuning approach that pairs biased prompts with neutral responses. FT-REGARD reduces gender bias by 34.6% and improves performance by 1.4 percentage points on the BBQ benchmark, offering a promising approach to addressing biases in long-text generation tasks.
Submitted 25 October, 2024; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Operator Learning with Gaussian Processes
Authors:
Carlos Mora,
Amin Yousefpour,
Shirin Hosseinmardi,
Houman Owhadi,
Ramin Bostanabad
Abstract:
Operator learning focuses on approximating mappings $\mathcal{G}^\dagger:\mathcal{U} \rightarrow\mathcal{V}$ between infinite-dimensional spaces of functions, such as $u: \Omega_u\rightarrow\mathbb{R}$ and $v: \Omega_v\rightarrow\mathbb{R}$. This makes it particularly suitable for solving parametric nonlinear partial differential equations (PDEs). While most machine learning methods for operator learning rely on variants of deep neural networks (NNs), recent studies have shown that Gaussian Processes (GPs) are also competitive while offering interpretability and theoretical guarantees. In this paper, we introduce a hybrid GP/NN-based framework for operator learning that leverages the strengths of both methods. Instead of approximating the function-valued operator $\mathcal{G}^\dagger$, we use a GP to approximate its associated real-valued bilinear form $\widetilde{\mathcal{G}}^\dagger: \mathcal{U}\times\mathcal{V}^*\rightarrow\mathbb{R}.$ This bilinear form is defined by $\widetilde{\mathcal{G}}^\dagger(u,\varphi) := [\varphi,\mathcal{G}^\dagger(u)],$ which allows us to recover the operator $\mathcal{G}^\dagger$ through $\mathcal{G}^\dagger(u)(y)=\widetilde{\mathcal{G}}^\dagger(u,\delta_y).$ The GP mean function can be zero or parameterized by a neural operator and for each setting we develop a robust training mechanism based on maximum likelihood estimation (MLE) that can optionally leverage the physics involved. Numerical benchmarks show that (1) it improves the performance of a base neural operator by using it as the mean function of a GP, and (2) it enables zero-shot data-driven models for accurate predictions without prior training. Our framework also handles multi-output operators where $\mathcal{G}^\dagger:\mathcal{U} \rightarrow\prod_{s=1}^S\mathcal{V}^s$, and benefits from computational speed-ups via product kernel structures and Kronecker product matrix representations.
Submitted 6 September, 2024;
originally announced September 2024.
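The recovery identity stated in the abstract above can be unpacked in one line. This is a sketch under the assumption (not stated in the abstract) that point evaluation $\delta_y$ is a continuous functional on $\mathcal{V}$, so that $\delta_y\in\mathcal{V}^*$ and the pairing satisfies $[\delta_y, v] = v(y)$.

```latex
\begin{align*}
\widetilde{\mathcal{G}}^\dagger(u,\delta_y)
  &= [\delta_y,\ \mathcal{G}^\dagger(u)]
     && \text{definition of } \widetilde{\mathcal{G}}^\dagger \text{ with } \varphi=\delta_y \\
  &= \mathcal{G}^\dagger(u)(y)
     && \text{assumed: } [\delta_y, v] = v(y) \text{ for } v\in\mathcal{V}
\end{align*}
```

Under that assumption, evaluating the real-valued form at every $y\in\Omega_v$ recovers the full output function $\mathcal{G}^\dagger(u)$.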
-
Simultaneous and Meshfree Topology Optimization with Physics-informed Gaussian Processes
Authors:
Amin Yousefpour,
Shirin Hosseinmardi,
Carlos Mora,
Ramin Bostanabad
Abstract:
Topology optimization (TO) provides a principled mathematical approach for optimizing the performance of a structure by designing its material spatial distribution in a pre-defined domain and subject to a set of constraints. The majority of existing TO approaches leverage numerical solvers for design evaluations during the optimization and hence have a nested nature and rely on discretizing the design variables. Contrary to these approaches, herein we develop a new class of TO methods based on the framework of Gaussian processes (GPs) whose mean functions are parameterized via deep neural networks. Specifically, we place GP priors on all design and state variables to represent them via parameterized continuous functions. These GPs share a deep neural network as their mean function but have as many independent kernels as there are state and design variables. We estimate all the parameters of our model in a single for loop that optimizes a penalized version of the performance metric where the penalty terms correspond to the state equations and design constraints. Attractive features of our approach include (1) having a built-in continuation nature since the performance metric is optimized at the same time that the state equations are solved, and (2) being discretization-invariant and accommodating complex domains and topologies. To test our method against conventional TO approaches implemented in commercial software, we evaluate it on four problems involving the minimization of dissipated power in Stokes flow. The results indicate that our approach does not need filtering techniques, has consistent computational costs, and is highly robust against random initializations and problem setup.
Submitted 6 August, 2024;
originally announced August 2024.
-
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
Authors:
Jiwan Chung,
Sungjae Lee,
Minseo Kim,
Seungju Han,
Ashkan Yousefpour,
Jack Hessel,
Youngjae Yu
Abstract:
Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by human audiences, we ask: are today's AI models capable of similar understanding? We present VisArgs, a dataset of 1,611 images annotated with 5,112 visual premises (with regions), 5,574 commonsense premises, and reasoning trees connecting them into structured arguments. We propose three tasks for evaluating visual argument understanding: premise localization, premise identification, and conclusion deduction. Experiments show that (1) machines struggle to capture visual cues: GPT-4-O achieved 78.5% accuracy, while humans reached 98.0%. Models also performed 19.5% worse when distinguishing between irrelevant objects within the image compared to external objects. (2) Providing relevant visual premises improved model performance significantly.
Submitted 22 October, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Aligning Large Language Models by On-Policy Self-Judgment
Authors:
Sangkyu Lee,
Sungdong Kim,
Ashkan Yousefpour,
Minjoon Seo,
Kang Min Yoo,
Youngjae Yu
Abstract:
Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning. In this paper, we present a novel alignment framework, SELF-JUDGE, that (1) does on-policy learning and (2) is parameter-efficient, as it does not require an additional RM for evaluating the samples for on-policy learning. To this end, we propose Judge-augmented Supervised Fine-Tuning (JSFT) to train a single model to act as both a policy and a judge. Specifically, we view the pairwise judgment task, choosing the better response from a response pair, as a special case of the instruction-following task. The resulting model can judge preferences of on-the-fly responses from the current policy initialized from itself. Experimental results show the efficacy of SELF-JUDGE, outperforming baselines in preference benchmarks. We also show that rejection sampling by itself can improve performance further without an additional evaluator.
Submitted 25 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
A Gaussian Process Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations
Authors:
Carlos Mora,
Amin Yousefpour,
Shirin Hosseinmardi,
Ramin Bostanabad
Abstract:
Physics-informed machine learning (PIML) has emerged as a promising alternative to conventional numerical methods for solving partial differential equations (PDEs). PIML models are increasingly built via deep neural networks (NNs) whose architecture and training process are designed such that the network satisfies the PDE system. While such PIML models have substantially advanced over the past few years, their performance is still very sensitive to the NN's architecture and loss function. Motivated by this limitation, we introduce kernel-weighted Corrective Residuals (CoRes) to integrate the strengths of kernel methods and deep NNs for solving nonlinear PDE systems. To achieve this integration, we design a modular and robust framework which consistently outperforms competing methods in solving a broad range of benchmark problems. This performance improvement has a theoretical justification and is particularly attractive since we simplify the training process while negligibly increasing the inference costs. Additionally, our studies on solving multiple PDEs indicate that kernel-weighted CoRes considerably decrease the sensitivity of NNs to factors such as random initialization, architecture type, and choice of optimizer. We believe our findings have the potential to spark a renewed interest in leveraging kernel methods for solving PDEs.
Submitted 26 September, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
GP+: A Python Library for Kernel-based learning via Gaussian Processes
Authors:
Amin Yousefpour,
Zahra Zanjani Foumani,
Mehdi Shishehbor,
Carlos Mora,
Ramin Bostanabad
Abstract:
In this paper we introduce GP+, an open-source library for kernel-based learning via Gaussian processes (GPs) which are powerful statistical models that are completely characterized by their parametric covariance and mean functions. GP+ is built on PyTorch and provides a user-friendly and object-oriented tool for probabilistic learning and inference. As we demonstrate with a host of examples, GP+ has a few unique advantages over other GP modeling libraries. We achieve these advantages primarily by integrating nonlinear manifold learning techniques with GPs' covariance and mean functions. As part of introducing GP+, in this paper we also make methodological contributions that (1) enable probabilistic data fusion and inverse parameter estimation, and (2) equip GPs with parsimonious parametric mean functions which span mixed feature spaces that have both categorical and quantitative variables. We demonstrate the impact of these contributions in the context of Bayesian optimization, multi-fidelity modeling, sensitivity analysis, and calibration of computer models.
Submitted 4 June, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
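The GP+ abstract above notes that a Gaussian process is completely characterized by its mean and covariance functions. The sketch below illustrates that statement with plain NumPy GP regression; it does not use or reproduce the GP+ API, and the names `rbf_kernel`, `gp_posterior`, the RBF kernel choice, and the toy sine data are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6,
                 mean_fn=lambda x: np.zeros_like(x)):
    """Posterior mean and covariance of a GP with the given mean function."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    resid = y_train - mean_fn(x_train)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mu = mean_fn(x_test) + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mu, cov

# Toy data (assumption): six noise-free samples of sin(x).
x_train = np.linspace(0.0, 5.0, 6)
y_train = np.sin(x_train)
mu, cov = gp_posterior(x_train, y_train, x_train)
```

Swapping `mean_fn` or `rbf_kernel` changes the model entirely, which is the sense in which the two functions characterize the process.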
-
On the Effects of Heterogeneous Errors on Multi-fidelity Bayesian Optimization
Authors:
Zahra Zanjani Foumani,
Amin Yousefpour,
Mehdi Shishehbor,
Ramin Bostanabad
Abstract:
Bayesian optimization (BO) is a sequential optimization strategy that is increasingly employed in a wide range of areas including materials design. In real world applications, acquiring high-fidelity (HF) data through physical experiments or HF simulations is the major cost component of BO. To alleviate this bottleneck, multi-fidelity (MF) methods are used to forgo the sole reliance on the expensive HF data and reduce the sampling costs by querying inexpensive low-fidelity (LF) sources whose data are correlated with HF samples. However, existing multi-fidelity BO (MFBO) methods operate under the following two assumptions that rarely hold in practical applications: (1) LF sources provide data that are well correlated with the HF data on a global scale, and (2) a single random process can model the noise in the fused data. These assumptions dramatically reduce the performance of MFBO when LF sources are only locally correlated with the HF source or when the noise variance varies across the data sources. In this paper, we dispense with these incorrect assumptions by proposing an MF emulation method that (1) learns a noise model for each data source, and (2) enables MFBO to leverage highly biased LF sources which are only locally correlated with the HF source. We illustrate the performance of our method through analytical examples and engineering problems on materials design.
Submitted 6 September, 2023;
originally announced September 2023.
-
Unsupervised Anomaly Detection via Nonlinear Manifold Learning
Authors:
Amin Yousefpour,
Mehdi Shishehbor,
Zahra Zanjani Foumani,
Ramin Bostanabad
Abstract:
Anomalies are samples that significantly deviate from the rest of the data and their detection plays a major role in building machine learning models that can be reliably used in applications such as data-driven design and novelty detection. The majority of existing anomaly detection methods either are exclusively developed for (semi) supervised settings, or provide poor performance in unsupervised applications where there is no training data with labeled anomalous samples. To bridge this research gap, we introduce a robust, efficient, and interpretable methodology based on nonlinear manifold learning to detect anomalies in unsupervised settings. The essence of our approach is to learn a low-dimensional and interpretable latent representation (aka manifold) for all the data points such that normal samples are automatically clustered together and hence can be easily and robustly identified. We learn this low-dimensional manifold by designing a learning algorithm that leverages either a latent map Gaussian process (LMGP) or a deep autoencoder (AE). Our LMGP-based approach, in particular, provides a probabilistic perspective on the learning task and is ideal for high-dimensional applications with scarce data. We demonstrate the superior performance of our approach over existing technologies via multiple analytic examples and real-world datasets.
Submitted 15 June, 2023;
originally announced June 2023.
-
Green Federated Learning
Authors:
Ashkan Yousefpour,
Shen Guo,
Ashish Shenoy,
Sayan Ghosh,
Pierre Stock,
Kiwan Maeng,
Schalk-Willem Krüger,
Michael Rabbat,
Carole-Jean Wu,
Ilya Mironov
Abstract:
The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets. As a consequence, the amount of compute used in training state-of-the-art models is exponentially increasing (doubling every 10 months between 2015 and 2022), resulting in a large carbon footprint. Federated Learning (FL) - a collaborative machine learning technique for training a centralized model using data of decentralized entities - can also be resource-intensive and have a significant carbon footprint, particularly when deployed at scale. Unlike centralized AI that can reliably tap into renewables at strategically placed data centers, cross-device FL may leverage as many as hundreds of millions of globally distributed end-user devices with diverse energy sources. Green AI is a novel and important research area where carbon footprint is regarded as an evaluation criterion for AI, alongside accuracy, convergence speed, and other metrics. In this paper, we propose the concept of Green FL, which involves optimizing FL parameters and making design choices to minimize carbon emissions consistent with competitive performance and training time. The contributions of this work are two-fold. First, we adopt a data-driven approach to quantify the carbon emissions of FL by directly measuring real-world at-scale FL tasks running on millions of phones. Second, we present challenges, guidelines, and lessons learned from studying the trade-off between energy efficiency, performance, and time-to-train in a production FL system. Our findings offer valuable insights into how FL can reduce its carbon footprint, and they provide a foundation for future research in the area of Green AI.
Submitted 1 August, 2023; v1 submitted 25 March, 2023;
originally announced March 2023.
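The Green Federated Learning abstract above quotes training compute doubling every 10 months between 2015 and 2022. A short arithmetic sketch of what that rate implies, assuming the interval is taken as exactly seven years (84 months); the variable names are illustrative.

```python
# Assumption: "between 2015 and 2022" is read as a full 7-year span.
months = (2022 - 2015) * 12        # 84 months
doublings = months / 10            # one doubling per 10 months
growth = 2.0 ** doublings          # overall multiplicative growth in compute

print(f"{doublings:.1f} doublings, growth factor about {growth:.0f}x")
```

Under this reading the quoted rate compounds to a growth factor of a few hundred over the period.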
-
Multi-Fidelity Cost-Aware Bayesian Optimization
Authors:
Zahra Zanjani Foumani,
Mehdi Shishehbor,
Amin Yousefpour,
Ramin Bostanabad
Abstract:
Bayesian optimization (BO) is increasingly employed in critical applications such as materials design and drug discovery. An increasingly popular strategy in BO is to forgo the sole reliance on high-fidelity data and instead use an ensemble of information sources which provide inexpensive low-fidelity data. The overall premise of this strategy is to reduce the overall sampling costs by querying inexpensive low-fidelity sources whose data are correlated with high-fidelity samples. Here, we propose a multi-fidelity cost-aware BO framework that dramatically outperforms the state-of-the-art technologies in terms of efficiency, consistency, and robustness. We demonstrate the advantages of our framework on analytic and engineering problems and argue that these benefits stem from our two main contributions: (1) we develop a novel acquisition function for multi-fidelity cost-aware BO that safeguards the convergence against the biases of low-fidelity data, and (2) we tailor a newly developed emulator for multi-fidelity BO which enables us to not only simultaneously learn from an ensemble of multi-fidelity datasets, but also identify the severely biased low-fidelity sources that should be excluded from BO.
Submitted 4 November, 2022;
originally announced November 2022.
-
Reconciling Security and Communication Efficiency in Federated Learning
Authors:
Karthik Prasad,
Sayan Ghosh,
Graham Cormode,
Ilya Mironov,
Ashkan Yousefpour,
Pierre Stock
Abstract:
Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate the client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40$\times$ compression in uplink communication with no meaningful loss in utility compared to uncompressed baselines.
Submitted 26 July, 2022;
originally announced July 2022.
-
Papaya: Practical, Private, and Scalable Federated Learning
Authors:
Dzmitry Huba,
John Nguyen,
Kshitiz Malik,
Ruiyu Zhu,
Mike Rabbat,
Ashkan Yousefpour,
Carole-Jean Wu,
Hongyuan Zhan,
Pavel Ustinov,
Harish Srinivas,
Kaikai Wang,
Anthony Shoumikhin,
Jesik Min,
Mani Malek
Abstract:
Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning, the primary ones being variability in the system characteristics on each device and millions of clients coordinating with a central server. Most FL systems described in the literature are synchronous - they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging since increasing the number of clients training in parallel leads to diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high concurrency settings, asynchronous FL is 5x faster and has nearly 8x less communication overhead than synchronous FL.
Submitted 25 April, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Opacus: User-Friendly Differential Privacy Library in PyTorch
Authors:
Ashkan Yousefpour,
Igor Shilov,
Alexandre Sablayrolles,
Davide Testuggine,
Karthik Prasad,
Mani Malek,
John Nguyen,
Sayan Ghosh,
Akash Bharadwaj,
Jessica Zhao,
Graham Cormode,
Ilya Mironov
Abstract:
We introduce Opacus, a free, open-source PyTorch library for training deep learning models with differential privacy (hosted at opacus.ai). Opacus is designed for simplicity, flexibility, and speed. It provides a simple and user-friendly API, and enables machine learning practitioners to make a training pipeline private by adding as little as two lines to their code. It supports a wide variety of layers, including multi-head attention, convolution, LSTM, GRU (and generic RNN), and embedding, right out of the box and provides the means for supporting other user-defined layers. Opacus computes batched per-sample gradients, providing higher efficiency compared to the traditional "micro batch" approach. In this paper we present Opacus, detail the principles that drove its implementation and unique features, and benchmark it against other frameworks for training models with differential privacy as well as standard PyTorch.
Submitted 22 August, 2022; v1 submitted 25 September, 2021;
originally announced September 2021.
-
Federated Learning with Buffered Asynchronous Aggregation
Authors:
John Nguyen,
Kshitiz Malik,
Hongyuan Zhan,
Ashkan Yousefpour,
Michael Rabbat,
Mani Malek,
Dzmitry Huba
Abstract:
Scalability and privacy are two critical concerns for cross-device federated learning (FL) systems. In this work, we identify that synchronous FL - synchronized aggregation of client updates in FL - cannot scale efficiently beyond a few hundred clients training in parallel. It leads to diminishing returns in model performance and training speed, analogous to large-batch training. On the other hand, asynchronous aggregation of client updates in FL (i.e., asynchronous FL) alleviates the scalability issue. However, aggregating individual client updates is incompatible with Secure Aggregation, which could result in an undesirable level of privacy for the system. To address these concerns, we propose a novel buffered asynchronous aggregation method, FedBuff, that is agnostic to the choice of optimizer, and combines the best properties of synchronous and asynchronous FL. We empirically demonstrate that FedBuff is 3.3x more efficient than synchronous FL and up to 2.5x more efficient than asynchronous FL, while being compatible with privacy-preserving technologies such as Secure Aggregation and differential privacy. We provide theoretical convergence guarantees in a smooth non-convex setting. Finally, we show that under differentially private training, FedBuff can outperform FedAvgM at low privacy settings and achieve the same utility for higher privacy settings.
Submitted 7 March, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.
-
ResiliNet: Failure-Resilient Inference in Distributed Neural Networks
Authors:
Ashkan Yousefpour,
Brian Q. Nguyen,
Siddartha Devic,
Guanhua Wang,
Aboudy Kreidieh,
Hans Lobel,
Alexandre M. Bayen,
Jason P. Jue
Abstract:
Federated Learning aims to train distributed deep models without sharing the raw data with the centralized server. Similarly, in distributed inference of neural networks, by partitioning the network and distributing it across several physical nodes, activations and gradients are exchanged between physical nodes, rather than raw data. Nevertheless, when a neural network is partitioned and distributed among physical nodes, failure of physical nodes causes the failure of the neural units that are placed on those nodes, which results in a significant performance drop. Current approaches focus on resiliency of training in distributed neural networks. However, resiliency of inference in distributed neural networks is less explored. We introduce ResiliNet, a scheme for making inference in distributed neural networks resilient to physical node failures. ResiliNet combines two concepts to provide resiliency: skip hyperconnection, a concept for skipping nodes in distributed neural networks similar to skip connection in resnets, and a novel technique called failout, which is introduced in this paper. Failout simulates physical node failure conditions during training using dropout, and is specifically designed to improve the resiliency of distributed neural networks. The results of the experiments and ablation studies using three datasets confirm the ability of ResiliNet to provide inference resiliency for distributed neural networks.
Submitted 19 December, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Guardians of the Deep Fog: Failure-Resilient DNN Inference from Edge to Cloud
Authors:
Ashkan Yousefpour,
Siddartha Devic,
Brian Q. Nguyen,
Aboudy Kreidieh,
Alan Liao,
Alexandre M. Bayen,
Jason P. Jue
Abstract:
Partitioning and distributing deep neural networks (DNNs) over physical nodes such as edge, fog, or cloud nodes, could enhance sensor fusion, and reduce bandwidth and inference latency. However, when a DNN is distributed over physical nodes, failure of the physical nodes causes the failure of the DNN units that are placed on these nodes. The performance of the inference task will be unpredictable, and most likely, poor, if the distributed DNN is not specifically designed and properly trained for failures. Motivated by this, we introduce deepFogGuard, a DNN architecture augmentation scheme for making the distributed DNN inference task failure-resilient. To articulate deepFogGuard, we introduce the elements and a model for the resiliency of distributed DNN inference. Inspired by the concept of residual connections in DNNs, we introduce skip hyperconnections in distributed DNNs, which are the basis of deepFogGuard's design to provide resiliency. Next, our extensive experiments using two existing datasets for the sensing and vision applications confirm the ability of deepFogGuard to provide resiliency for distributed DNNs in edge-cloud networks.
Submitted 21 September, 2019; v1 submitted 3 September, 2019;
originally announced September 2019.
-
All One Needs to Know about Fog Computing and Related Edge Computing Paradigms: A Complete Survey
Authors:
Ashkan Yousefpour,
Caleb Fung,
Tam Nguyen,
Krishna Kadiyala,
Fatemeh Jalali,
Amirreza Niakanlahiji,
Jian Kong,
Jason P. Jue
Abstract:
With the Internet of Things (IoT) becoming part of our daily life and our environment, we expect rapid growth in the number of connected devices. IoT is expected to connect billions of devices and humans to bring promising advantages for us. With this growth, fog computing, along with its related edge computing paradigms such as multi-access edge computing (MEC) and cloudlet, is seen as a promising solution for handling the large volume of security-critical and time-sensitive data that is being produced by the IoT. In this paper, we first provide a tutorial on fog computing and its related computing paradigms, including their similarities and differences. Next, we provide a taxonomy of research topics in fog computing, and through a comprehensive survey, we summarize and categorize the efforts on fog computing and its related computing paradigms. Finally, we provide challenges and future directions for research in fog computing.
Submitted 13 February, 2019; v1 submitted 15 August, 2018;
originally announced August 2018.
-
On Reducing IoT Service Delay via Fog Offloading
Authors:
Ashkan Yousefpour,
Genya Ishigaki,
Riti Gour,
Jason P. Jue
Abstract:
With the Internet of Things (IoT) becoming a major component of our daily life, understanding how to improve the quality of service (QoS) for IoT applications through fog computing is becoming an important problem. In this paper, we introduce a general framework for IoT-fog-cloud applications, and propose a delay-minimizing collaboration and offloading policy for fog-capable devices that aims to reduce the service delay for IoT applications. We then develop an analytical model to evaluate our policy and show how the proposed framework helps to reduce IoT service delay.
Submitted 19 April, 2018;
originally announced April 2018.
-
A Privacy Scheme for Monitoring Devices in the Internet of Things
Authors:
Zygmunt J. Haas,
Ashkan Yousefpour
Abstract:
Sufficiently strong security and privacy mechanisms are a prerequisite for realizing the promised benefits of IoT technology and for incorporating this technology into our daily lives. This paper introduces a novel approach to privacy in networks, an approach which is especially well matched to the IoT characteristics. Our general approach is based on continually changing the identifying attributes of IoT nodes. In particular, the scheme proposed in this work is based on changing the IoT nodes' IP addresses, and because the changing patterns of the IP addresses appear random to an unintended observer, an adversary is unable to identify the source or destination of a particular transmission. Thus, packets that carry information generated by a particular node cannot be linked together. The scheme offers additional security benefits, including DoS mitigation, is relatively easy to implement, and requires no changes to the existing networking infrastructure. We discuss the details of the implementation of the scheme and evaluate its performance.
Submitted 12 March, 2018;
originally announced March 2018.
-
QoS-aware Dynamic Fog Service Provisioning
Authors:
Ashkan Yousefpour,
Ashish Patil,
Genya Ishigaki,
Inwoong Kim,
Xi Wang,
Hakki C. Cankaya,
Qiong Zhang,
Weisheng Xie,
Jason P. Jue
Abstract:
Recent advances in the areas of Internet of Things (IoT), Big Data, and Machine Learning have contributed to the rise of a growing number of complex applications. These applications will be data-intensive, delay-sensitive, and real-time as smart devices become more prevalent in our daily lives. Ensuring Quality of Service (QoS) for delay-sensitive applications is a must, and fog computing is seen as one of the primary enablers for satisfying such tight QoS requirements, as it puts compute, storage, and networking resources closer to the user. In this paper, we first introduce FogPlan, a framework for QoS-aware Dynamic Fog Service Provisioning (QDFSP). QDFSP concerns the dynamic deployment of application services on fog nodes, or the release of application services that have previously been deployed on fog nodes, in order to meet low latency and QoS requirements of applications while minimizing cost. The FogPlan framework is practical and operates with no assumptions and minimal information about IoT nodes. Next, we present a possible formulation (as an optimization problem) and two efficient greedy algorithms for addressing the QDFSP at one instant of time. Finally, the FogPlan framework is evaluated using a simulation based on real-world traffic traces.
Submitted 26 January, 2019; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Instant Accident Reporting and Crowdsensed Road Condition Analytics for Smart Cities
Authors:
Ashkan Yousefpour,
Caleb Fung,
Tam Nguyen,
David Hong,
Daniel Zhang
Abstract:
The following report contains information about a technology proposed by the authors, which consists of a device that sits inside a vehicle and constantly monitors the car's information. It can determine speed, g-force, and location coordinates. Using these data, the device can detect a car crash or a pothole on the road. The data collected from the car is forwarded to a server for more in-depth analytics. If there is an accident, the server promptly contacts the emergency services with the location of the crash. Moreover, the pothole information is used for analytics of road conditions.
Submitted 17 November, 2017;
originally announced November 2017.