-
A Numerical Gradient Inversion Attack in Variational Quantum Neural-Networks
Authors:
Georgios Papadopoulos,
Shaltiel Eloul,
Yash Satsangi,
Jamie Heredge,
Niraj Kumar,
Chun-Fu Chen,
Marco Pistoia
Abstract:
The loss landscape of Variational Quantum Neural Networks (VQNNs) is characterized by a number of local minima that grows exponentially with the number of qubits. Because of this, it is more challenging to recover information from model gradients during training than it is for classical Neural Networks (NNs). In this paper we present a numerical scheme that successfully reconstructs real-world, practical input training data from trainable VQNNs' gradients. Our scheme is based on gradient inversion that works by combining gradient estimation with the finite difference method and adaptive low-pass filtering. The scheme is further optimized with a Kalman filter to obtain efficient convergence. Our experiments show that our algorithm can invert even batch-trained data, provided the VQNN model is sufficiently over-parameterized.
Submitted 7 May, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
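The finite-difference gradient estimation named in the abstract above can be illustrated with a minimal sketch. This is a generic central-difference estimator, not the authors' scheme: the low-pass filtering and Kalman-filter steps are omitted, and the function name `finite_difference_gradient` and step size `h` are assumptions chosen for illustration.

```python
def finite_difference_gradient(f, x, h=1e-5):
    """Estimate the gradient of f at x with central differences.

    f: callable taking a list of floats and returning a float.
    x: list of floats (the point at which to estimate the gradient).
    h: perturbation size (hypothetical default, tune per problem).
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # perturb coordinate i upward
        x_minus[i] -= h  # perturb coordinate i downward
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad
```

Each gradient component costs two evaluations of `f`, so the estimate needs `2 * len(x)` evaluations in total; the truncation error of the central difference shrinks quadratically in `h`.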
-
Applications of Certified Randomness
Authors:
Omar Amer,
Shouvanik Chakrabarti,
Kaushik Chakraborty,
Shaltiel Eloul,
Niraj Kumar,
Charles Lim,
Minzhao Liu,
Pradeep Niroula,
Yash Satsangi,
Ruslan Shaydulin,
Marco Pistoia
Abstract:
Certified randomness can be generated with untrusted remote quantum computers using multiple known protocols, one of which has been recently realized experimentally. Unlike the randomness sources accessible on today's classical computers, the output of these protocols can be certified to be random under certain computational hardness assumptions, with no trust required in the hardware generating the randomness. In this perspective, we explore real-world applications for which the use of certified randomness protocols may lead to improved security and fairness. We identify promising applications in areas including cryptography, differential privacy, financial markets, and blockchain. Through this initial exploration, we hope to shed light on potential applications of certified randomness.
Submitted 25 March, 2025;
originally announced March 2025.
-
Private, Auditable, and Distributed Ledger for Financial Institutes
Authors:
Shaltiel Eloul,
Yash Satsangi,
Yeoh Wei Zhu,
Omar Amer,
Georgios Papadopoulos,
Marco Pistoia
Abstract:
Distributed ledger technology offers several advantages for the banking and finance industry, including efficient transaction processing and cross-party transaction reconciliation. The key challenges for adoption of this technology in financial institutes are (a) the building of a privacy-preserving ledger, (b) supporting auditing and regulatory requirements, and (c) flexibility to adapt to complex use-cases with multiple digital assets and actors. This paper proposes a framework for a private, auditable, and distributed ledger (PADL) that adapts easily to fundamental use-cases within financial institutes. PADL employs widely-used cryptography schemes combined with zero-knowledge proofs to propose a transaction scheme for a 'table'-like ledger. It enables fast confidential peer-to-peer multi-asset transactions, and transaction graph anonymity, in a no-trust setup, but with customized privacy. We prove that the integrity and anonymity of PADL are secured against a strong threat model. Furthermore, we showcase three fundamental real-life use-cases, namely, an assets exchange ledger, a settlement ledger, and a bond market ledger. Based on these use-cases we show that PADL supports smooth-lined inter-assets auditing while preserving the privacy of the participants. For example, we show how a bank can be audited for its liquidity or credit risk without violating its own privacy or that of any other party, or how PADL can ensure honest coupon rate payment in a bond market without sharing investors' values. Finally, our evaluation shows PADL's advantage in performance against previous relevant schemes.
Submitted 7 January, 2025;
originally announced January 2025.
-
Estimating class separability of text embeddings with persistent homology
Authors:
Kostis Gourgoulias,
Najah Ghalyan,
Maxime Labonne,
Yash Satsangi,
Sean Moran,
Joseph Sabelja
Abstract:
This paper introduces an unsupervised method to estimate the class separability of text datasets from a topological point of view. Using persistent homology, we demonstrate how tracking the evolution of embedding manifolds during training can inform about class separability. More specifically, we show how this technique can be applied to detect when the training process stops improving the separability of the embeddings. Our results, validated across binary and multi-class text classification tasks, show that the proposed method's estimates of class separability align with those obtained from supervised methods. This approach offers a novel perspective on monitoring and improving the fine-tuning of sentence transformers for classification tasks, particularly in scenarios where labeled data is scarce. We also discuss how tracking these quantities can provide additional insights into the properties of the trained classifier.
Submitted 18 June, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning
Authors:
Yash Satsangi,
Paniz Behboudian
Abstract:
A key challenge for a reinforcement learning (RL) agent is to incorporate external/expert advice in its learning. The desired goals of an algorithm that can shape the learning of an RL agent with external advice include (a) maintaining policy invariance; (b) accelerating the learning of the agent; and (c) learning from arbitrary advice [3]. To address this challenge, this paper formulates the problem of incorporating external advice in RL as a multi-armed bandit called shaping-bandits. The reward of each arm of shaping-bandits corresponds to the return obtained by following the expert or by following a default RL algorithm learning on the true environment reward. We show that directly applying existing bandit and shaping algorithms that do not reason about the non-stationary nature of the underlying returns can lead to poor results. Thus we propose UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES (LPIES), three different shaping algorithms built on different assumptions that reason about the long-term consequences of following the expert policy or the default RL algorithm. Our experiments in four different settings show that these proposed algorithms achieve the above-mentioned goals whereas the other algorithms fail to do so.
Submitted 18 September, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
Topical: Learning Repository Embeddings from Source Code using Attention
Authors:
Agathe Lherondelle,
Varun Babbar,
Yash Satsangi,
Fran Silavong,
Shaltiel Eloul,
Sean Moran
Abstract:
This paper presents Topical, a novel deep neural network for repository level embeddings. Existing methods, reliant on natural language documentation or naive aggregation techniques, are outperformed by Topical's utilization of an attention mechanism. This mechanism generates repository-level representations from source code, full dependency graphs, and script level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines in tasks such as repository auto-tagging, highlighting the attention mechanism's efficacy over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. For further research, the accompanying tools, code, and training dataset are provided at: https://github.com/jpmorganchase/topical.
Submitted 4 November, 2023; v1 submitted 19 August, 2022;
originally announced August 2022.
-
Learning to Be Cautious
Authors:
Montaser Mohammedalamen,
Dustin Morrill,
Alexander Sieusahai,
Yash Satsangi,
Michael Bowling
Abstract:
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best avoid bad outcomes. An agent that can learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contrast, current approaches typically embed task-specific safety information or explicit cautious behaviors into the system, which is error-prone and imposes extra burdens on practitioners. In this paper, we present both a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to learn to be cautious. The essential features of our algorithm are that it characterizes reward function uncertainty without task-specific safety information and uses this uncertainty to construct a robust policy. Specifically, we construct robust policies with a k-of-N counterfactual regret minimization (CFR) subroutine given learned reward function uncertainty represented by a neural network ensemble. These policies exhibit caution in each of our tasks without any task-specific safety tuning.
Submitted 13 May, 2025; v1 submitted 29 October, 2021;
originally announced October 2021.
-
Useful Policy Invariant Shaping from Arbitrary Advice
Authors:
Paniz Behboudian,
Yash Satsangi,
Matthew E. Taylor,
Anna Harutyunyan,
Michael Bowling
Abstract:
Reinforcement learning is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary RL research is to discover how to learn with less data. Previous work has shown that domain information can be successfully used to shape the reward; by adding additional reward information, the agent can learn with much less data. Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered. While such potential-based reward shaping (PBRS) holds promise, it is limited by the need for a well-defined potential function. Ideally, we would like to be able to take arbitrary advice from a human or other agent and improve performance without affecting the optimal policy. The recently introduced dynamic potential based advice (DPBA) method tackles this challenge by admitting arbitrary advice from a human or other agent and improves performance without affecting the optimal policy. The main contribution of this paper is to expose, theoretically and empirically, a flaw in DPBA. Alternatively, to achieve the ideal goals, we present a simple method called policy invariant explicit shaping (PIES) and show theoretically and empirically that PIES succeeds where DPBA fails.
Submitted 2 November, 2020;
originally announced November 2020.
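The policy-invariance property of potential-based reward shaping mentioned in the abstract above rests on a telescoping sum, which the sketch below illustrates. It uses the standard shaping term F(s, s') = γΦ(s') − Φ(s); it is not an implementation of DPBA or PIES, and the function names and the example potential in the usage note are hypothetical.

```python
def shaping_term(phi, s, s_next, gamma):
    """Standard potential-based shaping reward: gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(phi, states, gamma):
    """Discounted sum of shaping terms along a trajectory of states.

    The sum telescopes to gamma**T * phi(states[-1]) - phi(states[0]),
    where T = len(states) - 1, so it depends only on the endpoints.
    """
    return sum(
        (gamma ** t) * shaping_term(phi, states[t], states[t + 1], gamma)
        for t in range(len(states) - 1)
    )
```

For instance, with the illustrative potential `phi = lambda s: float(s * s)`, the trajectory `[0, 3, 1, 4]`, and `gamma = 0.9`, the result equals `0.9**3 * phi(4) - phi(0)` regardless of the intermediate states. Because the shaping contribution depends only on the start and end of a trajectory, it shifts all returns from a given start state in the same way and leaves the ranking of policies unchanged.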
-
Real-Time Resource Allocation for Tracking Systems
Authors:
Yash Satsangi,
Shimon Whiteson,
Frans A. Oliehoek,
Henri Bouma
Abstract:
Automated tracking is key to many computer vision applications. However, many tracking systems struggle to perform in real-time due to the high computational cost of detecting people, especially in ultra high resolution images. We propose a new algorithm called PartiMax that greatly reduces this cost by applying the person detector only to the relevant parts of the image. PartiMax exploits information in the particle filter to select $k$ of the $n$ candidate pixel boxes in the image. We prove that PartiMax is guaranteed to make a near-optimal selection with error bounds that are independent of the problem size. Furthermore, empirical results on a real-life dataset show that our system runs in real-time by processing only 10% of the pixel boxes in the image while still retaining 80% of the original tracking performance achieved when processing all pixel boxes.
Submitted 21 September, 2020;
originally announced October 2020.
-
Exploiting Submodular Value Functions For Scaling Up Active Perception
Authors:
Yash Satsangi,
Shimon Whiteson,
Frans A. Oliehoek,
Matthijs T. J. Spaan
Abstract:
In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent's belief can remove the piecewise-linear and convex property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address a twofold challenge of modeling and planning for active perception tasks. We show the mathematical equivalence of $ρ$POMDP and POMDP-IR, two frameworks for modeling active perception tasks, that restore the PWLC property of the value function. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and $ρ$POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost leading to better scalability for solving active perception tasks.
Submitted 21 September, 2020;
originally announced September 2020.
-
Maximizing Information Gain in Partially Observable Environments via Prediction Reward
Authors:
Yash Satsangi,
Sungsu Lim,
Shimon Whiteson,
Frans Oliehoek,
Martha White
Abstract:
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of the state-action pairs and not as a function of the belief of the agent; this hinders the direct application of deep RL methods for such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent, by offering a simple insight that maximizing any convex function of the belief of the agent can be approximated by instead maximizing a prediction reward: a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields using prediction rewards---namely visual attention, question answering systems, and intrinsic motivation---and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight we present deep anticipatory networks (DANs), which enable an agent to take actions to reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor selection system for tracking people in a shopping mall and learning discrete models of attention on fashion MNIST and MNIST digit classification.
Submitted 11 May, 2020;
originally announced May 2020.
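The belief-based reward given as an example in the abstract above, the negative entropy of the agent's belief, can be written down in a few lines. This sketch only computes that reward for a discrete belief; it does not implement the prediction-reward approximation or DANs, and the function name `negative_entropy` is an assumption for illustration.

```python
import math


def negative_entropy(belief):
    """Return the negative Shannon entropy (in nats) of a discrete belief.

    belief: list of probabilities that sum to 1.
    Zero-probability entries contribute nothing, following the usual
    convention that 0 * log(0) is taken as 0.
    """
    return sum(p * math.log(p) for p in belief if p > 0.0)
```

The reward is highest (zero) when the belief puts all its mass on one value and lowest (−log n) when it is uniform over n values, so maximizing it rewards actions that reduce uncertainty.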
-
Probably Approximately Correct Greedy Maximization with Efficient Bounds on Information Gain for Sensor Selection
Authors:
Yash Satsangi,
Shimon Whiteson,
Frans A. Oliehoek
Abstract:
Submodular function maximization finds application in a variety of real-world decision-making problems. However, most existing methods, based on greedy maximization, assume it is computationally feasible to evaluate F, the function being maximized. Unfortunately, in many realistic settings F is too expensive to evaluate exactly even once. We present probably approximately correct greedy maximization, which requires access only to cheap anytime confidence bounds on F and uses them to prune elements. We show that, with high probability, our method returns an approximately optimal set. We propose novel, cheap confidence bounds for conditional entropy, which appears in many common choices of F and for which it is difficult to find unbiased or bounded estimates. Finally, results on a real-world dataset from a multi-camera tracking system in a shopping mall demonstrate that our approach performs comparably to existing methods, but at a fraction of the computational cost.
Submitted 10 August, 2020; v1 submitted 25 February, 2016;
originally announced February 2016.
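The greedy maximization that the abstract above takes as its baseline can be sketched on a toy coverage objective. This is the standard greedy procedure with exact evaluations of F, not the probably approximately correct variant with confidence bounds that the paper proposes; the coverage function, sensor names, and function names are hypothetical.

```python
def coverage(selected, sensor_cells):
    """Submodular objective F: number of distinct cells covered."""
    covered = set()
    for sensor in selected:
        covered |= sensor_cells[sensor]
    return len(covered)


def greedy_select(sensor_cells, k):
    """Pick up to k sensors, each time adding the largest marginal gain.

    sensor_cells: dict mapping sensor name -> set of cells it observes.
    Ties are broken by sorted sensor name so the result is deterministic.
    """
    selected = []
    remaining = set(sensor_cells)
    for _ in range(min(k, len(sensor_cells))):
        base = coverage(selected, sensor_cells)
        best = max(
            sorted(remaining),
            key=lambda s: coverage(selected + [s], sensor_cells) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

Every round evaluates F once per remaining candidate, which is the cost that becomes prohibitive when a single evaluation of F is expensive.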