Search | arXiv e-print repository

Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels

Authors: Osama A. Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

Abstract: Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments, leading to the advancement of collaborative MAB algorithms. In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by… ▽ More Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments, leading to the advancement of collaborative MAB algorithms. In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by communication delays and/or channel noise. This results in agents possibly not receiving the intended action from the learner, subsequently leading to misguided feedback. In this paper, we introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels with different action erasure probabilities. We illustrate that, in contrast to existing bandit algorithms, which experience linear regret, our algorithms assure sub-linear regret guarantees. Our proposed solutions are founded on a meticulously crafted repetition protocol and scheduling of learning across heterogeneous channels. To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels. We substantiate the superior performance of our algorithm through numerical experiments, emphasizing their practical significance in addressing issues related to communication constraints and delays in multi-agent environments. △ Less

Submitted 29 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2211.05632 [pdf, ps, other]

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a numb… ▽ More In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a number of action plays. This problem is considered more challenging than the linear bandit problem, which can be viewed as a contextual bandit problem with a \emph{fixed} context. Surprisingly, in this paper, we show that the stochastic contextual problem can be solved as if it is a linear bandit problem. In particular, we establish a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance, when the context distribution is known. When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances. As a consequence, our results imply a $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits, making progress in resolving an open problem in (Li et al., 2019), (Li et al., 2021). Our reduction framework opens up a new way to approach stochastic contextual linear bandit problems, and enables improved regret bounds in a number of instances including the batch setting, contextual bandits with misspecifications, contextual bandits with sparse unknown parameters, and contextual bandits with adversarial corruption. △ Less

Submitted 26 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2207.03445 [pdf, other]

Differentially Private Stochastic Linear Bandits: (Almost) for Free

Authors: Osama A. Hanna, Antonious M. Girgis, Christina Fragouli, Suhas Diggavi

Abstract: In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}ε)$ matching the known lower bound for private linear… ▽ More In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}ε)$ matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde{O}(\frac{1}ε\sqrt{T})$. In the local case, we achieve a regret of $\tilde{O}(\frac{1}ε{\sqrt{T}})$ which matches the non-private regret for constant $ε$, but suffers a regret penalty when $ε$ is small. In the shuffled model, we also achieve regret of $\tilde{O}(\sqrt{T}+\frac{1}ε)$ %for small $ε$ as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}ε{T^{3/5}})$. Our numerical evaluation validates our theoretical results. △ Less

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.04180 [pdf, ps, other]

Learning in Distributed Contextual Linear Bandits Without Sharing the Context

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: Contextual linear bandits is a rich and theoretically important model that has many practical applications. Recently, this setup gained a lot of interest in applications over wireless where communication constraints can be a performance bottleneck, especially when the contexts come from a large $d$-dimensional space. In this paper, we consider a distributed memoryless contextual linear bandit lear… ▽ More Contextual linear bandits is a rich and theoretically important model that has many practical applications. Recently, this setup gained a lot of interest in applications over wireless where communication constraints can be a performance bottleneck, especially when the contexts come from a large $d$-dimensional space. In this paper, we consider a distributed memoryless contextual linear bandit learning problem, where the agents who observe the contexts and take actions are geographically separated from the learner who performs the learning while not seeing the contexts. We assume that contexts are generated from a distribution and propose a method that uses $\approx 5d$ bits per context for the case of unknown context distribution and $0$ bits per context if the context distribution is known, while achieving nearly the same regret bound as if the contexts were directly observable. The former bound improves upon existing bounds by a $\log(T)$ factor, where $T$ is the length of the horizon, while the latter achieves information theoretical tightness. △ Less

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2111.06067 [pdf, other]

Solving Multi-Arm Bandit Using a Few Bits of Communication

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In… ▽ More The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart, that requires only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established via constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerically experiments. △ Less

Submitted 11 November, 2021; originally announced November 2021.

arXiv:2012.07913 [pdf, other]

Quantizing data for distributed learning

Authors: Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

Abstract: We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alt… ▽ More We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alternate approach to learn from distributed data that quantizes data instead of gradients, and can support learning over applications where the size of gradient updates is prohibitive. Our approach leverages the dependency of the computed gradient on data samples, which lie in a much smaller space in order to perform the quantization in the smaller dimension data space. At the cost of an extra gradient computation, the gradient estimate can be refined by conveying the difference between the gradient at the quantized data point and the original gradient using a small number of bits. Lastly, in order to save communication, our approach adds a layer that decides whether to transmit a quantized data sample or not based on its importance for learning. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order optimal convergence rates with communication that mostly depends on the data rather than the model (gradient) dimension. We use our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order of magnitude savings over gradient compression methods. These communication savings come at the cost of increasing computation at the learning agent, and thus our approach is beneficial in scenarios where communication load is the main problem. △ Less

Submitted 8 September, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

arXiv:1911.00216 [pdf, other]

On Distributed Quantization for Classification

Authors: Osama A. Hanna, Yahya H. Ezzeldin, Tara Sadjadpour, Christina Fragouli, Suhas Diggavi

Abstract: We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes through communication constrained channels. We propose the design of distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes that… ▽ More We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes through communication constrained channels. We propose the design of distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes that help the central node reconstruct the original signal as accurately as possible, our focus is not reconstruction accuracy, but instead correct classification. Our work does not make any apriori distributional assumptions on the data, but instead uses training data for the quantizer design. Our main contributions include: we prove NP-hardness of finding optimal quantizers in the general case; we design an optimal scheme for a special case; we propose quantization algorithms, that leverage discrete neural representations and training data, and can be designed in polynomial-time for any number of features, any number of classes, and arbitrary division of features across the distributed nodes. We find that tailoring the quantizers to the classification task can offer significant savings: as compared to alternatives, we can achieve more than a factor of two reduction in terms of the number of bits communicated, for the same classification accuracy. △ Less

Submitted 1 November, 2019; originally announced November 2019.

arXiv:1803.03610 [pdf, other]

Random Access Schemes in Wireless Systems With Correlated User Activity

Authors: Anders Ellersgaard Kalør, Osama A. Hanna, Petar Popovski

Abstract: Traditional random access schemes are designed based on the aggregate process of user activation, which is created on the basis of independent activations of the users. However, in Machine-Type Communications (MTC), some users are likely to exhibit a high degree of correlation, e.g. because they observe the same physical phenomenon. This paves the way to devise access schemes that combine scheduli… ▽ More Traditional random access schemes are designed based on the aggregate process of user activation, which is created on the basis of independent activations of the users. However, in Machine-Type Communications (MTC), some users are likely to exhibit a high degree of correlation, e.g. because they observe the same physical phenomenon. This paves the way to devise access schemes that combine scheduling and random access, which is the topic of this work. The underlying idea is to schedule highly correlated users in such a way that their transmissions are less likely to result in a collision. To this end, we propose two greedy allocation algorithms. Both attempt to maximize the throughput using only pairwise correlations, but they rely on different assumptions about the higher-order dependencies. We show that both algorithms achieve higher throughput compared to the traditional random access schemes, suggesting that user correlation can be utilized effectively in access protocols for MTC. △ Less

Submitted 9 March, 2018; originally announced March 2018.

Comments: Submitted to SPAWC 2018

arXiv:1702.05528 [pdf, other]

Degrees of Freedom in Cached MIMO Relay Networks With Multiple Base Stations

Authors: Osama A. Hanna, Amr El-Keyi, Mohammed Nafie

Abstract: The ability of physical layer relay caching to increase the degrees of freedom (DoF) of a single cell was recently illustrated. In this paper, we extend this result to the case of multiple cells in which a caching relay is shared among multiple non-cooperative base stations (BSs). In particular, we show that a large DoF gain can be achieved by exploiting the benefits of having a shared relay that… ▽ More The ability of physical layer relay caching to increase the degrees of freedom (DoF) of a single cell was recently illustrated. In this paper, we extend this result to the case of multiple cells in which a caching relay is shared among multiple non-cooperative base stations (BSs). In particular, we show that a large DoF gain can be achieved by exploiting the benefits of having a shared relay that cooperates with the BSs. We first propose a cache-assisted relaying protocol that improves the cooperation opportunity between the BSs and the relay. Next, we consider the cache content placement problem that aims to design the cache content at the relay such that the DoF gain is maximized. We propose an optimal algorithm and a near-optimal low-complexity algorithm for the cache content placement problem. Simulation results show significant improvement in the DoF gain using the proposed relay-caching protocol. △ Less

Submitted 17 February, 2017; originally announced February 2017.

Showing 1–9 of 9 results for author: Hanna, O A