-
ICQuant: Index Coding enables Low-bit LLM Quantization
Authors:
Xinlin Li,
Osama Hanna,
Christina Fragouli,
Suhas Diggavi
Abstract:
The rapid deployment of Large Language Models (LLMs) highlights the need for efficient low-bit post-training quantization (PTQ), due to their high memory costs. A key challenge in weight quantization is the presence of outliers, which inflate quantization ranges and lead to large errors. While a number of outlier suppression techniques have been proposed, they either fail to effectively shrink the quantization range or incur a (relatively) high bit overhead. In this paper, we present ICQuant, a novel framework that leverages outlier statistics to design an efficient index coding scheme for outlier-aware weight-only quantization. Compared to existing outlier suppression techniques, which require $\approx 1$ bit of overhead to halve the quantization range, ICQuant requires only $\approx 0.3$ bits, a significant saving in extreme compression regimes (e.g., 2-3 bits per weight). ICQuant can be used on top of any existing quantizer to eliminate outliers and improve quantization quality. Using just 2.3 bits per weight and simple scalar quantizers, ICQuant improves the zero-shot accuracy of the 2-bit Llama3-70B model by up to 130% and 150% relative to QTIP and QuIP#, respectively, and it achieves performance comparable to the best-known fine-tuned quantizer (PV-tuning) without fine-tuning.
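To make the "index overhead" trade-off concrete, the sketch below separates the largest-magnitude weights as outliers (which would then be quantized with their own range) and stores their positions with a simple fixed-width gap code. The abstract does not specify ICQuant's coding scheme here, so the gap/escape code, the 5% outlier fraction, the 5-bit gap width, and the Student-t stand-in for a weight row are all hypothetical choices for illustration only.

```python
import numpy as np

def encode_gaps(idx, b):
    """Encode sorted outlier positions as b-bit gap symbols (hypothetical code).

    Symbols 0 .. 2**b - 2 give the number of non-outlier weights skipped before
    the next outlier; the all-ones value is an escape meaning "skip 2**b - 1
    positions and keep reading".
    """
    esc = 2**b - 1
    syms, prev = [], -1
    for i in idx:
        gap = int(i) - prev - 1
        while gap >= esc:
            syms.append(esc)
            gap -= esc
        syms.append(gap)
        prev = int(i)
    return syms

def decode_gaps(syms, b):
    """Invert encode_gaps, recovering the outlier positions."""
    esc = 2**b - 1
    idx, pos, skip = [], -1, 0
    for s in syms:
        if s == esc:
            skip += esc
        else:
            pos += skip + s + 1
            idx.append(pos)
            skip = 0
    return idx

# Hypothetical setup: heavy-tailed weights, top 5% by magnitude treated as outliers.
rng = np.random.default_rng(0)
w = rng.standard_t(df=4, size=4096)
k = int(0.05 * w.size)
idx = np.sort(np.argsort(np.abs(w))[-k:])

b = 5
syms = encode_gaps(idx, b)
assert decode_gaps(syms, b) == idx.tolist()  # the index coding is lossless

inlier_range = np.abs(np.delete(w, idx)).max()
print(f"range shrink factor: {np.abs(w).max() / inlier_range:.2f}")
print(f"index overhead: {b * len(syms) / w.size:.3f} bits/weight")
```

With a 5% outlier fraction the average gap is about 19 positions, so most gaps fit in one 5-bit symbol and the position overhead stays a fraction of a bit per weight; a better code that exploits the gap statistics can only reduce it further.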
Submitted 1 May, 2025;
originally announced May 2025.
-
Does Feedback Help in Bandits with Arm Erasures?
Authors:
Merve Karakas,
Osama Hanna,
Lin F. Yang,
Christina Fragouli
Abstract:
We study a distributed multi-armed bandit (MAB) problem over arm erasure channels, motivated by the increasing adoption of MAB algorithms over communication-constrained networks. In this setup, the learner communicates the chosen arm to play to an agent over an erasure channel with erasure probability $ε \in [0,1)$; if an erasure occurs, the agent continues pulling the last successfully received arm; the learner always observes the reward of the arm pulled. In past work, we considered the case where the agent cannot convey feedback to the learner, and thus the learner does not know whether the arm played is the requested one or the last successfully received one. In this paper, we instead consider the case where the agent can send feedback to the learner on whether the arm request was received, and thus the learner knows exactly which arm was played. Surprisingly, we prove that erasure feedback does not improve the order of the worst-case regret upper bound over the previously studied no-feedback setting. In particular, we prove a regret lower bound of $Ω(\sqrt{KT} + K / (1 - ε))$, where $K$ is the number of arms and $T$ the time horizon, that matches no-feedback upper bounds up to logarithmic factors. We note, however, that the availability of feedback enables simpler algorithm designs that may achieve regret bounds with better constants (albeit not better order); we design one such algorithm and evaluate its performance numerically.
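One informal way to read the $K/(1-ε)$ term: with feedback, every time the learner switches arms it can simply resend until the request is acknowledged, and the number of sends is geometric with mean $1/(1-ε)$, during which the agent keeps pulling the old arm. The sketch below simulates the channel model as described above; the initial arm, the cycling through 5 arms, and $ε = 0.7$ are hypothetical, and this is intuition only, not the paper's lower-bound construction.

```python
import random

class ErasureAgent:
    """Agent behind an arm-erasure channel.

    A request is delivered with probability 1 - eps; on an erasure the agent
    keeps pulling the last successfully received arm. The second return value
    is the feedback bit available in the feedback setting.
    """

    def __init__(self, eps, rng, initial_arm=0):
        self.eps = eps
        self.rng = rng
        self.arm = initial_arm  # hypothetical: arm pulled before any request arrives

    def request(self, arm):
        delivered = self.rng.random() >= self.eps
        if delivered:
            self.arm = arm
        return self.arm, delivered  # (arm actually pulled, feedback bit)

def rounds_to_switch(agent, new_arm):
    """With feedback, resend until delivery is acknowledged; count the sends."""
    rounds = 1
    while not agent.request(new_arm)[1]:
        rounds += 1
    return rounds

rng = random.Random(1)
eps = 0.7
agent = ErasureAgent(eps, rng)
trials = 20000
avg = sum(rounds_to_switch(agent, t % 5) for t in range(trials)) / trials
print(f"average rounds per switch: {avg:.2f} (1/(1-eps) = {1 / (1 - eps):.2f})")
```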
Submitted 29 April, 2025;
originally announced April 2025.
-
InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals
Authors:
Tomoyoshi Kimura,
Xinlin Li,
Osama Hanna,
Yatong Chen,
Yizhuo Chen,
Denizhan Kara,
Tianshi Wang,
Jinyang Li,
Xiaomin Ouyang,
Shengzhong Liu,
Mani Srivastava,
Suhas Diggavi,
Tarek Abdelzaher
Abstract:
Standard multimodal self-supervised learning (SSL) algorithms treat cross-modal synchronization as implicit supervisory labels during pretraining, and thus place high demands on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, as the heterogeneity and the non-interpretability of time-series signals result in abundant unimodal data but scarce high-quality multimodal pairs. This paper proposes InfoMAE, a cross-modal alignment framework that tackles the challenge of multimodal pair efficiency under the SSL setting by facilitating efficient cross-modal alignment of pretrained unimodal representations. InfoMAE achieves efficient cross-modal alignment with limited data pairs through a novel information-theory-inspired formulation that simultaneously addresses distribution-level and instance-level alignment. Extensive experiments on two real-world IoT applications are performed to evaluate InfoMAE's pairing efficiency in bridging pretrained unimodal models into a cohesive joint multimodal model. InfoMAE enhances downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.
Submitted 13 April, 2025;
originally announced April 2025.
-
Learning for Bandits under Action Erasures
Authors:
Osama Hanna,
Merve Karakas,
Lin F. Yang,
Christina Fragouli
Abstract:
We consider a novel multi-armed bandit (MAB) setup, where a learner needs to communicate actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know whether an action is erased, the central learner does not (there is no feedback), and thus does not know whether the observed reward resulted from the desired action. We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures. Our scheme results in a worst-case regret over action-erasure channels that is at most a factor of $O(1/\sqrt{1-ε})$ away from the no-erasure worst-case regret of the underlying MAB algorithm, where $ε$ is the erasure probability. We also propose a modification of the successive arm elimination algorithm and prove that its worst-case regret is $\tilde{O}(\sqrt{KT}+K/(1-ε))$, where $K$ is the number of arms and $T$ is the time horizon, which we show is optimal by providing a matching lower bound.
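Without feedback, one natural way to wrap an arbitrary MAB algorithm is to send each requested action $r$ times in a row and let the learner trust only the reward of the block's final round: that round plays the intended action exactly when at least one of the $r$ sends got through, i.e. with probability $1-ε^r$. This repeat-and-trust-the-last-round rule is an assumption used for illustration; the paper's scheme and its choice of $r$ may differ, and $ε = 0.5$, $r = 3$ are hypothetical values.

```python
import random

def last_round_correct_rate(eps, r, blocks, seed=0):
    """Fraction of blocks whose final round plays the intended arm.

    Each requested arm is sent r times in a row over the erasure channel; the
    agent keeps pulling the last successfully received arm. The final round is
    correct iff at least one of the block's r sends was delivered.
    """
    rng = random.Random(seed)
    agent_arm = None  # nothing received yet
    correct = 0
    for blk in range(blocks):
        target = blk  # fresh arm label each block, so a stale arm is never the intended one
        for _ in range(r):
            if rng.random() >= eps:  # delivered with probability 1 - eps
                agent_arm = target
        correct += agent_arm == target
    return correct / blocks

eps, r = 0.5, 3
rate = last_round_correct_rate(eps, r, blocks=20000)
print(f"simulated {rate:.3f} vs 1 - eps**r = {1 - eps**r:.3f}")
```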
Submitted 26 June, 2024;
originally announced June 2024.
-
Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels
Authors:
Osama A. Hanna,
Merve Karakas,
Lin F. Yang,
Christina Fragouli
Abstract:
Multi-Armed Bandit (MAB) systems are increasingly applied in multi-agent distributed environments, driving the development of collaborative MAB algorithms. In such settings, communication between the agents executing actions and the primary learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by communication delays and/or channel noise. As a result, agents may not receive the intended action from the learner, which leads to misguided feedback. In this paper, we introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels with different action erasure probabilities. We show that, whereas existing bandit algorithms experience linear regret in this setting, our algorithms guarantee sub-linear regret. Our proposed solutions are founded on a carefully designed repetition protocol and on scheduling of learning across heterogeneous channels. To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels. We substantiate the superior performance of our algorithms through numerical experiments, emphasizing their practical significance in addressing issues related to communication constraints and delays in multi-agent environments.
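Heterogeneity matters for a repetition protocol because the number of repeats needed to get an action through depends on each channel's erasure probability: with $r$ back-to-back sends, the agent misses all of them with probability $ε_i^r$, so a target failure probability $δ$ requires $r_i = \lceil \log δ / \log ε_i \rceil$. The sketch computes this per channel; the channel names, erasure probabilities, and $δ = 0.01$ are hypothetical, and the paper's actual repetition protocol and scheduling are not reproduced here.

```python
import math

def repetitions_needed(eps, delta):
    """Smallest r with eps**r <= delta (probability that all r sends are erased)."""
    if eps == 0:
        return 1
    r = max(1, math.ceil(math.log(delta) / math.log(eps)))
    while eps**r > delta:  # guard against floating-point rounding in the ceiling
        r += 1
    return r

# Hypothetical channels with different erasure probabilities.
channels = {"agent_a": 0.0, "agent_b": 0.5, "agent_c": 0.9}
delta = 0.01
plan = {name: repetitions_needed(eps, delta) for name, eps in channels.items()}
print(plan)  # {'agent_a': 1, 'agent_b': 7, 'agent_c': 44}
```

A single repetition length tuned for the worst channel would waste dozens of rounds on the good ones, which is one reason to schedule learning per channel.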
Submitted 29 April, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Common Information Dimension
Authors:
Osama Hanna,
Xinlin Li,
Suhas Diggavi,
Christina Fragouli
Abstract:
The exact common information between a set of random variables $X_1,\ldots,X_n$ is defined as the minimum entropy of a shared random variable that allows for the exact distributive simulation of $X_1,\ldots,X_n$. It has been established that, in certain instances, infinite entropy is required to achieve distributive simulation, suggesting that continuous random variables may be needed in such scenarios. However, to date, there is no established metric to characterize such cases. In this paper, we propose the concept of Common Information Dimension (CID) with respect to a given class of functions $\mathcal{F}$, defined as the minimum dimension of a random variable $W$ required to distributively simulate a set of random variables $X_1,\ldots,X_n$, such that $W$ can be expressed as a function of $X_1,\ldots,X_n$ using a member of $\mathcal{F}$. Our main contributions include the computation of the common information dimension for jointly Gaussian random vectors in closed form, with $\mathcal{F}$ being the class of linear functions.
Submitted 7 July, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms
Authors:
Osama A. Hanna,
Lin F. Yang,
Christina Fragouli
Abstract:
In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a number of action plays. This problem is considered more challenging than the linear bandit problem, which can be viewed as a contextual bandit problem with a \emph{fixed} context. Surprisingly, in this paper, we show that the stochastic contextual problem can be solved as if it were a linear bandit problem. In particular, we establish a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance when the context distribution is known. When the context distribution is unknown, we propose an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances.
As a consequence, our results imply an $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits, making progress toward resolving an open problem in (Li et al., 2019; Li et al., 2021).
Our reduction framework opens up a new way to approach stochastic contextual linear bandit problems, and enables improved regret bounds in a number of instances including the batch setting, contextual bandits with misspecifications, contextual bandits with sparse unknown parameters, and contextual bandits with adversarial corruption.
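The linearity that makes such a reduction possible when the context distribution is known can be checked numerically: map each parameter $\theta$ to the expected greedy action $g(\theta) = \mathbb{E}_A[\arg\max_{a \in A} \langle a, \theta\rangle]$; the expected reward of playing the greedy policy for $\theta$ is then $\langle g(\theta), \theta^*\rangle$, i.e. a linear bandit with action set $\{g(\theta)\}$. The sketch assumes Gaussian contexts, hypothetical values of $\theta^*$ and $\theta$, and replaces the exact expectation with a Monte Carlo average; it is not the paper's full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions = 3, 5
theta_star = np.array([0.5, -0.2, 0.8])  # hypothetical unknown parameter

def sample_contexts(n):
    """n random contexts, each a set of n_actions actions in R^d (assumed Gaussian)."""
    return rng.normal(size=(n, n_actions, d))

def greedy_actions(contexts, theta):
    """For each context, the action maximizing <a, theta>."""
    best = np.argmax(contexts @ theta, axis=1)
    return contexts[np.arange(len(contexts)), best]

def g(theta, n=50000):
    """Monte Carlo estimate of E_A[argmax_{a in A} <a, theta>]."""
    return greedy_actions(sample_contexts(n), theta).mean(axis=0)

theta = np.array([1.0, 0.0, 1.0])  # any candidate parameter
x = g(theta)                        # the single "linear bandit action" for theta
reward_via_x = x @ theta_star
# Fresh contexts: average reward of actually playing the greedy policy for theta.
reward_direct = (greedy_actions(sample_contexts(50000), theta) @ theta_star).mean()
print(reward_via_x, reward_direct)
```

The two numbers agree up to Monte Carlo error, because the expected reward of a fixed policy is linear in the unknown parameter.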
Submitted 26 May, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Differentially Private Stochastic Linear Bandits: (Almost) for Free
Authors:
Osama A. Hanna,
Antonious M. Girgis,
Christina Fragouli,
Suhas Diggavi
Abstract:
In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local, and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{ε})$, matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde{O}(\frac{1}{ε}\sqrt{T})$. In the local model, we achieve a regret of $\tilde{O}(\frac{1}{ε}\sqrt{T})$, which matches the non-private regret for constant $ε$ but suffers a regret penalty when $ε$ is small. In the shuffled model, we also achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{ε})$, as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}{ε}T^{3/5})$. Our numerical evaluation validates our theoretical results.
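Some intuition for why, in the central model, the privacy cost can show up as an additive $1/ε$ term rather than multiplying $\sqrt{T}$: if noise is added once to an aggregate statistic rather than to every sample, its effect on an average of $n$ rewards shrinks like $1/(nε)$. The sketch below is the standard Laplace mechanism applied to a sum of rewards assumed to lie in $[0, 1]$; it is not the paper's algorithm, and $ε = 0.5$ and the uniform rewards are hypothetical.

```python
import numpy as np

def private_sum(rewards, eps, rng):
    """Laplace mechanism on a sum of rewards in [0, 1].

    Changing one reward changes the sum by at most 1 (sensitivity 1), so adding
    Laplace noise with scale 1/eps makes the released sum eps-differentially private.
    """
    return np.sum(rewards) + rng.laplace(loc=0.0, scale=1.0 / eps)

rng = np.random.default_rng(0)
eps = 0.5
for n in (100, 10_000, 1_000_000):
    rewards = rng.random(n)  # hypothetical rewards in [0, 1)
    err = abs(private_sum(rewards, eps, rng) / n - rewards.mean())
    print(f"n={n:>9}: error in private mean = {err:.2e}")
```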
Submitted 7 July, 2022;
originally announced July 2022.
-
Learning in Distributed Contextual Linear Bandits Without Sharing the Context
Authors:
Osama A. Hanna,
Lin F. Yang,
Christina Fragouli
Abstract:
Contextual linear bandits is a rich and theoretically important model that has many practical applications. Recently, this setup has gained considerable interest in wireless applications, where communication constraints can be a performance bottleneck, especially when the contexts come from a large $d$-dimensional space. In this paper, we consider a distributed memoryless contextual linear bandit learning problem, where the agents who observe the contexts and take actions are geographically separated from the learner, who performs the learning without seeing the contexts. We assume that contexts are generated from a distribution and propose a method that uses $\approx 5d$ bits per context when the context distribution is unknown and $0$ bits per context when it is known, while achieving nearly the same regret bound as if the contexts were directly observable. The former bound improves upon existing bounds by a $\log(T)$ factor, where $T$ is the length of the horizon, while the latter is information-theoretically tight.
Submitted 8 June, 2022;
originally announced June 2022.
-
Solving Multi-Arm Bandit Using a Few Bits of Communication
Authors:
Osama A. Hanna,
Lin F. Yang,
Christina Fragouli
Abstract:
The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper, we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to learn accurately without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established by constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments.
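A basic reason a few bits per reward can suffice is that randomized rounding onto a coarse grid keeps reward estimates unbiased, so averaging over many pulls washes out the quantization noise. The sketch is a generic unbiased stochastic quantizer for rewards assumed to lie in a known interval; QuBan's actual construction handles subgaussian (unbounded) rewards and is not reproduced here, and the 3-bit budget and uniform rewards are hypothetical.

```python
import numpy as np

def stochastic_quantize(x, lo, hi, bits, rng):
    """Map x to an integer in [0, 2**bits - 1] by randomized rounding on a uniform grid.

    For x inside [lo, hi], the dequantized value is unbiased: E[dequantize(q)] = x.
    """
    levels = 2**bits - 1
    step = (hi - lo) / levels
    y = (np.clip(x, lo, hi) - lo) / step       # position on the grid, in [0, levels]
    low = np.floor(y)
    up = rng.random(np.shape(y)) < (y - low)   # round up with probability = fractional part
    return (low + up).astype(int)              # the integer sent with `bits` bits

def dequantize(q, lo, hi, bits):
    """Map grid indices back to reward values."""
    return lo + q * (hi - lo) / (2**bits - 1)

rng = np.random.default_rng(0)
rewards = rng.random(100_000)                       # hypothetical rewards in [0, 1)
q = stochastic_quantize(rewards, 0.0, 1.0, 3, rng)  # 3 bits per reward
print(q.min(), q.max(), dequantize(q, 0.0, 1.0, 3).mean(), rewards.mean())
```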
Submitted 11 November, 2021;
originally announced November 2021.
-
Quantizing data for distributed learning
Authors:
Osama A. Hanna,
Yahya H. Ezzeldin,
Christina Fragouli,
Suhas Diggavi
Abstract:
We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alternate approach to learn from distributed data that quantizes data instead of gradients, and can support learning in applications where the size of gradient updates is prohibitive. Our approach leverages the dependency of the computed gradient on the data samples, which lie in a much smaller space, and performs the quantization in this lower-dimensional data space. At the cost of an extra gradient computation, the gradient estimate can be refined by conveying the difference between the gradient at the quantized data point and the original gradient using a small number of bits. Lastly, to save communication, our approach adds a layer that decides whether or not to transmit a quantized data sample based on its importance for learning. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order-optimal convergence rates with communication that mostly depends on the data rather than the model (gradient) dimension. We use our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order of magnitude savings over gradient compression methods. These communication savings come at the cost of increased computation at the learning agent, and thus our approach is beneficial in scenarios where communication load is the main problem.
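The core step, sending a quantized data sample and letting the receiver compute the gradient at the quantized point, can be sketched for a squared loss. Here the model is linear, so model and data dimensions coincide; this is only to keep the example short, whereas the savings described above come from models much larger than the data. The quantizer range, bit widths, and data are hypothetical, labels are assumed to be sent exactly, and the refinement step and importance-based transmission layer are not shown.

```python
import numpy as np

def quantize(v, bits, lo=-1.0, hi=1.0):
    """Uniform quantizer with midpoint reconstruction on [lo, hi]."""
    levels = 2**bits
    step = (hi - lo) / levels
    idx = np.clip(np.floor((v - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step

def grad_sq_loss(w, x, y):
    """Gradient of 0.5 * (w @ x - y)**2 with respect to w."""
    return (w @ x - y) * x

rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d)                  # current model held by the learner
X = rng.uniform(-1, 1, size=(1000, d))  # hypothetical data samples in [-1, 1]^d
Y = rng.normal(size=1000)               # labels, assumed to be sent exactly

errors = {}
for bits in (2, 4, 8):
    errs = [np.linalg.norm(grad_sq_loss(w, quantize(x, bits), y) - grad_sq_loss(w, x, y))
            for x, y in zip(X, Y)]
    errors[bits] = float(np.mean(errs))
print(errors)  # mean gradient error per sample for each bit width
```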
Submitted 8 September, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
On Distributed Quantization for Classification
Authors:
Osama A. Hanna,
Yahya H. Ezzeldin,
Tara Sadjadpour,
Christina Fragouli,
Suhas Diggavi
Abstract:
We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes through communication-constrained channels. We propose the design of distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes that help the central node reconstruct the original signal as accurately as possible, our focus is not reconstruction accuracy, but instead correct classification. Our work does not make any a priori distributional assumptions on the data, but instead uses training data for the quantizer design. Our main contributions include: we prove NP-hardness of finding optimal quantizers in the general case; we design an optimal scheme for a special case; we propose quantization algorithms that leverage discrete neural representations and training data, and can be designed in polynomial time for any number of features, any number of classes, and arbitrary division of features across the distributed nodes. We find that tailoring the quantizers to the classification task can offer significant savings: compared to alternatives, we can achieve more than a factor of two reduction in the number of bits communicated for the same classification accuracy.
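A toy example shows why reconstruction accuracy and classification accuracy can diverge. Two nodes each hold one feature and may send 1 bit to a central classifier. A reconstruction-oriented 1-bit uniform quantizer loses classification accuracy, while placing each node's single threshold where the classifier needs it classifies perfectly with the same bit budget. The classifier and its thresholds are hypothetical; the paper designs quantizers from training data with discrete neural representations, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(10_000, 2))  # feature 0 at node 1, feature 1 at node 2

def classifier(f0, f1):
    """Hypothetical pretrained classifier at the central node."""
    return (f0 > 0.3) & (f1 > -0.2)

labels = classifier(x[:, 0], x[:, 1])

# Reconstruction-oriented: 1-bit uniform quantizer on [-1, 1] (cells split at 0, centroids +/-0.5).
recon = np.where(x > 0, 0.5, -0.5)
acc_recon = np.mean(classifier(recon[:, 0], recon[:, 1]) == labels)

# Task-oriented: each node sends 1 bit saying whether its feature clears the classifier's threshold.
b0, b1 = x[:, 0] > 0.3, x[:, 1] > -0.2
acc_task = np.mean((b0 & b1) == labels)

print(f"1 bit/feature, reconstruction quantizer: {acc_recon:.3f}")
print(f"1 bit/feature, task-aware quantizer:     {acc_task:.3f}")
```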
Submitted 1 November, 2019;
originally announced November 2019.
-
Random Access Schemes in Wireless Systems With Correlated User Activity
Authors:
Anders Ellersgaard Kalør,
Osama A. Hanna,
Petar Popovski
Abstract:
Traditional random access schemes are designed around the aggregate user activation process, which is built on the assumption that users activate independently. However, in Machine-Type Communications (MTC), some users are likely to exhibit a high degree of correlation, e.g., because they observe the same physical phenomenon. This paves the way to devise access schemes that combine scheduling and random access, which is the topic of this work. The underlying idea is to schedule highly correlated users in such a way that their transmissions are less likely to result in a collision. To this end, we propose two greedy allocation algorithms. Both attempt to maximize the throughput using only pairwise correlations, but they rely on different assumptions about the higher-order dependencies. We show that both algorithms achieve higher throughput than traditional random access schemes, suggesting that user correlation can be utilized effectively in access protocols for MTC.
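The abstract does not spell out the two greedy algorithms, so the sketch below is a hypothetical greedy rule in the same spirit: it uses only pairwise correlations and places each user in the access slot whose current members are least correlated with it, so users likely to activate together are spread across slots and collide less often. The correlation matrix and the number of slots are made-up illustration values.

```python
import numpy as np

def greedy_slot_assignment(corr, n_slots):
    """Assign users to access slots so strongly correlated users rarely share a slot.

    corr[u, v] is a pairwise activation correlation (diagonal ignored). Users are
    placed one at a time, most-correlated first, into the slot whose current
    members have the smallest total correlation with them.
    """
    corr = np.array(corr, dtype=float)
    np.fill_diagonal(corr, 0.0)
    slots = [[] for _ in range(n_slots)]
    order = np.argsort(-corr.sum(axis=1), kind="stable")
    for u in order:
        cost = [sum(corr[u, v] for v in members) for members in slots]
        slots[int(np.argmin(cost))].append(int(u))
    return slots

# Hypothetical example: users 0/1 and users 2/3 observe the same phenomena.
corr = [[1.0, 0.9, 0.0, 0.1],
        [0.9, 1.0, 0.1, 0.0],
        [0.0, 0.1, 1.0, 0.8],
        [0.1, 0.0, 0.8, 1.0]]
print(greedy_slot_assignment(corr, n_slots=2))  # [[0, 2], [1, 3]]
```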
Submitted 9 March, 2018;
originally announced March 2018.
-
Degrees of Freedom in Cached MIMO Relay Networks With Multiple Base Stations
Authors:
Osama A. Hanna,
Amr El-Keyi,
Mohammed Nafie
Abstract:
The ability of physical layer relay caching to increase the degrees of freedom (DoF) of a single cell was recently illustrated. In this paper, we extend this result to the case of multiple cells in which a caching relay is shared among multiple non-cooperative base stations (BSs). In particular, we show that a large DoF gain can be achieved by exploiting the benefits of having a shared relay that cooperates with the BSs. We first propose a cache-assisted relaying protocol that improves the cooperation opportunity between the BSs and the relay. Next, we consider the cache content placement problem that aims to design the cache content at the relay such that the DoF gain is maximized. We propose an optimal algorithm and a near-optimal low-complexity algorithm for the cache content placement problem. Simulation results show significant improvement in the DoF gain using the proposed relay-caching protocol.
Submitted 17 February, 2017;
originally announced February 2017.