-
Linear Log-Normal Attention with Unbiased Concentration
Authors:
Yury Nahshan,
Joseph Kampeas,
Emir Haleva
Abstract:
Transformer models have achieved remarkable results in a wide range of applications. However, their scalability is hampered by the quadratic time and memory complexity of the self-attention mechanism with respect to the sequence length. This limitation poses a substantial obstacle when dealing with long documents or high-resolution images. In this work, we study the self-attention mechanism by analyzing the distribution of the attention matrix and its concentration ability. Furthermore, we propose instruments to measure these quantities and introduce a novel self-attention mechanism, Linear Log-Normal Attention, designed to emulate the distribution and concentration behavior of the original self-attention. Our experimental results on popular natural language benchmarks show that our proposed Linear Log-Normal Attention outperforms other linearized attention alternatives, offering a promising avenue for enhancing the scalability of transformer models.
Submitted 26 February, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
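The abstract does not spell out the feature map used by Linear Log-Normal Attention, so the following is only a generic sketch of the reordering trick that linearized attention alternatives share, not the authors' method. Assuming a positive feature map phi (here elu(x) + 1, the choice of Katharopoulos et al., 2020, used purely as an assumption), the weights phi(q_i)·phi(k_j) can be summed over the keys once and reused for every query, which lowers the cost from O(N²d) to O(N·d·d_v). All function names are hypothetical.

```python
import math


def elu_plus_one(x):
    """Positive feature map phi(x) = elu(x) + 1 (an assumed choice, not LLN's map)."""
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(rows):
    """Apply phi elementwise to every query or key vector."""
    return [[elu_plus_one(x) for x in row] for row in rows]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_kernel_attention(q, k, v):
    """O(N^2 d): materialize every weight phi(q_i).phi(k_j), then normalize per query."""
    fq, fk = feature_map(q), feature_map(k)
    dv = len(v[0])
    out = []
    for qi in fq:
        weights = [dot(qi, kj) for kj in fk]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(dv)])
    return out


def linear_kernel_attention(q, k, v):
    """O(N d d_v): summarize all keys once, then reuse the summary for every query."""
    fq, fk = feature_map(q), feature_map(k)
    d, dv, n = len(fk[0]), len(v[0]), len(fk)
    # s[a][c] = sum_j phi(k_j)[a] * v_j[c]   (a d x d_v matrix)
    s = [[sum(fk[j][a] * v[j][c] for j in range(n)) for c in range(dv)] for a in range(d)]
    # z[a] = sum_j phi(k_j)[a]               (the normalizer, length d)
    z = [sum(fk[j][a] for j in range(n)) for a in range(d)]
    out = []
    for qi in fq:
        denom = dot(qi, z)
        out.append([sum(qi[a] * s[a][c] for a in range(d)) / denom for c in range(dv)])
    return out


q = [[0.1, -0.2], [0.3, 0.5], [-1.0, 0.2]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(quadratic_kernel_attention(q, k, v))
print(linear_kernel_attention(q, k, v))  # same values up to floating-point rounding
```

Both functions compute the same normalized weighted sum; only the order of summation differs, which is what removes the N x N weight matrix. Softmax attention itself cannot be reordered this way because exp(q·k) does not factor into phi(q)·phi(k) with a finite phi.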
-
Rotation Invariant Quantization for Model Compression
Authors:
Joseph Kampeas,
Yury Nahshan,
Hanoch Kremer,
Gil Lederman,
Shira Zaloshinski,
Zheng Li,
Emir Haleva
Abstract:
Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources. In this study, we investigate the rate-distortion tradeoff for NN model compression. First, we propose a Rotation-Invariant Quantization (RIQ) technique that uses a single parameter to quantize the entire NN model, yielding a different rate at each layer, i.e., mixed-precision quantization. Then, we prove that our rotation-invariant approach is optimal in terms of compression. We rigorously evaluate RIQ and demonstrate its capabilities on various models and tasks. For example, RIQ achieves ×19.4 and ×52.9 compression ratios on pre-trained VGG dense and pruned models, respectively, with <0.4% accuracy degradation. Code is available at https://github.com/ehaleva/RIQ.
Submitted 1 December, 2024; v1 submitted 3 March, 2023;
originally announced March 2023.
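The abstract describes a single parameter that quantizes the whole model while producing a different rate at each layer. The sketch below illustrates that idea under loudly stated assumptions; it is not RIQ's actual rule (see the linked repository for that). The per-layer step is a hypothetical choice, eps times the layer's RMS value, which depends on the weights only through their Euclidean norm and is therefore unchanged by any rotation of the weight vector. The rate is measured as the empirical entropy of the quantized symbols. All names are hypothetical.

```python
import math
from collections import Counter


def step_size(weights, eps):
    """Hypothetical per-layer step: eps times the layer's RMS value.
    It depends on w only through ||w||_2, so rotating w leaves it unchanged."""
    norm = math.sqrt(sum(x * x for x in weights))
    return eps * norm / math.sqrt(len(weights))


def quantize(weights, eps):
    """Uniform round-to-nearest quantization; returns integer symbols and the step used."""
    delta = step_size(weights, eps)
    if delta == 0.0:  # all-zero layer: nothing to encode
        return [0] * len(weights), 0.0
    return [round(x / delta) for x in weights], delta


def empirical_rate(symbols):
    """Empirical entropy in bits per weight (a proxy for an entropy coder's rate)."""
    n = len(symbols)
    return sum(c / n * math.log2(n / c) for c in Counter(symbols).values())


def compress_model(layers, eps):
    """Apply one global eps to every layer; the resulting rates differ per layer."""
    report = {}
    for name, w in layers.items():
        symbols, delta = quantize(w, eps)
        recon = [s * delta for s in symbols]
        report[name] = {
            "step": delta,
            "bits_per_weight": empirical_rate(symbols),
            "max_abs_error": max(abs(a - b) for a, b in zip(w, recon)),
        }
    return report


layers = {
    "dense": [i / 100 for i in range(-50, 51)],      # values spread over [-0.5, 0.5]
    "pruned": [0.0] * 90 + [1.0] * 5 + [-1.0] * 5,   # 90% zeros, a few large weights
}
for name, stats in compress_model(layers, eps=0.5).items():
    print(name, stats)
```

With eps = 0.5, the sparse layer is coded at about 0.57 bits per weight, while the dense layer needs several times more, although both used the same knob. The reconstruction error per weight never exceeds half of that layer's step.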
-
On Secrecy Rates and Outage in Multi-User Multi-Eavesdroppers MISO Systems
Authors:
Joseph Kampeas,
Asaf Cohen,
Omer Gurewitz
Abstract:
In this paper, we study the secrecy rate and outage probability in Multiple-Input-Single-Output (MISO) Gaussian wiretap channels in the limit of a large number of legitimate users and eavesdroppers. In particular, we analyze the asymptotic achievable secrecy rates and outage when only statistical knowledge of the wiretap channels is available to the transmitter.
The analysis provides exact expressions for the reduction in the secrecy rate as the number of eavesdroppers grows, compared with the increase in the secrecy rate as the number of legitimate users grows.
Submitted 8 May, 2016;
originally announced May 2016.
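A toy Monte Carlo, not the paper's MISO analysis, can show the trend this abstract describes: the secrecy rate [log2(1 + SNR_user) - log2(1 + SNR_eve)]^+ grows with the number of legitimate users K and shrinks with the number of eavesdroppers M. Assumptions made only for this sketch: single-antenna-equivalent Rayleigh fading (i.i.d. Exp(1) channel gains), the strongest legitimate user is scheduled, and the strongest non-colluding eavesdropper is taken as the worst case. The SNR, target rate, and function names are hypothetical.

```python
import math
import random


def secrecy_sample(rng, n_users, n_eves, snr):
    """One fading realization: schedule the strongest legitimate user and take
    the strongest (non-colluding) eavesdropper as the worst case.
    Gains are i.i.d. Exp(1), i.e. Rayleigh fading (a toy assumption)."""
    best_user = max(rng.expovariate(1.0) for _ in range(n_users))
    best_eve = max(rng.expovariate(1.0) for _ in range(n_eves))
    rate = math.log2(1 + snr * best_user) - math.log2(1 + snr * best_eve)
    return max(rate, 0.0)  # [x]^+ : the secrecy rate cannot be negative


def secrecy_stats(n_users, n_eves, snr=10.0, target=1.0, trials=5000, seed=0):
    """Monte Carlo estimates of the mean secrecy rate (bits/s/Hz) and of the
    secrecy outage probability P(C_s < target)."""
    rng = random.Random(seed)
    samples = [secrecy_sample(rng, n_users, n_eves, snr) for _ in range(trials)]
    mean_rate = sum(samples) / trials
    outage = sum(s < target for s in samples) / trials
    return mean_rate, outage


for n_users, n_eves in [(5, 5), (100, 5), (5, 100)]:
    mean_rate, outage = secrecy_stats(n_users, n_eves)
    print(f"K={n_users:3d} M={n_eves:3d} mean Cs={mean_rate:.2f}, P(Cs<1)={outage:.2f}")
```

Since the maximum of n i.i.d. Exp(1) gains grows roughly like ln n, both the gain from more users and the loss from more eavesdroppers are logarithmic in this toy model.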
-
The Ergodic Capacity of the Multiple Access Channel Under Distributed Scheduling - Order Optimality of Linear Receivers
Authors:
Joseph Kampeas,
Asaf Cohen,
Omer Gurewitz
Abstract:
Consider the problem of a Multiple-Input Multiple-Output (MIMO) Multiple-Access Channel (MAC) in the limit of a large number of users. In practical scenarios, only a small subset of the users can be scheduled to use the channel simultaneously, so a user-selection problem arises. However, solutions that collect Channel State Information (CSI) from all users and decide on the best subset to transmit in each slot do not scale when the number of users is large; hence, distributed user-selection algorithms are advantageous.
In this paper, we analyse a distributed user selection algorithm that selects a group of users to transmit without coordination between users and without all users sending CSI to the base station. This threshold-based algorithm is analysed for both Zero-Forcing (ZF) and Minimum Mean Square Error (MMSE) receivers, and its expected sum-rate in the limit of a large number of users is investigated. It is shown that, for a large number of users, it achieves the same scaling laws as the optimal centralized scheme.
Submitted 12 January, 2018; v1 submitted 28 April, 2013;
originally announced April 2013.
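A minimal sketch of a threshold-based selection rule of the kind this abstract describes, under simplifying assumptions that are not taken from the paper: i.i.d. Exp(1) channel gains, each user transmits iff its own gain exceeds a threshold u, and u is set so that K·P(g > u) equals the desired number of transmitters m, i.e. u = ln(K/m). The sketch does not model ZF or MMSE receivers, collisions, or the paper's exact threshold; the function names and parameter values are hypothetical.

```python
import math
import random


def threshold_for(n_users, target):
    """u such that n_users * P(g > u) = target when g ~ Exp(1): u = ln(n_users / target)."""
    return math.log(n_users / target)


def simulate(n_users=500, target=4, rounds=1000, seed=1):
    """Compare a distributed threshold rule with a centralized top-`target` scheduler.

    Distributed: each user sees only its own gain and transmits iff gain > u,
    with no CSI feedback and no coordination. Centralized: a base station that
    knows every gain picks the `target` strongest users.
    """
    rng = random.Random(seed)
    u = threshold_for(n_users, target)
    n_tx = 0            # total number of distributed transmissions
    tx_gain = 0.0       # total gain of distributed transmitters
    central_gain = 0.0  # total gain of centrally scheduled users
    for _ in range(rounds):
        gains = [rng.expovariate(1.0) for _ in range(n_users)]
        chosen = [g for g in gains if g > u]
        n_tx += len(chosen)
        tx_gain += sum(chosen)
        central_gain += sum(sorted(gains)[-target:])
    return {
        "threshold": u,
        "avg_transmitters": n_tx / rounds,
        "mean_gain_distributed": tx_gain / max(n_tx, 1),
        "mean_gain_centralized": central_gain / (rounds * target),
    }


print(simulate())
```

Because the exponential distribution is memoryless, a transmitting user's mean gain is u + 1 = ln(K/m) + 1, which grows like ln K, the same order as the average gain of the m strongest users picked by a centralized scheduler.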
-
Opportunistic Scheduling in Heterogeneous Networks: Distributed Algorithms and System Capacity
Authors:
Joseph Kampeas,
Asaf Cohen,
Omer Gurewitz
Abstract:
In this work, we design and analyze novel distributed scheduling algorithms for multi-user MIMO systems. In particular, we consider algorithms that require neither sending channel state information to a central processing unit nor communication between the users themselves; yet we prove that their performance closely approximates that of a centrally-controlled system, which is able to schedule the strongest user in each time-slot.
Our analysis is based on a novel application of the Point-Process approximation. This technique allows us to examine non-homogeneous cases, such as non-identically distributed users or various QoS considerations, and to give exact expressions for the capacity of the system under these schemes, analytically solving problems that had previously remained open. Possible applications include, but are not limited to, modern 4G networks such as 3GPP LTE, or random access protocols.
Submitted 16 July, 2012; v1 submitted 1 February, 2012;
originally announced February 2012.
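Point-process approximations rest on the fact that the number of users whose gain exceeds a high threshold is close to Poisson distributed. As a hedged illustration (a standard probability result, not the paper's derivation), the sketch below computes the exact distribution of that count for independent, non-identically distributed users (a Poisson-binomial law) and compares it with a Poisson law of the same mean; Le Cam's inequality bounds their L1 distance by 2·Σ p_i². The exponential fading model, the user means, and the threshold are assumptions chosen for illustration.

```python
import math


def exceed_probs(means, u):
    """P(g_i > u) when user i's gain is Exp with mean means[i] (non-identical users)."""
    return [math.exp(-u / m) for m in means]


def poisson_binomial_pmf(probs):
    """Exact pmf of N = number of independent users above the threshold."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for n, mass in enumerate(pmf):
            nxt[n] += mass * (1.0 - p)  # this user stays below the threshold
            nxt[n + 1] += mass * p      # this user exceeds it
        pmf = nxt
    return pmf


def poisson_pmf(lam, n_max):
    """Poisson pmf for n = 0..n_max, computed recursively to avoid huge factorials."""
    pmf = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        pmf.append(pmf[-1] * lam / n)
    return pmf


def l1_to_poisson(probs):
    """sum_n |P(N = n) - Poisson(n; lam)| with lam = sum(probs), including the Poisson tail."""
    exact = poisson_binomial_pmf(probs)
    approx = poisson_pmf(sum(probs), len(exact) - 1)
    tail = max(0.0, 1.0 - sum(approx))  # Poisson mass beyond len(probs) users
    return sum(abs(a - b) for a, b in zip(exact, approx)) + tail


means = [0.5, 1.0, 2.0] * 67            # three classes of users (assumed values)
probs = exceed_probs(means, u=8.0)
print(l1_to_poisson(probs), 2 * sum(p * p for p in probs))  # distance vs Le Cam bound
```

In the i.i.d. case with K·P(g > u) = 1, the probability that exactly one user exceeds the threshold is (1 - 1/K)^(K-1), which tends to e^(-1) ≈ 0.368 as K grows.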