-
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Authors:
Yuxuan Zhu,
Ali Falahati,
David H. Yang,
Mohammad Mohammadi Amiri
Abstract:
Large language models face significant computational and memory challenges when processing long contexts. During inference, efficient management of the key-value (KV) cache, which stores intermediate activations for autoregressive generation, is critical to reducing memory overhead and improving computational efficiency. Traditional token-level efficient KV caching methods overlook semantic information, treating tokens independently without considering their semantic relationships. Meanwhile, existing semantic-preserving KV cache management approaches often suffer from substantial memory usage and high time-to-first-token. To address these limitations, we propose SentenceKV, a novel sentence-level semantic KV caching approach designed to enhance inference efficiency while preserving semantic coherence. During prefilling, SentenceKV groups tokens based on sentence-level semantic similarity, compressing sentence representations into concise semantic vectors stored directly on the GPU, while individual KV pairs are offloaded to CPU. During decoding, SentenceKV generates tokens by selectively retrieving semantically relevant sentence-level KV entries, leveraging the semantic similarity between the prefilling-stage semantic vectors and decoding-stage queries. This ensures efficient and contextually accurate predictions, minimizing the loading of redundant or irrelevant data into GPU memory and significantly reducing memory overhead while maintaining stable inference latency, even for extremely long contexts. Extensive evaluations on benchmarks including PG-19, LongBench, and Needle-In-A-Haystack demonstrate that SentenceKV significantly outperforms state-of-the-art methods in both efficiency and memory usage, without compromising model accuracy.
Submitted 1 April, 2025;
originally announced April 2025.
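The sentence-level retrieval described in the SentenceKV abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how sentence vectors are built or scored, so this sketch uses the mean of each sentence's key vectors and a plain dot product, and the names `build_sentence_index` and `select_tokens` are hypothetical. GPU/CPU placement is not modeled.

```python
# Toy sketch of sentence-level KV selection (assumptions: mean-pooled key
# vectors as sentence summaries, dot-product scoring; not the paper's method).

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_sentence_index(keys, sentence_bounds):
    """Prefill step: one summary vector per sentence.

    keys: per-token key vectors.
    sentence_bounds: half-open (start, end) token spans, one per sentence.
    """
    return [mean_vector(keys[start:end]) for start, end in sentence_bounds]

def select_tokens(query, sentence_vectors, sentence_bounds, top_k):
    """Decode step: pick the top_k sentences most similar to the query and
    return the token positions whose KV pairs would be fetched."""
    scores = [dot(query, vec) for vec in sentence_vectors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    token_ids = []
    for i in sorted(ranked[:top_k]):  # restore original sentence order
        start, end = sentence_bounds[i]
        token_ids.extend(range(start, end))
    return token_ids

# Six tokens grouped into three "sentences".
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
bounds = [(0, 2), (2, 5), (5, 6)]
index = build_sentence_index(keys, bounds)

print(select_tokens([0.0, 1.0], index, bounds, top_k=1))  # [2, 3, 4]
print(select_tokens([1.0, 0.5], index, bounds, top_k=2))  # [0, 1, 2, 3, 4]
```

Only the small per-sentence index is consulted at every decoding step; the per-token entries are touched only for the selected sentences, which is the source of the memory savings the abstract claims.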
-
Disentangled Structural and Featural Representation for Task-Agnostic Graph Valuation
Authors:
Ali Falahati,
Mohammad Mohammadi Amiri
Abstract:
With the emergence of data marketplaces, the demand for methods to assess the value of data has increased significantly. While numerous techniques have been proposed for this purpose, none have specifically addressed graphs as the main data modality. Graphs are widely used across various fields, ranging from chemical molecules to social networks. In this study, we break down graphs into two main components: structural and featural, and we focus on evaluating data without relying on specific task-related metrics, making it applicable in practical scenarios where validation requirements may be lacking. We introduce a novel framework called blind message passing, which aligns the seller's and buyer's graphs using a shared node permutation based on graph matching. This allows us to utilize the graph Wasserstein distance to quantify the differences in the structural distribution of graph datasets, called the structural disparities. We then consider featural aspects of buyers' and sellers' graphs for data valuation and capture their statistical similarities and differences, referred to as relevance and diversity, respectively. Our approach ensures that buyers and sellers remain unaware of each other's datasets. Our experiments on real datasets demonstrate the effectiveness of our approach in capturing the relevance, diversity, and structural disparities of seller data for buyers, particularly in graph-based data valuation scenarios.
Submitted 22 August, 2024;
originally announced August 2024.
-
Efficient Bitrate Ladder Construction using Transfer Learning and Spatio-Temporal Features
Authors:
Ali Falahati,
Mohammad Karim Safavi,
Ardavan Elahi,
Farhad Pakdaman,
Moncef Gabbouj
Abstract:
Providing high-quality video at an efficient bitrate is a major challenge in the video industry. The traditional one-size-fits-all scheme for bitrate ladders is inefficient, and reaching the best content-aware decision is computationally impractical because of the extensive encodings required. To mitigate this, we propose a bitrate- and complexity-efficient bitrate ladder prediction method using transfer learning and spatio-temporal features. We propose: (1) using feature maps from well-known pre-trained DNNs to predict rate-quality behavior with limited training data; and (2) improving the efficiency of the highest-quality rung by predicting the minimum bitrate for top quality and using it for the top rung. Tested on 102 video scenes, the method demonstrates a 94.1% reduction in complexity versus brute force at a 1.71% BD-Rate expense. Additionally, transfer learning was thoroughly studied through four networks and ablation studies.
Submitted 13 March, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Alternative Detectors for Spectrum Sensing by Exploiting Excess Bandwidth
Authors:
Sirvan Gharib,
Abolfazl Falahati,
Vahid Ahmadi
Abstract:
Spectrum sensing is studied by exploiting a priori and a posteriori information about the received noise variance. First, the traditional Average Likelihood Ratio (ALR) and Generalized Likelihood Ratio Test (GLRT) detectors are investigated, for the first time, with a Gamma distribution modeling the channel noise and with an a priori statistical distribution of the noise variance available. Then, two robust detectors are proposed that use the existing excess bandwidth to obtain an a posteriori probability for the uncertain received noise variance. The first proposed detector, based on the traditional ALR, employs the marginal distribution of the observation given the available a priori and a posteriori information on the received signal, while the second employs Maximum a Posteriori (MAP) estimation of the inverse of the noise power under the same hypotheses as the first detector. In addition, analytical expressions for the performance of the proposed detectors are obtained in terms of the false-alarm and detection probabilities. The simulation results show the superiority of the proposed detectors over their traditional counterparts.
Submitted 13 February, 2021;
originally announced February 2021.
-
Cryptanalysis and enhancement of two low cost RFID authentication protocols
Authors:
Hoda Jannati,
Abolfazl Falahati
Abstract:
RFID systems have recently attracted widespread attention because of their ease of deployment across an extensive range of applications. Owing to these advantages, many technical articles have been published to improve RFID capabilities for specific system implementations. Recently, a lightweight anti-de-synchronization RFID authentication protocol and a lightweight binding proof protocol to guard patient safety were proposed. This contribution shows that the first protocol is vulnerable to a de-synchronization attack. It also shows that the second protocol is vulnerable to a de-synchronization attack and allows the movements of the tags to be tracked. The paper also presents solutions that fix the security flaws in the two protocols for secure RFID applications.
Submitted 9 February, 2012;
originally announced February 2012.
-
A Secure Variant of the Hill Cipher
Authors:
M. Toorani,
A. Falahati
Abstract:
The Hill cipher is a classical symmetric encryption algorithm that succumbs to the known-plaintext attack. Although its vulnerability to cryptanalysis has rendered it unusable in practice, it still serves an important pedagogical role in cryptology and linear algebra. In this paper, a variant of the Hill cipher is introduced that makes the cipher secure while retaining its efficiency. The proposed scheme includes a ciphering core for which a cryptographic protocol is introduced.
Submitted 16 March, 2012; v1 submitted 18 February, 2010;
originally announced February 2010.
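The known-plaintext weakness mentioned in the Hill cipher abstract can be seen in a minimal sketch of the classical cipher. This is the textbook 2x2 scheme over the 26-letter alphabet, not the secure variant proposed in the paper; the key, the messages, and the helper names (`encrypt`, `recover_key`) are chosen here purely for illustration. Because encryption is the linear map C = K P mod 26, two known plaintext blocks with an invertible block matrix P are enough to recover K = C P^-1 mod 26.

```python
# Classical 2x2 Hill cipher over Z_26 and a known-plaintext key recovery.
# Illustrative only; requires Python 3.8+ for pow(x, -1, m).

M = 26  # alphabet size (A-Z)

def to_nums(text):
    return [ord(ch) - ord("A") for ch in text]

def to_text(nums):
    return "".join(chr(n % M + ord("A")) for n in nums)

def mat_mul(a, b):
    """Product of two 2x2 matrices modulo M."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % M for j in range(2)]
            for i in range(2)]

def mat_inv(m):
    """Inverse of a 2x2 matrix modulo M (ValueError if not invertible)."""
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % M
    det_inv = pow(det, -1, M)
    return [[m[1][1] * det_inv % M, -m[0][1] * det_inv % M],
            [-m[1][0] * det_inv % M, m[0][0] * det_inv % M]]

def encrypt(key, plaintext):
    """Encrypt uppercase text of even length, two letters per block."""
    nums = to_nums(plaintext)
    out = []
    for i in range(0, len(nums), 2):
        p0, p1 = nums[i], nums[i + 1]
        out.append((key[0][0] * p0 + key[0][1] * p1) % M)
        out.append((key[1][0] * p0 + key[1][1] * p1) % M)
    return to_text(out)

def recover_key(plaintext, ciphertext):
    """Known-plaintext attack: solve K = C * P^-1 (mod M) from two blocks."""
    p, c = to_nums(plaintext[:4]), to_nums(ciphertext[:4])
    P = [[p[0], p[2]], [p[1], p[3]]]  # plaintext blocks as columns
    C = [[c[0], c[2]], [c[1], c[3]]]  # ciphertext blocks as columns
    return mat_mul(C, mat_inv(P))

key = [[3, 3], [2, 5]]
ciphertext = encrypt(key, "HELP")
print(ciphertext)                       # HIAT
print(recover_key("HELP", ciphertext))  # [[3, 3], [2, 5]]
```

The attack fails only when the chosen plaintext blocks form a matrix that is not invertible modulo 26, in which case an attacker simply uses a different pair of blocks.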