-
The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems
Authors:
Petr Kasalický,
Martin Spišák,
Vojtěch Vančura,
Daniel Bohuněk,
Rodrigo Alves,
Pavel Kordík
Abstract:
Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes grow, memory constraints make storage and access increasingly difficult. We describe a lightweight, learnable embedding compression technique that projects dense…
▽ More
Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes grow, memory constraints make storage and access increasingly difficult. We describe a lightweight, learnable embedding compression technique that projects dense embeddings into a high-dimensional, sparsely activated space. Designed for retrieval tasks, our method reduces memory requirements while preserving retrieval performance, enabling scalable deployment under strict resource constraints. Our results demonstrate that leveraging sparsity is a promising approach for improving the efficiency of large-scale recommenders. We release our code at https://github.com/recombee/CompresSAE.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems
Authors:
Vojtěch Vančura,
Pavel Kordík,
Milan Straka
Abstract:
Recommender systems often use text-side information to improve their predictions, especially in cold-start or zero-shot recommendation scenarios, where traditional collaborative filtering approaches cannot be used. Many approaches to text-mining side information for recommender systems have been proposed over recent years, with sentence Transformers being the most prominent one. However, these mod…
▽ More
Recommender systems often use text-side information to improve their predictions, especially in cold-start or zero-shot recommendation scenarios, where traditional collaborative filtering approaches cannot be used. Many approaches to text-mining side information for recommender systems have been proposed over recent years, with sentence Transformers being the most prominent one. However, these models are trained to predict semantic similarity without utilizing interaction data with hidden patterns specific to recommender systems. In this paper, we propose beeFormer, a framework for training sentence Transformer models with interaction data. We demonstrate that our models trained with beeFormer can transfer knowledge between datasets while outperforming not only semantic similarity sentence Transformers but also traditional collaborative filtering methods. We also show that training on multiple datasets from different domains accumulates knowledge in a single model, unlocking the possibility of training universal, domain-agnostic sentence Transformer models to mine text representations for recommender systems. We release the source code, trained models, and additional details allowing replication of our experiments at https://github.com/recombee/beeformer.
△ Less
Submitted 8 October, 2024; v1 submitted 16 September, 2024;
originally announced September 2024.
-
SimpleTRON: Simple Transformer with O(N) Complexity
Authors:
Uladzislau Yorsh,
Alexander Kovalenko,
Vojtěch Vančura,
Daniel Vašata,
Pavel Kordík,
Tomáš Mikolov
Abstract:
In this paper, we propose that the dot product pairwise matching attention layer, which is widely used in Transformer-based models, is redundant for the model performance. Attention, in its original formulation, has to be seen rather as a human-level tool to explore and/or visualize relevancy scores in sequential data. However, the way how it is constructed leads to significant computational compl…
▽ More
In this paper, we propose that the dot product pairwise matching attention layer, which is widely used in Transformer-based models, is redundant for the model performance. Attention, in its original formulation, has to be seen rather as a human-level tool to explore and/or visualize relevancy scores in sequential data. However, the way how it is constructed leads to significant computational complexity. Instead, we present SimpleTRON: Simple Transformer with O(N) Complexity, a simple and fast alternative without any approximation that, unlike other approximation models, does not have any architecture-related overhead and therefore can be seen as a purely linear Transformer-like model. This architecture, to the best of our knowledge, outperforms existing sub-quadratic attention approximation models on several tasks from the Long-Range Arena benchmark. Moreover, we show, that SimpleTRON can benefit from weight transfer from pretrained large language models, as its parameters can be fully transferable.
△ Less
Submitted 28 June, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Deep Variational Autoencoder with Shallow Parallel Path for Top-N Recommendation (VASP)
Authors:
Vojtěch Vančura,
Pavel Kordík
Abstract:
Recently introduced EASE algorithm presents a simple and elegant way, how to solve the top-N recommendation task. In this paper, we introduce Neural EASE to further improve the performance of this algorithm by incorporating techniques for training modern neural networks. Also, there is a growing interest in the recsys community to utilize variational autoencoders (VAE) for this task. We introduce…
▽ More
Recently introduced EASE algorithm presents a simple and elegant way, how to solve the top-N recommendation task. In this paper, we introduce Neural EASE to further improve the performance of this algorithm by incorporating techniques for training modern neural networks. Also, there is a growing interest in the recsys community to utilize variational autoencoders (VAE) for this task. We introduce deep autoencoder FLVAE benefiting from multiple non-linear layers without an information bottleneck while not overfitting towards the identity. We show how to learn FLVAE in parallel with Neural EASE and achieve the state of the art performance on the MovieLens 20M dataset and competitive results on the Netflix Prize dataset.
△ Less
Submitted 10 February, 2021;
originally announced February 2021.