Showing 1–8 of 8 results for author: Naumov, M

Searching in archive stat.

  1. arXiv:2008.11922  [pdf, other]

    cs.IR cs.LG stat.ML

    Time-based Sequence Model for Personalization and Recommendation Systems

    Authors: Tigran Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, Alisson Gusatti Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong

    Abstract: In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and a TSL attention-like mechanism with inner products in different vector spaces, which can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently treat sequences of user behavior of different lengths. We study t…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: 17 pages, 7 figures

    MSC Class: 68T05; ACM Class: I.2.6; I.5.0; H.3.3; H.3.4

  2. arXiv:1909.11810  [pdf, other]

    cs.LG stat.ML

    Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

    Authors: Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

    Abstract: Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings. To help manage this outsized memory consumption, we explore mixed dimension embeddings, an embedding layer architecture in which a particular embedding vector's dimension scales with its que…

    Submitted 8 February, 2021; v1 submitted 25 September, 2019; originally announced September 2019.

  3. arXiv:1909.02107  [pdf, other]

    cs.LG cs.IR stat.ML

    Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

    Authors: Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

    Abstract: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens o…

    Submitted 28 June, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: 11 pages, 7 figures, 1 table

  4. arXiv:1901.02103  [pdf, other]

    cs.LG cs.CV cs.IT stat.ML

    On the Dimensionality of Embeddings for Sparse Features and Data

    Authors: Maxim Naumov

    Abstract: In this note we discuss a common misconception, namely that embeddings are always used to reduce the dimensionality of the item space. We show that when we measure dimensionality in terms of information entropy, the embedding of sparse probability distributions, which can be used to represent sparse features or data, may or may not reduce the dimensionality of the item space. However, the embedding…

    Submitted 7 January, 2019; originally announced January 2019.

    Comments: 8 pages, 2 figures

    MSC Class: 68T05; ACM Class: I.2.6; I.5.0

  5. arXiv:1811.09886  [pdf, other]

    cs.LG stat.ML

    Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

    Authors: Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, Juan Pino, Martin Schatz, Alexander Sidorov, Viswanath Sivakumar, Andrew Tulloch, Xiaodong Wang, Yiming Wu, Hector Yuen, Utku Diril, Dmytro Dzhulgakov, Kim Hazelwood, Bill Jia, Yangqing Jia, Lin Qiao, Vijay Rao, et al. (3 additional authors not shown)

    Abstract: The application of deep learning techniques resulted in remarkable improvement of machine learning models. This paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high performance optimizations targeting existing systems, point out their limitations and make suggestions…

    Submitted 29 November, 2018; v1 submitted 24 November, 2018; originally announced November 2018.

  6. arXiv:1811.09862  [pdf, other]

    cs.LG cs.CV stat.ML

    On Periodic Functions as Regularizers for Quantization of Neural Networks

    Authors: Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch

    Abstract: Deep learning models have been successfully used in computer vision and many other fields. We propose an unorthodox algorithm for performing quantization of the model parameters. In contrast with popular quantization schemes based on thresholds, we use a novel technique based on periodic functions, such as continuous trigonometric sine or cosine as well as non-continuous hat functions. We apply th…

    Submitted 24 November, 2018; originally announced November 2018.

    Comments: 11 pages, 7 figures

    MSC Class: 68T05; ACM Class: I.2.6; I.5.0

  7. arXiv:1811.05922  [pdf, other]

    cs.LG stat.ML

    Bandana: Using Non-volatile Memory for Storing Deep Learning Models

    Authors: Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, Sachin Katti

    Abstract: Typical large-scale recommender systems use deep learning models that are stored on a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings by using Non-volatile Memory (NVM) as the primary storage medium, with a small amount of DRAM as cache. The main challenge in…

    Submitted 14 November, 2018; v1 submitted 14 November, 2018; originally announced November 2018.

  8. arXiv:1712.02029  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

    Authors: Aditya Devarakonda, Maxim Naumov, Michael Garland

    Abstract: Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size f…

    Submitted 13 February, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: 14 pages

    MSC Class: 68T05; ACM Class: I.2.6; I.5.0