Showing 1–11 of 11 results for author: Soloveychik, I

  1. arXiv:2502.17728  [pdf, other]

    cs.CL cs.AI cs.AR

    LLM Inference Acceleration via Efficient Operation Fusion

    Authors: Mahsa Salmani, Ilya Soloveychik

    Abstract: The rapid development of the Transformer-based Large Language Models (LLMs) in recent years has been closely linked to their ever-growing and already enormous sizes. Many LLMs contain hundreds of billions of parameters and require dedicated hardware resources for training and inference. One of the key challenges inherent to the Transformer architecture is the requirement to support numerous non-li…

    Submitted 24 February, 2025; originally announced February 2025.

  2. arXiv:2410.10553  [pdf, other]

    cs.LG cs.AI cs.CL

    SLaNC: Static LayerNorm Calibration

    Authors: Mahsa Salmani, Nikita Trukhanov, Ilya Soloveychik

    Abstract: The ever increasing sizes of Large Language Models (LLMs) beyond hundreds of billions of parameters have generated enormous pressure on the manufacturers of dedicated hardware accelerators and made the innovative design of the latter one of the most rapidly expanding fields of the AI industry. Various approaches have been explored to enable efficient and accurate processing of LLMs on the availabl…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 9 pages, 3 figures, NeurIPS 2024 MLNCP Workshop

  3. arXiv:2405.07135  [pdf, other]

    cs.LG cs.AI

    Post Training Quantization of Large Language Models with Microscaling Formats

    Authors: Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang

    Abstract: Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of three well-known post-training techniques, SmoothQuant, AWQ, and GPTQ, and…

    Submitted 15 October, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

  4. arXiv:2403.20137  [pdf, other]

    cs.AI cs.AR math.NA

    Accurate Block Quantization in LLMs with Outliers

    Authors: Nikita Trukhanov, Ilya Soloveychik

    Abstract: The demand for inference on extremely large scale LLMs has seen enormous growth in the recent months. It made evident the colossal shortage of dedicated hardware capable of efficient and fast processing of the involved compute and memory movement. The problem is aggravated by the exploding raise in the lengths of the sequences being processed, since those require efficient on-chip storage of the K…

    Submitted 29 March, 2024; originally announced March 2024.

  5. arXiv:2403.09054  [pdf, other]

    cs.LG cs.AI cs.AR cs.CL

    Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference

    Authors: Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath

    Abstract: Transformers have emerged as the underpinning architecture for Large Language Models (LLMs). In generative language models, the inference process involves two primary phases: prompt processing and token generation. Token generation, which constitutes the majority of the computational workload, primarily entails vector-matrix multiplications and interactions with the Key-Value (KV) Cache. This phas…

    Submitted 5 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    MSC Class: 68U35 ACM Class: I.2.7; C.0

    Journal ref: Proceedings of the 7th Annual Conference on Machine Learning and Systems (MLSys), 2024

  6. arXiv:2308.10119  [pdf, other]

    cs.IT eess.SP stat.ME

    Error Probability Bounds for Invariant Causal Prediction via Multiple Access Channels

    Authors: Austin Goddard, Yu Xiang, Ilya Soloveychik

    Abstract: We consider the problem of lower bounding the error probability under the invariant causal prediction (ICP) framework. To this end, we examine and draw connections between ICP and the zero-rate Gaussian multiple access channel by first proposing a variant of the original invariant prediction assumption, and then considering a special case of the Gaussian multiple access channel where a codebook is…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted to the 2023 Asilomar Conference on Signals, Systems, and Computers

  7. arXiv:2210.05470  [pdf, other]

    cs.LG cs.AR math.NA

    Block Format Error Bounds and Optimal Block Size Selection

    Authors: Ilya Soloveychik, Ilya Lyubomirsky, Xin Wang, Sudeep Bhoja

    Abstract: The amounts of data that need to be transmitted, processed, and stored by the modern deep neural networks have reached truly enormous volumes in the last few years calling for the invention of new paradigms both in hardware and software development. One of the most promising and rapidly advancing frontiers here is the creation of new numerical formats. In this work we focus on the family of block…

    Submitted 7 November, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

  8. arXiv:2206.14362  [pdf, other]

    cs.IT eess.SP stat.ME

    Lower Bounds on the Error Probability for Invariant Causal Prediction

    Authors: Austin Goddard, Yu Xiang, Ilya Soloveychik

    Abstract: It is common practice to collect observations of feature and response pairs from different environments. A natural question is how to identify features that have consistent prediction power across environments. The invariant causal prediction framework proposes to approach this problem through invariance, assuming a linear model that is invariant under different environments. In this work, we make…

    Submitted 29 June, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Accepted to the 2022 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

  9. arXiv:2006.03311  [pdf, other]

    math.ST cs.LG

    A Robust Test for Elliptical Symmetry

    Authors: Ilya Soloveychik

    Abstract: Most signal processing and statistical applications heavily rely on specific data distribution models. The Gaussian distributions, although being the most common choice, are inadequate in most real world scenarios as they fail to account for data coming from heavy-tailed populations or contaminated by outliers. Such problems call for the use of Robust Statistics. The robust models and estimators a…

    Submitted 14 April, 2023; v1 submitted 5 June, 2020; originally announced June 2020.

  10. arXiv:1806.03571  [pdf, other]

    stat.ML cs.LG

    Stationary Geometric Graphical Model Selection

    Authors: Ilya Soloveychik, Vahid Tarokh

    Abstract: We consider the problem of model selection in Gaussian Markov fields in the sample deficient scenario. In many practically important cases, the underlying networks are embedded into Euclidean spaces. Using the natural geometric structure, we introduce the notion of spatially stationary distributions over geometric graphs. This directly generalizes the notion of stationary time series to the multid…

    Submitted 29 October, 2018; v1 submitted 9 June, 2018; originally announced June 2018.

    Comments: arXiv admin note: text overlap with arXiv:1802.03848

  11. arXiv:1701.05544  [pdf, other]

    cs.IT

    Pseudo-Wigner Matrices

    Authors: Ilya Soloveychik, Yu Xiang, Vahid Tarokh

    Abstract: We consider the problem of generating pseudo-random matrices based on the similarity of their spectra to Wigner's semicircular law. We introduce the notion of an r-independent pseudo-Wigner matrix ensemble and prove closeness of the spectra of its matrices to the semicircular density in the Kolmogorov distance. We give an explicit construction of a family of N by N pseudo-Wigner ensembles using du…

    Submitted 26 February, 2018; v1 submitted 19 January, 2017; originally announced January 2017.