
Showing 1–12 of 12 results for author: Katharopoulos, A

Searching in archive cs.
  1. arXiv:2502.01804  [pdf, other]

    cs.LG cs.CL

    Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

    Authors: Pierre Ablin, Angelos Katharopoulos, Skyler Seto, David Grangier

    Abstract: Machine learning models are routinely trained on a mixture of different data domains. Different domain weights yield very different downstream performances. We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. Our architecture consists of a bank of expert parameters,…

    Submitted 3 February, 2025; originally announced February 2025.

  2. arXiv:2410.03529  [pdf, other]

    cs.LG cs.CL

    No Need to Talk: Asynchronous Mixture of Language Models

    Authors: Anastasiia Filippova, Angelos Katharopoulos, David Grangier, Ronan Collobert

    Abstract: We introduce SMALLTALK LM, an innovative method for training a mixture of language models in an almost asynchronous manner. Each model of the mixture specializes in distinct parts of the data distribution, without the need for high-bandwidth communication between the nodes training each model. At inference, a lightweight router directs a given sequence to a single expert, according to a short pref…

    Submitted 15 April, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: 23 pages

  3. arXiv:2402.01093  [pdf, other]

    cs.LG cs.CL

    Need a Small Specialized Language Model? Plan Early!

    Authors: David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun

    Abstract: Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference, but their lower capacity means that their performance can be good only if one limits their scope to a specialized domain. This paper explores how to get good specialized small language models using a large, generic, pretraining set and a limited amount of speciali…

    Submitted 31 October, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  4. arXiv:2311.00613  [pdf, other]

    cs.SD cs.LG eess.AS

    Controllable Music Production with Diffusion Models and Guidance Gradients

    Authors: Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

    Abstract: We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic ch…

    Submitted 5 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  5. arXiv:2301.07836  [pdf, other]

    cs.CV cs.AI

    Masked Autoencoding Does Not Help Natural Language Supervision at Scale

    Authors: Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

    Abstract: Self supervision and natural language supervision have emerged as two exciting ways to train general purpose image encoders which excel at a variety of downstream tasks. Recent works such as M3AE and SLIP have suggested that these approaches can be effectively combined, but most notably their results use small pre-training datasets (<50M samples) and don't effectively reflect the large-scale regim…

    Submitted 15 May, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023

  6. arXiv:2103.10429  [pdf, other]

    cs.CV

    Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

    Authors: Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, Sanja Fidler

    Abstract: Impressive progress in 3D shape extraction led to representations that can capture object geometries with high fidelity. In parallel, primitive-based methods seek to represent objects as semantically consistent part arrangements. However, due to the simplicity of existing primitive representations, these methods fail to accurately reconstruct 3D shapes using a small number of primitives/parts. We…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: To appear in CVPR 2021

  7. arXiv:2007.04825  [pdf, other]

    cs.LG stat.ML

    Fast Transformers with Clustered Attention

    Authors: Apoorv Vyas, Angelos Katharopoulos, François Fleuret

    Abstract: Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitively expensive for large sequences. To address this, we propose clustered attention, which instead of computing the attention for every query, grou…

    Submitted 29 September, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

  8. arXiv:2006.16236  [pdf, other]

    cs.LG stat.ML

    Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

    Authors: Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret

    Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from…

    Submitted 31 August, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML 2020, project at https://linear-transformers.com/

  9. arXiv:1905.03711  [pdf, other]

    cs.CV cs.LG stat.ML

    Processing Megapixel Images with Deep Attention-Sampling Models

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Existing deep architectures cannot operate on very large signals such as megapixel images due to computational and memory constraints. To tackle this limitation, we propose a fully differentiable end-to-end trainable model that samples and processes only a fraction of the full resolution input image. The locations to process are sampled from an attention distribution computed from a low resolution…

    Submitted 17 July, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: Presented in ICML 2019. Code is available at https://github.com/idiap/attention-sampling

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3282-3291, 2019

  10. arXiv:1803.00942  [pdf, other]

    cs.LG

    Not All Samples Are Created Equal: Deep Learning with Importance Sampling

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to…

    Submitted 28 October, 2019; v1 submitted 2 March, 2018; originally announced March 2018.

    Comments: Accepted at ICML 2018 (short oral)

  11. arXiv:1706.08580  [pdf, other]

    cs.LG stat.ML

    Learning Local Feature Aggregation Functions with Backpropagation

    Authors: Angelos Katharopoulos, Despoina Paschalidou, Christos Diou, Anastasios Delopoulos

    Abstract: This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem). To achieve that, we compose the local feature aggregation function with the classifier cost function and we backpropagate the gradient of…

    Submitted 26 June, 2017; originally announced June 2017.

    Comments: In Proceedings of the 25th European Signal Processing Conference (EUSIPCO 2017)

  12. arXiv:1706.00043  [pdf, other]

    cs.LG

    Biased Importance Sampling for Deep Neural Network Training

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems. However, the lack of an efficient way to calculate the importance still hinders its application to Deep Learning. In this paper, we show that the loss value can be used as an alternative importance metric, and propose a way to efficiently approximate it for a deep model, using a small mo…

    Submitted 13 September, 2017; v1 submitted 31 May, 2017; originally announced June 2017.
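The abstract of arXiv:2006.16236 (item 8) describes expressing self-attention as a dot product of kernel feature maps and using the associativity of matrix products to avoid the quadratic cost in sequence length. What follows is a minimal pure-Python sketch of that idea, not the authors' code. The feature map phi(x) = elu(x) + 1 and every function name here are assumptions for illustration. The causal branch keeps a running state S and normaliser z, which is one way to read the "Transformers are RNNs" view in the title.

```python
import math

def elu_plus_one(x):
    # Assumed feature map phi(x) = elu(x) + 1; every entry is strictly positive.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def kernel_attention_quadratic(Q, K, V, causal=False, phi=elu_plus_one):
    # Reference O(N^2) evaluation: weight of key j for query i is phi(q_i) . phi(k_j).
    fk = [phi(k) for k in K]
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        limit = i + 1 if causal else len(K)  # causal: only keys 0..i are visible
        sims = [sum(a * b for a, b in zip(fq, fk[j])) for j in range(limit)]
        total = sum(sims)
        out.append([sum(sims[j] * V[j][c] for j in range(limit)) / total
                    for c in range(len(V[0]))])
    return out

def linear_attention(Q, K, V, causal=False, phi=elu_plus_one):
    # Linear-time evaluation via associativity:
    #   sum_j (phi(q)^T phi(k_j)) v_j^T == phi(q)^T (sum_j phi(k_j) v_j^T)
    d = len(phi(K[0]))
    m = len(V[0])
    S = [[0.0] * m for _ in range(d)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d                      # running sum of phi(k_j), used to normalise

    def accumulate(k, v):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(m):
                S[a][c] += fk[a] * v[c]

    def read(q):
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        return [sum(fq[a] * S[a][c] for a in range(d)) / denom for c in range(m)]

    if not causal:
        for k, v in zip(K, V):
            accumulate(k, v)
        return [read(q) for q in Q]

    # Causal case as a recurrence: update the state with step i, then read query i.
    out = []
    for q, k, v in zip(Q, K, V):
        accumulate(k, v)
        out.append(read(q))
    return out

# Small illustrative inputs (arbitrary values): 3 positions, 2-dim queries/keys/values.
Q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.4]]
K = [[0.2, 0.1], [-0.3, 0.8], [0.7, -0.5]]
V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
```

Both functions compute the same quantities up to floating-point rounding; the linear version touches each key once, so for a fixed feature dimension its cost grows linearly with the number of positions instead of quadratically.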
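Items 10 and 12 concern importance sampling for deep network training, and the abstract of arXiv:1706.00043 (item 12) proposes using the loss value as the importance metric. The sketch below is a generic version of loss-proportional minibatch sampling with the textbook 1/(n p_i) reweighting. It assumes the per-example losses are already known and positive, and the helper names are hypothetical. How the paper obtains the losses cheaply, and whether it applies these exact weights, is not shown in the truncated abstract.

```python
import random

def sample_by_loss(losses, batch_size, rng=random):
    # Hypothetical helper: pick examples with probability proportional to their loss.
    n = len(losses)
    total = sum(losses)
    probs = [loss / total for loss in losses]
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    # Weight 1 / (n * p_i) so that the weighted mean of per-example gradients
    # is an unbiased estimate of the plain average over all n examples.
    weights = [1.0 / (n * probs[i]) for i in idx]
    return idx, weights

def weighted_estimate(values, idx, weights):
    # Stand-in for the weighted minibatch gradient: mean of w_i * g_i over the batch.
    return sum(w * values[i] for i, w in zip(idx, weights)) / len(idx)
```

High-loss examples are drawn more often but receive smaller weights, so the estimate stays centred on the full-data average while computation concentrates on the examples the abstract calls "informative".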
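The abstract of arXiv:1905.03711 (item 9) describes processing only a fraction of a megapixel image, at locations sampled from an attention distribution computed at low resolution. A minimal sketch of just that sampling step follows. It assumes the attention logits are given on a low-resolution grid and that each grid cell maps to a square full-resolution patch of side `scale`; the function names and this patch mapping are illustrative assumptions, and the authors' own implementation is in the repository listed in item 9's comments.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a flat list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_patch_locations(scores, grid_w, scale, n_samples, rng=random):
    # scores: row-major attention logits over a low-resolution grid of width grid_w.
    # scale: assumed ratio between full and low resolution (one cell -> scale x scale pixels).
    probs = softmax(scores)
    cells = rng.choices(range(len(scores)), weights=probs, k=n_samples)
    # Map each sampled cell (row, col) to the top-left pixel of its full-resolution patch.
    locations = [((c // grid_w) * scale, (c % grid_w) * scale) for c in cells]
    return locations, [probs[c] for c in cells]
```

The sampled probabilities are returned alongside the locations because a Monte Carlo estimate built from the sampled patches typically needs them; only the chosen full-resolution patches would then be passed to the expensive feature extractor.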