Showing 1–6 of 6 results for author: Canini, M

Searching in archive math.
  1. arXiv:2411.12736  [pdf, other]

    cs.CL cs.AI cs.LG eess.SY math.OC

    ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models

    Authors: Salma Kharrat, Fares Fourati, Marco Canini

    Abstract: The effectiveness of Large Language Models (LLMs) in solving tasks vastly depends on the quality of the instructions, which often require fine-tuning through extensive human effort. This highlights the need for automated instruction optimization; however, this optimization is particularly challenging when dealing with black-box LLMs, where model parameters and gradients remain inaccessible. We pro…

    Submitted 19 November, 2024; originally announced November 2024.

  2. arXiv:2406.06520  [pdf, other]

    cs.LG cs.AI cs.CV cs.MA math.OC

    Decentralized Personalized Federated Learning

    Authors: Salma Kharrat, Marco Canini, Samuel Horvath

    Abstract: This work tackles the challenges of data heterogeneity and communication limitations in decentralized federated learning. We focus on creating a collaboration graph that guides each client in selecting suitable collaborators for training personalized models that leverage their local data effectively. Our approach addresses these issues through a novel, communication-efficient strategy that enhance…

    Submitted 10 June, 2024; originally announced June 2024.

  3. arXiv:2108.00951  [pdf, other]

    cs.LG cs.DC math.OC

    Rethinking gradient sparsification as total error minimization

    Authors: Atal Narayan Sahu, Aritra Dutta, Ahmed M. Abdelmoniem, Trambak Banerjee, Marco Canini, Panos Kalnis

    Abstract: Gradient compression is a widely-established remedy to tackle the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-$k$ sparsification, sometimes with $k$ as little as $0.1\%$ of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization pers…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 33 pages, 31 figures

  4. arXiv:1911.08250  [pdf, other]

    cs.DC cs.LG math.OC

    On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

    Authors: Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

    Abstract: Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model,…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: To Appear In Proceedings of Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

    Journal ref: In Proceedings of Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

  5. arXiv:1905.11692  [pdf, other]

    math.OC cs.LG math.NA

    Direct Nonlinear Acceleration

    Authors: Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik

    Abstract: Optimization acceleration techniques such as momentum play a key role in state-of-the-art machine learning algorithms. Recently, generic vector sequence extrapolation techniques, such as regularized nonlinear acceleration (RNA) of Scieur et al., were proposed and shown to accelerate fixed point iterations. In contrast to RNA which computes extrapolation coefficients by (approximately) setting the…

    Submitted 28 May, 2019; originally announced May 2019.

    MSC Class: 65D15; 65B05; 46N40; 65F99; 68W99

  6. arXiv:1905.10988  [pdf, other]

    cs.LG math.OC stat.ML

    Natural Compression for Distributed Deep Learning

    Authors: Samuel Horvath, Chen-Yu Ho, Ludovit Horvath, Atal Narayan Sahu, Marco Canini, Peter Richtarik

    Abstract: Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practical…

    Submitted 5 September, 2022; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Proceedings of 3rd Annual Conference on Mathematical and Scientific Machine Learning (MSML 2022)