Showing 1–15 of 15 results for author: Sani, L

  1. arXiv:2505.22549  [pdf, other]

    cs.LG

    DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models

    Authors: Alex Iacob, Lorenzo Sani, Mher Safaryan, Paris Giampouras, Samuel Horváth, Andrej Jovanovic, Meghdad Kurmanji, Preslav Aleksandrov, William F. Shen, Xinchi Qiu, Nicholas D. Lane

    Abstract: Scaling foundation model training with Distributed Data Parallel (DDP) methods is bandwidth-limited. Existing infrequent communication methods like Local SGD were designed to synchronize only model parameters and cannot be trivially applied to adaptive optimizers due to additional optimizer states. Current approaches extending Local SGD either lack convergence guarantees or require synchronizing a…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Keywords: Distributed Training, Foundation Models, Large Language Models, Optimizers, Communication Efficiency, Federated Learning, Distributed Systems, Optimization Theory, Scaling, Robustness. Preprint, under review at NeurIPS

  2. arXiv:2504.05153  [pdf, other]

    cs.LG

    SparsyFed: Sparse Adaptive Federated Training

    Authors: Adriano Guastella, Lorenzo Sani, Alex Iacob, Alessio Mora, Paolo Bellavista, Nicholas D. Lane

    Abstract: Sparse training is often adopted in cross-device federated learning (FL) environments where constrained devices collaboratively train a machine learning model on private data by exchanging pseudo-gradients across heterogeneous networks. Although sparse training methods can reduce communication overhead and computational burden in FL, they are often not used in practice for the following key reason…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Published as a conference paper at ICLR 2025

  3. arXiv:2503.00096  [pdf, other]

    q-bio.QM cs.AI

    BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology

    Authors: Ludovico Mitchener, Jon M Laurent, Benjamin Tenmann, Siddharth Narayanan, Geemi P Wellawatte, Andrew White, Lorenzo Sani, Samuel G Rodriques

    Abstract: Large Language Models (LLMs) and LLM-based agents show great promise in accelerating scientific research. Existing benchmarks for measuring this potential and guiding future development continue to evolve from pure recall and rote knowledge tasks, towards more practical work such as literature review and experimental planning. Bioinformatics is a domain where fully autonomous AI-driven discovery m…

    Submitted 7 March, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

    Comments: 8 main text pages, 5 main figures

  4. Low-Eddington ratio, changing-look active galactic nuclei: the case of NGC 4614

    Authors: Elisabeta Lusso, Lapo Casetti, Marco Romoli, Lara Fossi, Emanuele Nardini, Emanuele Arra, Benedetta Barsi, Clarissa Calamai, Francesca Campani, Riccardo Capogrosso, Francesco Chiti Tegli, Riccardo Ciantini, Eirini Demertzi, Marina A. Gaitani, Asia Giudice, Alessia Gori, Lorenzo Graziani, Laura Macchiarini, Marianna Michelagnoli, Chiara Niccolai, Irene Parenti, Simone Pistolesi, Martina Rago, Ofelia Romani, Leonardo Sani, et al. (5 additional authors not shown)

    Abstract: Active galactic nuclei (AGN) are known to be variable sources across the entire electromagnetic spectrum, in particular at optical/ultraviolet and X-ray energies. Over the past decades, a growing number of AGN have displayed type transitions: from type 1 to type 2 or vice versa within a few years or even several months. These galaxies have been commonly referred to as changing-look AGN (CLAGN). Her…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 8 pages, 5 figures, accepted for publication in A&A

    Journal ref: A&A 695, A269 (2025)

  5. arXiv:2502.07218  [pdf, other]

    cs.LG cs.AI

    LUNAR: LLM Unlearning via Neural Activation Redirection

    Authors: William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane

    Abstract: Large Language Models (LLMs) benefit from training on ever larger amounts of textual data, but as a result, they increasingly incur the risk of leaking private information. The ability to selectively remove knowledge from LLMs is, therefore, a highly desirable capability. In this paper, we propose LUNAR, a novel unlearning methodology grounded in the Linear Representation Hypothesis. LUNAR operate…

    Submitted 10 February, 2025; originally announced February 2025.

  6. arXiv:2411.02908  [pdf, other]

    cs.LG cs.DC

    Photon: Federated LLM Pre-Training

    Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane

    Abstract: Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like federated learning (FL) could enable collaborative training of larger models across weakly-connected GPUs if they can effectively be used for pre-training. To achieve this, we…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: 13 pages, 9 appendix pages, 10 figures, 3 algorithms, 8 tables

  7. arXiv:2410.05021  [pdf, other]

    cs.LG cs.CL

    DEPT: Decoupled Embeddings for Pre-training Language Models

    Authors: Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F. Shen, Xinchi Qiu, Dongqi Cai, Yan Gao, Nicholas D. Lane

    Abstract: Language Model pre-training uses broad data mixtures to enhance performance across domains and languages. However, training on such heterogeneous text corpora requires extensive and expensive efforts. Since these data sources vary significantly in lexical, syntactic, and semantic aspects, they cause negative interference or the "curse of multilinguality". To address these challenges we propose a…

    Submitted 7 April, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Published as a conference paper at ICLR 2025

  8. arXiv:2405.20882  [pdf, other]

    cs.LG

    Sheaf HyperNetworks for Personalized Federated Learning

    Authors: Bao Nguyen, Lorenzo Sani, Xinchi Qiu, Pietro Liò, Nicholas D. Lane

    Abstract: Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems compromising their performance, such as over-smoothing and heteroph…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures, 7 tables, pre-print under review

  9. arXiv:2405.14446  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Worldwide Federated Training of Language Models

    Authors: Alex Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, William F. Shen, Nicholas Donald Lane

    Abstract: The reliance of language model training on massive amounts of computation and vast datasets scraped from potentially low-quality, copyrighted, or sensitive data has come into question practically, legally, and ethically. Federated learning provides a plausible alternative by enabling previously untapped data to be voluntarily gathered from collaborating organizations. However, when scaled globally…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 19 pages, 8 figures, Under Review

    ACM Class: I.2.7

  10. arXiv:2405.10853  [pdf, other]

    cs.LG cs.AI cs.DC

    The Future of Large Language Model Pre-training is Federated

    Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane

    Abstract: Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources they can leverage for pre-training. Federated learning (FL) has the potential to u…

    Submitted 14 October, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 24 pages, 15 figures, pre-print

  11. arXiv:2402.10191  [pdf, other]

    cs.LG

    FedAnchor: Enhancing Federated Semi-Supervised Learning with Label Contrastive Loss for Unlabeled Clients

    Authors: Xinchi Qiu, Yan Gao, Lorenzo Sani, Heng Pan, Wanru Zhao, Pedro P. B. Gusmao, Mina Alibeigi, Alex Iacob, Nicholas D. Lane

    Abstract: Federated learning (FL) is a distributed learning paradigm that facilitates collaborative training of a shared global model across devices while keeping data localized. The deployment of FL in numerous real-world applications faces delays, primarily due to the prevalent reliance on supervised tasks. Generating detailed labels at edge devices, if feasible, is demanding, given resource constraints a…

    Submitted 15 February, 2024; originally announced February 2024.

  12. arXiv:2306.17453  [pdf, other]

    cs.DC

    Pollen: High-throughput Federated Learning Simulation via Resource-Aware Client Placement

    Authors: Lorenzo Sani, Pedro Porto Buarque de Gusmão, Alex Iacob, Wanru Zhao, Xinchi Qiu, Yan Gao, Javier Fernandez-Marques, Nicholas Donald Lane

    Abstract: Federated Learning (FL) is a privacy-focused machine learning paradigm that collaboratively trains models directly on edge devices. Simulation plays an essential role in FL adoption, helping develop novel aggregation and client sampling strategies. However, current simulators cannot emulate large-scale systems in a time-efficient manner, which limits their utility and casts doubts on generalizabil…

    Submitted 20 May, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: 22 pages, 22 figures, 9 tables, under review

  13. arXiv:2007.14390  [pdf, other]

    cs.LG cs.CV stat.ML

    Flower: A Friendly Federated Learning Research Framework

    Authors: Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, Nicholas D. Lane

    Abstract: Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are…

    Submitted 5 March, 2022; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: Open-Source, mobile-friendly Federated Learning framework

  14. arXiv:1908.06896  [pdf, other]

    cs.CV

    Genetic Algorithms for the Optimization of Diffusion Parameters in Content-Based Image Retrieval

    Authors: Federico Magliani, Laura Sani, Stefano Cagnoni, Andrea Prati

    Abstract: Several computer vision and artificial intelligence projects are nowadays exploiting the manifold data distribution using, e.g., the diffusion process. This approach has produced dramatic improvements on the final performance thanks to the application of such algorithms to the kNN graph. Unfortunately, this recent technique needs a manual configuration of several parameters, thus it is not straigh…

    Submitted 19 August, 2019; originally announced August 2019.

  15. arXiv:1611.06904  [pdf, other]

    cs.NI

    Isolario: a Do-ut-des Approach to Improve the Appeal of BGP Route Collecting

    Authors: Enrico Gregori, Alessandro Improta, Luca Sani

    Abstract: The incompleteness of data collected from BGP route collecting projects is a well-known issue which potentially affects every research activity carried out on the analysis of the Internet inter-domain routing. Recent works explained that one of the possible solutions is to increase the number of ASes feeding these projects from the Internet periphery, in order to reveal the hidden portion of peeri…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: Technical report