Showing 1–6 of 6 results for author: Isik, B

Searching in archive stat.
  1. arXiv:2402.04177  [pdf, other]

    cs.CL cs.LG stat.ML

    Scaling Laws for Downstream Task Performance in Machine Translation

    Authors: Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

    Abstract: Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we…

    Submitted 20 February, 2025; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Published at the International Conference on Learning Representations (ICLR) 2025. Previous title: "Scaling Laws for Downstream Task Performance of Large Language Models"

  2. arXiv:2306.12625  [pdf, other]

    cs.LG cs.DC stat.ML

    Adaptive Compression in Federated Learning via Side Information

    Authors: Berivan Isik, Francesco Pase, Deniz Gunduz, Sanmi Koyejo, Tsachy Weissman, Michele Zorzi

    Abstract: The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimat…

    Submitted 21 April, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Published at the International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

  3. arXiv:2306.04924  [pdf, other]

    cs.LG cs.CR cs.DC cs.IT stat.ML

    Exact Optimality of Communication-Privacy-Utility Tradeoffs in Distributed Mean Estimation

    Authors: Berivan Isik, Wei-Ning Chen, Ayfer Ozgur, Tsachy Weissman, Albert No

    Abstract: We study the mean estimation problem under communication and local differential privacy constraints. While previous work has proposed order-optimal algorithms for the same problem (i.e., asymptotically optimal as we spend more bits), exact optimality (in the non-asymptotic setting) still has not been achieved. In this work, we take a step towards characterizing the exact-optim…

    Submitted 28 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Published at the Conference on Neural Information Processing Systems (NeurIPS), 2023

  4. arXiv:2209.15328  [pdf, other]

    cs.LG stat.AP stat.ML

    Sparse Random Networks for Communication-Efficient Federated Learning

    Authors: Berivan Isik, Francesco Pase, Deniz Gunduz, Tsachy Weissman, Michele Zorzi

    Abstract: One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial \em…

    Submitted 8 February, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Published at the International Conference on Learning Representations (ICLR) 2023

  5. arXiv:2102.08329  [pdf, other]

    cs.LG cs.IT eess.SP stat.ML

    An Information-Theoretic Justification for Model Pruning

    Authors: Berivan Isik, Tsachy Weissman, Albert No

    Abstract: We study the neural network (NN) compression problem, viewing the tension between the compression ratio and NN performance through the lens of rate-distortion theory. We choose a distortion metric that reflects the effect of NN compression on the model output and derive the tradeoff between rate (compression) and distortion. In addition to characterizing theoretical limits of NN compression, this…

    Submitted 9 February, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Published in the International Conference on Artificial Intelligence and Statistics (AISTATS) 2022. Previous titles: 1) Rate-Distortion Theoretic Model Compression: Successive Refinement for Pruning, 2) Successive pruning for model compression via rate distortion theory

  6. arXiv:2005.10761  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    rTop-k: A Statistical Estimation Approach to Distributed SGD

    Authors: Leighton Pate Barnes, Huseyin A. Inan, Berivan Isik, Ayfer Ozgur

    Abstract: The large communication cost for exchanging gradients between different nodes significantly limits the scalability of distributed training for large-scale learning models. Motivated by this observation, there has been significant recent interest in techniques that reduce the communication cost of distributed Stochastic Gradient Descent (SGD), with gradient sparsification techniques such as top-k a…

    Submitted 2 December, 2020; v1 submitted 21 May, 2020; originally announced May 2020.