
Showing 1–13 of 13 results for author: Hazimeh, H

  1. arXiv:2503.12822  [pdf, other]

    cs.LG stat.ML

    An Optimization Framework for Differentially Private Sparse Fine-Tuning

    Authors: Mehdi Makni, Kayhan Behdin, Gabriel Afriat, Zheng Xu, Sergei Vassilvitskii, Natalia Ponomareva, Hussein Hazimeh, Rahul Mazumder

    Abstract: Differentially private stochastic gradient descent (DP-SGD) is broadly considered to be the gold standard for training and fine-tuning neural networks under differential privacy (DP). With the increasing availability of high-quality pre-trained model checkpoints (e.g., vision and language models), fine-tuning has become a popular strategy. However, despite recent progress in understanding and appl…

    Submitted 17 March, 2025; originally announced March 2025.

  2. arXiv:2402.11120  [pdf, other]

    cs.LG cs.CV stat.ML

    DART: A Principled Approach to Adversarially Robust Unsupervised Domain Adaptation

    Authors: Yunjuan Wang, Hussein Hazimeh, Natalia Ponomareva, Alexey Kurakin, Ibrahim Hammoud, Raman Arora

    Abstract: Distribution shifts and adversarial examples are two major challenges for deploying machine learning models. While these challenges have been studied individually, their combination is an important topic that remains relatively under-explored. In this work, we study the problem of adversarial robustness under a common setting of distribution shift - unsupervised domain adaptation (UDA). Specifical…

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2402.04177  [pdf, other]

    cs.CL cs.LG stat.ML

    Scaling Laws for Downstream Task Performance in Machine Translation

    Authors: Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

    Abstract: Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we…

    Submitted 20 February, 2025; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Published at the International Conference on Learning Representations (ICLR) 2025. Previous title: "Scaling Laws for Downstream Task Performance of Large Language Models"

  4. arXiv:2303.00654  [pdf, other]

    cs.LG cs.CR stat.ML

    How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

    Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta

    Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP t…

    Submitted 31 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 77 (2023) 1113-1201

  5. arXiv:2205.09717  [pdf, other]

    cs.LG stat.ML

    Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles

    Authors: Shibal Ibrahim, Hussein Hazimeh, Rahul Mazumder

    Abstract: Decision tree ensembles are widely used and competitive learning models. Despite their success, popular toolkits for learning tree ensembles have limited modeling capabilities. For instance, these toolkits support a limited number of loss functions and are restricted to single task learning. We propose a flexible framework for learning tree ensembles, which goes beyond existing toolkits to support…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Accepted at SIGKDD'2022

  6. arXiv:2202.04820  [pdf, ps, other]

    cs.LG cs.MS stat.CO stat.ML

    L0Learn: A Scalable Package for Sparse Learning using L0 Regularization

    Authors: Hussein Hazimeh, Rahul Mazumder, Tim Nonet

    Abstract: We present L0Learn: an open-source package for sparse linear regression and classification using $\ell_0$ regularization. L0Learn implements scalable, approximate algorithms, based on coordinate descent and local combinatorial optimization. The package is built using C++ and has user-friendly R and Python interfaces. L0Learn can address problems with millions of features, achieving competitive run…

    Submitted 9 June, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: Accepted to JMLR (MLOSS)

  7. arXiv:2106.03760  [pdf, other]

    cs.LG math.OC stat.ML

    DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

    Authors: Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi

    Abstract: The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness ca…

    Submitted 31 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Appeared in NeurIPS 2021

  8. arXiv:2104.07084  [pdf, other]

    stat.ME cs.LG math.OC stat.CO stat.ML

    Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives

    Authors: Hussein Hazimeh, Rahul Mazumder, Peter Radchenko

    Abstract: We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization. While there exist several appealing approaches based on convex relaxations and nonconvex heuristics, we focus on optimal solutions for the $\ell_0$-regularized formulation, a problem that is relatively unexplored due to computational challenges. Our methodology covers both hig…

    Submitted 17 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

  9. arXiv:2004.06152  [pdf, other]

    stat.CO cs.LG math.OC stat.ML

    Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization

    Authors: Hussein Hazimeh, Rahul Mazumder, Ali Saab

    Abstract: We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and squared $\ell_{2}$ penalty functions (a.k.a. $\ell_0 \ell_2$ regularization). Recent work shows that the resulting estimators are of key importance in many high-dimensional statistical settings. However, exact computation of these estimators remains a major challenge. Indeed, modern exact methods,…

    Submitted 14 April, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

  10. arXiv:2002.07772  [pdf, other]

    cs.LG cs.CV stat.ML

    The Tree Ensemble Layer: Differentiability meets Conditional Computation

    Authors: Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder

    Abstract: Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, they are typically slow in trainin…

    Submitted 10 July, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  11. arXiv:2001.06471  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

    Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

    Abstract: We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems at scales much larger than what was conventionally considered possible. Despite their usefulness, M…

    Submitted 6 June, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

    Comments: To appear in JMLR

  12. arXiv:1902.01542  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

    Authors: Hussein Hazimeh, Rahul Mazumder

    Abstract: In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can be often enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization based algorithms face difficulties in handling problems where th…

    Submitted 13 July, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: AISTATS 2020

  13. arXiv:1803.01454  [pdf, other]

    stat.CO math.OC stat.ML

    Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

    Authors: Hussein Hazimeh, Rahul Mazumder

    Abstract: The $L_0$-regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of…

    Submitted 24 January, 2020; v1 submitted 4 March, 2018; originally announced March 2018.

    Comments: To appear in Operations Research