Showing 1–50 of 76 results for author: Razaviyayn, M

  1. arXiv:2505.23735  [pdf, ps, other]

    cs.CL cs.AI

    ATLAS: Learning to Optimally Memorize the Context at Test Time

    Authors: Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni

    Abstract: Transformers have been established as the most popular backbones in sequence modeling, mainly due to their effectiveness in in-context retrieval tasks and the ability to learn at scale. Their quadratic memory and time complexity, however, bound their applicability to longer sequences and so have motivated researchers to explore effective alternative architectures such as modern recurrent neural net…

    Submitted 29 May, 2025; originally announced May 2025.

  2. arXiv:2504.13173  [pdf, other]

    cs.LG cs.AI

    It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

    Authors: Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni

    Abstract: Designing efficient and effective architectural backbones has been at the core of research efforts to enhance the capability of foundation models. Inspired by the human cognitive phenomenon of attentional bias (the natural tendency to prioritize certain events or stimuli), we reconceptualize neural architectures, including Transformers, Titans, and modern linear recurrent neural networks, as associati…

    Submitted 17 April, 2025; originally announced April 2025.
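    The abstract frames sequence models as associative memories. As a minimal, hedged illustration of that view (not the paper's method), the sketch below assumes a linear memory matrix M that is updated online by one gradient step on the squared retrieval error 0.5·‖Mk − v‖², i.e. the classical delta rule. The functions `write` and `read` are hypothetical names.

```python
import numpy as np

def write(M: np.ndarray, k: np.ndarray, v: np.ndarray, lr: float = 1.0) -> np.ndarray:
    """One online gradient step on 0.5 * ||M k - v||^2 (delta rule)."""
    error = M @ k - v              # retrieval error for this key
    grad = np.outer(error, k)      # gradient of 0.5 * ||M k - v||^2 w.r.t. M
    return M - lr * grad

def read(M: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Retrieve the value currently associated with key k."""
    return M @ k

# Toy usage: orthonormal keys (rows of an identity matrix) and random values.
rng = np.random.default_rng(0)
d_key, d_val, n = 4, 3, 4
keys = np.eye(d_key)[:n]
values = rng.standard_normal((n, d_val))

M = np.zeros((d_val, d_key))
for k, v in zip(keys, values):
    M = write(M, k, v)

retrieved = np.stack([read(M, k) for k in keys])
```

    With unit-norm orthonormal keys and lr = 1, each write stores its pair exactly without disturbing earlier pairs; with correlated keys, later writes partially overwrite earlier ones, which is where the choice of memory objective and retention rule starts to matter.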

  3. arXiv:2502.17607  [pdf, other]

    cs.LG cs.CL

    Synthetic Text Generation for Training Large Language Models via Gradient Matching

    Authors: Dang Nguyen, Zeman Li, Mohammadhossein Bateni, Vahab Mirrokni, Meisam Razaviyayn, Baharan Mirzasoleiman

    Abstract: Synthetic data has the potential to improve the performance, training efficiency, and privacy of real training examples. Nevertheless, existing approaches for synthetic text generation are mostly heuristics and cannot generate human-readable text without compromising the privacy of real data or provide performance guarantees for training Large Language Models (LLMs). In this work, we propose the f…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 15 pages, 5 figures, 4 tables

  4. arXiv:2502.06244  [pdf, other]

    cs.LG

    PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts

    Authors: Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni

    Abstract: Modern machine learning models are trained on diverse datasets and tasks to improve generalization. A key challenge in multitask learning is determining the optimal data mixing and sampling strategy across different data sources. Prior research in this multi-task learning setting has primarily focused on mitigating gradient conflicts between tasks. However, we observe that many real-world multitas…

    Submitted 10 February, 2025; originally announced February 2025.

  5. arXiv:2501.19287  [pdf, other]

    cs.LG

    Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs

    Authors: James Flemings, Haosheng Gan, Hongyi Li, Meisam Razaviyayn, Murali Annavaram

    Abstract: In-context learning (ICL) has shown promising improvement in downstream task adaptation of LLMs by augmenting prompts with relevant input-output examples (demonstrations). However, the ICL demonstrations can contain privacy-sensitive information, which can be leaked and/or regurgitated by the LLM output. Differential Privacy (DP), a widely adopted privacy safeguard, has emerged to mitigate this pr…

    Submitted 31 January, 2025; originally announced January 2025.

  6. arXiv:2412.18164  [pdf, ps, other]

    cs.LG math.OC

    Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence

    Authors: Yinbin Han, Meisam Razaviyayn, Renyuan Xu

    Abstract: Diffusion models have emerged as powerful tools for generative modeling, demonstrating exceptional capability in capturing target data distributions from large datasets. However, fine-tuning these massive models for specific downstream tasks, constraints, and human preferences remains a critical challenge. While recent advances have leveraged reinforcement learning algorithms to tackle this proble…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 28 pages

  7. arXiv:2411.07889  [pdf, other]

    cs.LG

    A Stochastic Optimization Framework for Private and Fair Learning From Decentralized Data

    Authors: Devansh Gupta, A. S. Poornash, Andrew Lowy, Meisam Razaviyayn

    Abstract: Machine learning models are often trained on sensitive data (e.g., medical records and race/gender) that is distributed across different "silos" (e.g., hospitals). These federated learning models may then be used to make consequential decisions, such as allocating healthcare resources. Two key challenges emerge in this setting: (i) maintaining the privacy of each person's data, even if other silos…

    Submitted 12 November, 2024; originally announced November 2024.

  8. arXiv:2410.06441  [pdf, other]

    cs.LG cs.CL

    Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models

    Authors: Zeman Li, Xinwei Zhang, Peilin Zhong, Yuan Deng, Meisam Razaviyayn, Vahab Mirrokni

    Abstract: Fine-tuning language models (LMs) with the Adam optimizer often demands excessive memory, limiting accessibility. The "in-place" version of Stochastic Gradient Descent (IP-SGD) and Memory-Efficient Zeroth-order Optimizer (MeZO) have been proposed to address this. However, IP-SGD still requires substantial memory, and MeZO suffers from slow convergence and degraded final performance due to its zero…

    Submitted 8 October, 2024; originally announced October 2024.
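    MeZO-style optimizers estimate gradients from forward passes (loss evaluations) only. A minimal sketch of the standard two-point, SPSA-type estimator, assuming a Gaussian perturbation direction z and a toy quadratic loss; this is a generic illustration, not Addax's combination of first- and zeroth-order gradients, and `zo_gradient` / `zo_sgd` are hypothetical names. It also hints at the slow convergence the abstract mentions: each step sees only one random direction.

```python
import numpy as np

def zo_gradient(loss, theta, eps, rng):
    """Two-point zeroth-order gradient estimate along one random direction z."""
    z = rng.standard_normal(theta.shape)
    # Central difference approximates the directional derivative z . grad(loss).
    projected = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    # For a quadratic the difference is exact, so E[projected * z] = grad(loss).
    return projected * z

def zo_sgd(loss, theta0, lr=0.01, eps=1e-3, steps=2000, seed=0):
    """Plain SGD driven only by zeroth-order gradient estimates."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * zo_gradient(loss, theta, eps, rng)
    return theta

# Toy problem: minimize 0.5 * ||theta - target||^2 using only loss evaluations.
target = np.array([1.0, -2.0, 0.5])
loss = lambda th: 0.5 * np.sum((th - target) ** 2)
theta = zo_sgd(loss, np.zeros(3))
```

    Memory stays at one copy of the parameters plus one perturbation, which is the appeal for fine-tuning; the price is a noisy, dimension-dependent gradient estimate.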

  9. arXiv:2410.03883  [pdf, other]

    cs.LG cs.CR stat.ML

    DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction

    Authors: Xinwei Zhang, Zhiqi Bu, Borja Balle, Mingyi Hong, Meisam Razaviyayn, Vahab Mirrokni

    Abstract: Differential privacy (DP) offers a robust framework for safeguarding individual data privacy. To utilize DP in training modern machine learning models, differentially private optimizers have been widely used in recent years. A popular approach to privatize an optimizer is to clip the individual gradients and add sufficiently large noise to the clipped gradient. This approach led to the development…

    Submitted 29 April, 2025; v1 submitted 4 October, 2024; originally announced October 2024.
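    The clip-then-noise recipe described in this abstract (the DP-SGD template that DiSK builds on) can be sketched as follows. This is a minimal sketch assuming per-example gradients are already available as rows of an array; `clip_norm` and `noise_multiplier` are hypothetical parameter names, the noise standard deviation follows the common Gaussian-mechanism convention noise_multiplier × clip_norm, and no privacy accounting is included. It does not implement the Kalman-filter noise reduction the paper proposes.

```python
import numpy as np

def clip_per_example(grads: np.ndarray, clip_norm: float) -> np.ndarray:
    """Rescale each row (one example's gradient) to L2 norm at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale

def private_gradient(grads, clip_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average."""
    clipped = clip_per_example(grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]

rng = np.random.default_rng(0)
per_example = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
clipped = clip_per_example(per_example, clip_norm=1.0)
g_priv = private_gradient(per_example, 1.0, noise_multiplier=1.1, rng=rng)
```

    Clipping bounds each example's influence on the sum, which is what lets a fixed noise scale give a privacy guarantee; the added noise is also what filtering-based methods such as DiSK aim to reduce.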

  10. arXiv:2410.02016  [pdf, other]

    cs.LG cs.CR

    Adaptively Private Next-Token Prediction of Large Language Models

    Authors: James Flemings, Meisam Razaviyayn, Murali Annavaram

    Abstract: As Large Language Models (LLMs) proliferate, developing privacy safeguards for these models is crucial. One popular safeguard involves training LLMs in a differentially private manner. However, such solutions have been shown to be computationally expensive and detrimental to the utility of these models. Since LLMs are deployed on the cloud and thus only accessible via an API, a Machine Learning as a Ser…

    Submitted 2 October, 2024; originally announced October 2024.

  11. arXiv:2408.13460  [pdf, other]

    cs.LG cs.CR stat.ML

    DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction

    Authors: Xinwei Zhang, Zhiqi Bu, Mingyi Hong, Meisam Razaviyayn

    Abstract: Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and DP noise injection. Howev…

    Submitted 24 August, 2024; originally announced August 2024.

  12. arXiv:2403.15638  [pdf, other]

    cs.CR cs.CL cs.LG

    Differentially Private Next-Token Prediction of Large Language Models

    Authors: James Flemings, Meisam Razaviyayn, Murali Annavaram

    Abstract: Ensuring the privacy of Large Language Models (LLMs) is becoming increasingly important. The most widely adopted technique to accomplish this is DP-SGD, which trains a model to guarantee Differential Privacy (DP). However, DP-SGD overestimates an adversary's capabilities in having white-box access to the model and, as a result, causes longer training times and larger memory usage than SGD. On the…

    Submitted 26 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  13. arXiv:2401.15604  [pdf, ps, other]

    cs.LG stat.ML

    Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization

    Authors: Yinbin Han, Meisam Razaviyayn, Renyuan Xu

    Abstract: Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness. A key component of these models is to learn the score function through score matching. Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy. As a first…

    Submitted 12 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: 39 pages

  14. arXiv:2312.03259  [pdf, other]

    cs.LG

    f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization

    Authors: Sina Baharlouei, Shivam Patel, Meisam Razaviyayn

    Abstract: Training and deploying machine learning models that meet fairness criteria for protected groups are fundamental in modern artificial intelligence. While numerous constraints and regularization terms have been proposed in the literature to promote fairness in machine learning tasks, most of these methods are not amenable to stochastic optimization due to the complex and nonlinear structure of const…

    Submitted 7 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 24 pages, 5 figures

    Journal ref: ICLR 2024

  15. arXiv:2312.02341  [pdf, other]

    math.OC cs.GT

    Incentive Systems for Fleets of New Mobility Services

    Authors: Ali Ghafelebashi, Meisam Razaviyayn, Maged Dessouky

    Abstract: Traffic congestion has become an inevitable challenge in large cities due to population increases and expansion of urban areas. Various approaches have been introduced to mitigate traffic issues, ranging from expanding the road infrastructure to employing demand management. Congestion pricing and incentive schemes have been extensively studied for traffic control in traditional networks where each driver…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 23 pages, 12 figures. arXiv admin note: text overlap with arXiv:2204.07306

  16. arXiv:2309.11682  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework

    Authors: Sina Baharlouei, Meisam Razaviyayn

    Abstract: While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 22 pages, 3 figures

  17. arXiv:2306.15056  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Optimal Differentially Private Model Training with Public Data

    Authors: Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn

    Abstract: Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set whi…

    Submitted 9 September, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: ICML 2024

  18. arXiv:2306.13753  [pdf, ps, other]

    cs.LG

    Four Axiomatic Characterizations of the Integrated Gradients Attribution Method

    Authors: Daniel Lundstrom, Meisam Razaviyayn

    Abstract: Deep neural networks have produced significant progress among machine learning models in terms of accuracy and functionality, but their inner workings are still largely unknown. Attribution methods seek to shine a light on these "black box" models by indicating how much each input contributed to a model's outputs. The Integrated Gradients (IG) method is a state-of-the-art baseline attribution meth…

    Submitted 23 June, 2023; originally announced June 2023.
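    Integrated Gradients attributes f(x) − f(x′) to input features by integrating the gradient along the straight path from a baseline x′ to x: IG_i(x) = (x_i − x′_i) ∫₀¹ ∂f/∂x_i(x′ + α(x − x′)) dα. A minimal sketch using a midpoint Riemann sum, assuming the gradient is available as a Python function (here an analytic toy model; in practice it would come from automatic differentiation); `integrated_gradients` and `steps` are hypothetical names.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps           # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)  # points on the straight path
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Toy model f(x) = sum(w * x**2) with analytic gradient 2 * w * x.
w = np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum(w * x ** 2)
grad_f = lambda x: 2 * w * x

x = np.array([1.0, 2.0, -3.0])
baseline = np.zeros(3)
attributions = integrated_gradients(grad_f, x, baseline)
```

    Here the attributions sum to f(x) − f(x′) exactly (the completeness axiom), because the gradient along the path is linear in α and the midpoint rule integrates it without error; for general models completeness holds only up to discretization error.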

  19. arXiv:2305.03100  [pdf, ps, other]

    cs.LG cs.GT

    Distributing Synergy Functions: Unifying Game-Theoretic Interaction Methods for Machine-Learning Explainability

    Authors: Daniel Lundstrom, Meisam Razaviyayn

    Abstract: Deep learning has revolutionized many areas of machine learning, from computer vision to natural language processing, but these high-performance models are generally "black box." Explaining such models would improve transparency and trust in AI-powered decision making and is necessary for understanding other practical needs such as robustness and fairness. A popular means of enhancing model transp…

    Submitted 17 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

  20. arXiv:2303.08431  [pdf, other]

    cs.LG math.OC stat.ML

    Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

    Authors: Yinbin Han, Meisam Razaviyayn, Renyuan Xu

    Abstract: Nonlinear control systems with partial information to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in the nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components, and is gove…

    Submitted 9 April, 2025; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 34 pages

  21. arXiv:2210.14410  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes

    Authors: Sina Baharlouei, Fatemeh Sheikholeslami, Meisam Razaviyayn, Zico Kolter

    Abstract: This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the "abstain" class. In this work, we show that such a provable framework can benefit by extension to networks with multiple e…

    Submitted 10 May, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: 20 pages, 6 figures

    Journal ref: International Conference on Artificial Intelligence and Statistics, PMLR 2023

  22. arXiv:2210.08781  [pdf, other]

    cs.LG cs.CR

    Stochastic Differentially Private and Fair Learning

    Authors: Andrew Lowy, Devansh Gupta, Meisam Razaviyayn

    Abstract: Machine learning models are increasingly used in high-stakes decision-making systems. In such applications, a major concern is that these models sometimes discriminate against certain demographic groups such as individuals with certain race, gender, or age. Another major concern in these applications is the violation of the privacy of users. While fair learning algorithms have been developed to mi…

    Submitted 3 June, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: ICLR 2023

  23. arXiv:2209.11920  [pdf, other]

    math.OC cs.LG eess.SY math.DS

    Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms

    Authors: Hesameddin Mohammadi, Meisam Razaviyayn, Mihailo R. Jovanović

    Abstract: We study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to an additive white noise. This setup uses noise to account for uncertainty in either gradient evaluation or iteration updates, and it includes Polyak's heavy-ball and Nesterov's accelerated methods as special cases. For strongly convex quadratic prob…

    Submitted 19 June, 2024; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: 23 pages; 7 figures
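    A common way to write the two-step family described here is x_{t+1} = x_t + β(x_t − x_{t−1}) − α∇f(x_t + γ(x_t − x_{t−1})) + σw_t, where γ = 0 gives Polyak's heavy-ball method and γ = β gives Nesterov's method. The sketch below simulates this recursion on a strongly convex quadratic, assuming isotropic Gaussian white noise w_t and textbook parameter choices (these are assumptions, not values from the paper); it estimates the steady-state mean-squared error empirically rather than computing the closed-form expressions the paper derives. `noisy_momentum` is a hypothetical name.

```python
import numpy as np

def noisy_momentum(Q, alpha, beta, gamma, sigma, steps, seed=0):
    """Simulate the noisy two-step method on f(x) = 0.5 * x^T Q x (minimizer 0).

    Returns the empirical mean of ||x_t||^2 over the second half of the run.
    """
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x_prev = np.ones(n)
    x = np.ones(n)
    sq_errors = []
    for t in range(steps):
        y = x + gamma * (x - x_prev)  # point where the gradient Q @ y is evaluated
        x_next = x + beta * (x - x_prev) - alpha * (Q @ y) + sigma * rng.standard_normal(n)
        x_prev, x = x, x_next
        if t >= steps // 2:           # discard the transient first half
            sq_errors.append(float(x @ x))
    return float(np.mean(sq_errors))

Q = np.diag([1.0, 10.0])              # m = 1, L = 10, condition number 10
m, L = 1.0, 10.0
kappa = L / m

# Gradient descent: beta = gamma = 0.
gd = noisy_momentum(Q, alpha=1 / L, beta=0.0, gamma=0.0, sigma=0.1, steps=20000)
# Heavy-ball with the classical tuning for quadratics (gamma = 0).
hb_beta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2
heavy_ball = noisy_momentum(Q, alpha=4 / (np.sqrt(L) + np.sqrt(m)) ** 2,
                            beta=hb_beta, gamma=0.0, sigma=0.1, steps=20000)
# Nesterov with the standard strongly convex tuning (gamma = beta).
nest_beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
nesterov = noisy_momentum(Q, alpha=1 / L, beta=nest_beta, gamma=nest_beta,
                          sigma=0.1, steps=20000)
```

    With these assumed settings, the Nesterov run settles at a noticeably larger mean-squared error than gradient descent with the same step size α = 1/L, a small-scale instance of the convergence-rate versus noise-amplification tradeoff the paper studies.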

  24. arXiv:2209.07403  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter

    Authors: Andrew Lowy, Meisam Razaviyayn

    Abstract: We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data may be extremely large or infinite. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous (i.e., stochastic gradients are uniformly bounded) over data. While this assumption is convenient, it often leads to pessimistic…

    Submitted 27 September, 2024; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: To appear in Journal of Privacy and Confidentiality. A preliminary version appeared at International Conference on Algorithmic Learning Theory (ALT) 2023

  25. arXiv:2204.07306  [pdf, other]

    math.OC

    Congestion Reduction via Personalized Incentives

    Authors: Ali Ghafelebashi, Meisam Razaviyayn, Maged Dessouky

    Abstract: With rapid population growth and urban development, traffic congestion has become an inescapable issue, especially in large cities. Many congestion reduction strategies have been proposed in the past, ranging from roadway extension to transportation demand management. In particular, congestion pricing schemes have been used as negative reinforcements for traffic control. In this project, we study…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: 24 pages, 8 figures

  26. arXiv:2203.06735  [pdf, other]

    cs.LG cs.CR math.OC

    Private Non-Convex Federated Learning Without a Trusted Server

    Authors: Andrew Lowy, Ali Ghafelebashi, Meisam Razaviyayn

    Abstract: We study federated learning (FL) -- especially cross-silo FL -- with non-convex loss functions and data from people who do not trust the server or other silos. In this setting, each silo (e.g., hospital) must protect the privacy of each person's data (e.g., a patient's medical record), even if the server or other silos act as adversarial eavesdroppers. To that end, we consider inter-silo record-level…

    Submitted 25 June, 2023; v1 submitted 13 March, 2022; originally announced March 2022.

    Comments: AISTATS 2023

  27. arXiv:2202.11912  [pdf, other]

    cs.LG

    A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

    Authors: Daniel Lundstrom, Tianjian Huang, Meisam Razaviyayn

    Abstract: As deep learning (DL) efficacy grows, concerns about poor model explainability also grow. Attribution methods address the issue of explainability by quantifying the importance of an input feature for a model prediction. Among various methods, Integrated Gradients (IG) sets itself apart by claiming that other methods fail to satisfy desirable axioms, while IG and methods like it uniquely satisfy said ax…

    Submitted 29 June, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:1703.01365 by other authors

  28. arXiv:2110.11205  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Robustness through Data Augmentation Loss Consistency

    Authors: Tianjian Huang, Shaunak Halbe, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami

    Abstract: While deep learning through empirical risk minimization (ERM) has succeeded at achieving human-level performance at a variety of complex tasks, ERM is not robust to distribution shifts or adversarial attacks. Synthetic data augmentation followed by empirical risk minimization (DA-ERM) is a simple and widely used solution to improve robustness in ERM. In addition, consistency regularization can be…

    Submitted 24 January, 2023; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: 40 pages

  29. arXiv:2110.03950  [pdf, ps, other]

    math.OC cs.GT cs.LG

    Nonconvex-Nonconcave Min-Max Optimization with a Small Maximization Domain

    Authors: Dmitrii M. Ostrovskii, Babak Barazandeh, Meisam Razaviyayn

    Abstract: We study the problem of finding approximate first-order stationary points in optimization problems of the form $\min_{x \in X} \max_{y \in Y} f(x,y)$, where the sets $X,Y$ are convex and $Y$ is compact. The objective function $f$ is smooth, but assumed neither convex in $x$ nor concave in $y$. Our approach relies upon replacing the function $f(x,\cdot)$ with its $k$th-order Taylor approximation (i…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: 50 pages

  30. arXiv:2109.00644  [pdf, other]

    cs.LG cs.MS

    RIFLE: Imputation and Robust Inference from Low Order Marginals

    Authors: Sina Baharlouei, Kelechi Ogudu, Sze-chuan Suen, Meisam Razaviyayn

    Abstract: The ubiquity of missing values in real-world datasets poses a challenge for statistical inference and can prevent similar datasets from being analyzed in the same study, precluding many existing datasets from being used for new analyses. While an extensive collection of packages and algorithms has been developed for data imputation, the overwhelming majority perform poorly if there are many missi…

    Submitted 12 September, 2023; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: 36 pages, 11 figures

    Journal ref: Transactions on Machine Learning Research (TMLR), 09/2023

  31. arXiv:2106.09779  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses

    Authors: Andrew Lowy, Meisam Razaviyayn

    Abstract: This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g., hospital) has data from different people (e.g., patients) and must maintain the privacy of each person's data (e.g., medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the…

    Submitted 24 November, 2024; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: ICLR 2023

  32. arXiv:2105.05953  [pdf, other]

    stat.ML cs.LG math.OC stat.ME

    Efficient Algorithms for Estimating the Parameters of Mixed Linear Regression Models

    Authors: Babak Barazandeh, Ali Ghafelebashi, Meisam Razaviyayn, Ram Sriharsha

    Abstract: The mixed linear regression (MLR) model is among the most exemplary statistical tools for modeling non-linear distributions using a mixture of linear models. When the additive noise in the MLR model is Gaussian, the Expectation-Maximization (EM) algorithm is a widely used algorithm for maximum likelihood estimation of MLR parameters. However, when noise is non-Gaussian, the steps of the EM algorithm may not have…

    Submitted 12 May, 2021; originally announced May 2021.
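    For the Gaussian-noise case this abstract mentions, the standard EM recursion alternates soft assignments (E-step) with weighted least squares (M-step). A minimal sketch for K components, assuming the noise standard deviation σ is known and shared across components; `em_mlr`, `log_gauss`, and the synthetic data are hypothetical, and this is the baseline EM the abstract refers to, not the paper's proposed algorithm for non-Gaussian noise.

```python
import numpy as np

def log_gauss(resid, sigma):
    """Log-density of N(0, sigma^2) evaluated at resid."""
    return -0.5 * (resid / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def em_mlr(X, y, K=2, sigma=0.1, iters=50, seed=0):
    """EM for a K-component mixed linear regression with known noise level sigma."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    betas = rng.standard_normal((K, d))
    weights = np.full(K, 1.0 / K)
    log_liks = []
    for _ in range(iters):
        # E-step: log(pi_k * N(y_i; x_i^T beta_k, sigma^2)), shape (n, K).
        log_p = np.log(weights) + log_gauss(y[:, None] - X @ betas.T, sigma)
        max_lp = log_p.max(axis=1, keepdims=True)
        log_norm = max_lp + np.log(np.exp(log_p - max_lp).sum(axis=1, keepdims=True))
        log_liks.append(float(log_norm.sum()))  # log-likelihood of current parameters
        resp = np.exp(log_p - log_norm)         # responsibilities; rows sum to 1
        # M-step: weighted least squares per component, then mixing weights.
        for k in range(K):
            sw = np.sqrt(resp[:, k])
            betas[k] = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        weights = resp.mean(axis=0)
    return betas, weights, log_liks

# Synthetic two-component data (hypothetical).
rng = np.random.default_rng(1)
n, d = 400, 2
X = rng.standard_normal((n, d))
true_betas = np.array([[2.0, -1.0], [-1.5, 0.5]])
z = rng.integers(0, 2, size=n)
y = np.einsum("ij,ij->i", X, true_betas[z]) + 0.1 * rng.standard_normal(n)
betas, weights, log_liks = em_mlr(X, y)
```

    The recorded log-likelihoods should be non-decreasing, a basic sanity check for any EM implementation; EM can still stop at a poor local optimum depending on the random initialization.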

  33. arXiv:2102.12586  [pdf, other]

    cs.LG cs.IT

    A Stochastic Optimization Framework for Fair Risk Minimization

    Authors: Andrew Lowy, Sina Baharlouei, Rakesh Pavan, Meisam Razaviyayn, Ahmad Beirami

    Abstract: Despite the success of large-scale empirical risk minimization (ERM) at achieving high accuracy across a variety of machine learning tasks, fair ERM is hindered by the incompatibility of fairness constraints with stochastic optimization. We consider the problem of fair classification with discrete sensitive attributes and potentially large models and data sets, requiring stochastic solvers. Existi…

    Submitted 11 January, 2023; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: 44 pages

    Journal ref: Transactions on Machine Learning Research, 2022

  34. arXiv:2102.04704  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Output Perturbation for Differentially Private Convex Optimization: Faster and More General

    Authors: Andrew Lowy, Meisam Razaviyayn

    Abstract: Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning. To date, most work has focused on private empirical risk minimization (ERM) or private stochastic convex optimization (SCO), which corresponds to population loss minimization. However, there are often other objectives, such as fairne…

    Submitted 19 September, 2024; v1 submitted 9 February, 2021; originally announced February 2021.

  35. arXiv:2012.02901  [pdf, other]

    math.ST stat.AP stat.ME stat.ML

    Near-Optimal Procedures for Model Discrimination with Non-Disclosure Properties

    Authors: Dmitrii M. Ostrovskii, Mohamed Ndaoud, Adel Javanmard, Meisam Razaviyayn

    Abstract: Let $\theta_0,\theta_1 \in \mathbb{R}^d$ be the population risk minimizers associated with some loss $\ell:\mathbb{R}^d\times \mathcal{Z}\to\mathbb{R}$ and two distributions $\mathbb{P}_0,\mathbb{P}_1$ on $\mathcal{Z}$. The models $\theta_0,\theta_1$ are unknown, and $\mathbb{P}_0,\mathbb{P}_1$ can be accessed by drawing i.i.d. samples from them. Our work is motivated by the following model discrimination question: "Wha…

    Submitted 10 July, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: 52 pages, 2 figures; corrected the proof of the lower bound; added new applications and the Fisher information-based argument in Appendix F

  36. arXiv:2009.03482  [pdf, ps, other]

    math.OC cs.LG

    Alternating Direction Method of Multipliers for Quantization

    Authors: Tianjian Huang, Prajwal Singhania, Maziar Sanjabi, Pabitra Mitra, Meisam Razaviyayn

    Abstract: Quantization of the parameters of machine learning models, such as deep neural networks, requires solving constrained optimization problems, where the constraint set is formed by the Cartesian product of many simple discrete sets. For such optimization problems, we study the performance of the Alternating Direction Method of Multipliers for Quantization ($\texttt{ADMM-Q}$) algorithm, which is a va…

    Submitted 1 March, 2021; v1 submitted 7 September, 2020; originally announced September 2020.
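    ADMM-style quantization handles the discrete constraint by splitting the weights into a continuous copy w and a quantized copy z tied by w = z. A minimal sketch of that splitting under stated assumptions: a least-squares loss f(w) = 0.5·‖Aw − b‖², a hypothetical ternary grid {−1, 0, 1}, a fixed penalty ρ, and the textbook scaled-form ADMM updates; the paper's exact ADMM-Q variant, parameter rules, and convergence conditions may differ. Because the constraint set is nonconvex, the z-update is a Euclidean projection onto a discrete set, which here reduces to rounding each entry to the nearest level. `admm_quantize` and `project_to_levels` are hypothetical names.

```python
import numpy as np

def project_to_levels(v, levels):
    """Round each entry of v to the nearest value in the discrete set `levels`."""
    idx = np.abs(v[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

def admm_quantize(A, b, levels, rho=1.0, iters=100):
    """Scaled-form ADMM for min 0.5*||A w - b||^2 s.t. every entry of w is in `levels`."""
    n = A.shape[1]
    z = np.zeros(n)                 # quantized copy
    u = np.zeros(n)                 # scaled dual variable
    lhs = A.T @ A + rho * np.eye(n)
    for _ in range(iters):
        w = np.linalg.solve(lhs, A.T @ b + rho * (z - u))  # w-update: regularized least squares
        z = project_to_levels(w + u, levels)                # z-update: projection onto the grid
        u = u + w - z                                       # dual update on the residual w - z
    return z, w

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = A @ np.array([1.0, -1.0, 0.0, 1.0, -1.0]) + 0.05 * rng.standard_normal(30)
levels = np.array([-1.0, 0.0, 1.0])  # hypothetical ternary quantization grid
z, w = admm_quantize(A, b, levels)
```

    The returned z always lies on the grid by construction; how close it is to the best quantized solution depends on ρ and on the problem instance.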

  37. arXiv:2006.08141  [pdf, other]

    math.OC cs.LG stat.ML

    Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances

    Authors: Meisam Razaviyayn, Tianjian Huang, Songtao Lu, Maher Nouiehed, Maziar Sanjabi, Mingyi Hong

    Abstract: The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst-case function in the given class. Min-max optimization problems have recently become very pop…

    Submitted 18 August, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Journal ref: IEEE Signal Processing Magazine (Volume: 37, Issue: 5, Sept. 2020)

  38. arXiv:2003.08093  [pdf, other]

    math.OC cs.GT cs.LG stat.ML

    Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method

    Authors: Babak Barazandeh, Meisam Razaviyayn

    Abstract: Min-max saddle point games appear in a wide range of applications in machine learning and signal processing. Despite their wide applicability, theoretical studies are mostly limited to the special convex-concave structure. While some recent works generalized these results to special smooth non-convex cases, our understanding of non-smooth scenarios is still limited. In this work, we study special f…

    Submitted 18 March, 2020; originally announced March 2020.

  39. arXiv:2002.07919  [pdf, other]

    math.OC

    Efficient Search of First-Order Nash Equilibria in Nonconvex-Concave Smooth Min-Max Problems

    Authors: Dmitrii M. Ostrovskii, Andrew Lowy, Meisam Razaviyayn

    Abstract: We propose an efficient algorithm for finding first-order Nash equilibria in min-max problems of the form $\min_{x \in X}\max_{y\in Y} F(x,y)$, where the objective function is smooth in both variables and concave with respect to $y$; the sets $X$ and $Y$ are convex and "projection-friendly," and $Y$ is compact. Our goal is to find an $(\varepsilon_x,\varepsilon_y)$-first-order Nash equilibrium wit…

    Submitted 2 May, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: 29 pages; accepted to SIAM Journal on Optimization (as of May 2021)

    MSC Class: 90C06; 90C25; 90C26; 91A99

  40. arXiv:2001.07819  [pdf, other]

    stat.ML cs.DS cs.LG math.OC

    Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

    Authors: Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, Meisam Razaviyayn

    Abstract: In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first consider a deterministic version of the problem. We design and analyze the Zeroth-Order Gra…

    Submitted 4 April, 2022; v1 submitted 21 January, 2020; originally announced January 2020.

    Comments: To appear in the Journal of Global Optimization

  41. arXiv:1911.09815  [pdf, ps, other]

    cs.LG

    When Does Non-Orthogonal Tensor Decomposition Have No Spurious Local Minima?

    Authors: Maziar Sanjabi, Sina Baharlouei, Meisam Razaviyayn, Jason D. Lee

    Abstract: We study the optimization problem for decomposing $d$-dimensional fourth-order tensors with $k$ non-orthogonal components. We derive deterministic conditions under which such a problem does not have spurious local minima. In particular, we show that if $\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} < \frac{5}{4}$, and the incoherence coefficient is of the order $O(\frac{1}{\sqrt{d}})$, then all the local minima…

    Submitted 21 November, 2019; originally announced November 2019.

  42. arXiv:1907.04450  [pdf, ps, other]

    math.OC cs.CC stat.ML

    SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

    Authors: Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

    Abstract: This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, certain strict complementa…

    Submitted 9 July, 2019; originally announced July 2019.

  43. arXiv:1906.12005  [pdf, other]

    cs.LG stat.ML

    Rényi Fair Inference

    Authors: Sina Baharlouei, Maher Nouiehed, Ahmad Beirami, Meisam Razaviyayn

    Abstract: Machine learning algorithms have been increasingly deployed in critical automated decision-making systems that directly affect human lives. When these algorithms are only trained to minimize the training/test error, they could suffer from systematic discrimination against individuals based on their sensitive attributes such as gender or race. Recently, there has been a surge in machine learning so…

    Submitted 13 January, 2020; v1 submitted 27 June, 2019; originally announced June 2019.

    Comments: 11 pages, 1 figure

    Journal ref: International Conference on Learning Representations, 2020
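    The abstract is truncated before it defines its fairness measure. Assuming the "Rényi correlation" in the title refers to the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation, the sketch below computes it for two discrete variables using the standard characterization as the second-largest singular value of the normalized joint-probability matrix $B_{ij} = P(i,j)/\sqrt{P_X(i)\,P_Y(j)}$. The function name `hgr_correlation`, its power-iteration settings, and the example distributions are hypothetical; this is not presented as the paper's estimator or training procedure.

```python
import math


def hgr_correlation(joint, iters=500):
    """HGR maximal correlation of discrete X, Y given joint[i][j] = P(X=i, Y=j).

    Uses sigma_2(B), where B[i][j] = P(i, j) / sqrt(P_X(i) * P_Y(j)).
    Assumes every marginal probability is strictly positive.
    """
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(joint[0]))]
    b = [[joint[i][j] / math.sqrt(px[i] * py[j]) for j in range(len(py))]
         for i in range(len(px))]
    n = len(py)
    # B^T B has eigenvector sqrt(P_Y) with top eigenvalue 1; deflate it so the
    # largest remaining eigenvalue is sigma_2(B) ** 2.
    v1 = [math.sqrt(p) for p in py]
    m = [[sum(b[k][i] * b[k][j] for k in range(len(px))) - v1[i] * v1[j]
          for j in range(n)] for i in range(n)]
    # Power iteration; the fixed start vector is assumed not orthogonal to the
    # leading eigenvector of m.
    v = [1.0 / (j + 1) for j in range(n)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0  # deflated matrix is zero: X and Y are independent
        v = [t / norm for t in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient
    return math.sqrt(max(lam, 0.0))  # clamp tiny negative round-off


dependent = hgr_correlation([[0.4, 0.1], [0.1, 0.4]])
independent = hgr_correlation([[0.25, 0.25], [0.25, 0.25]])
```

    For two binary variables the HGR correlation coincides with the absolute Pearson correlation, so `dependent` should come out as 0.6 and `independent` as (numerically) zero.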

  44. arXiv:1905.11011  [pdf, other]

    math.OC cs.AI cs.LG eess.SY

    Robustness of accelerated first-order algorithms for strongly convex optimization problems

    Authors: Hesameddin Mohammadi, Meisam Razaviyayn, Mihailo R. Jovanović

    Abstract: We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-squared error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradien…

    Submitted 20 February, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: 45 pages, 6 figures
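    For a quadratic objective, the mean-squared error described above can be computed exactly, because each iteration becomes a linear system driven by white noise. The sketch below makes its own assumptions: a diagonal quadratic $f(x) = \frac{1}{2}\sum_i \lambda_i x_i^2$ with $m \le \lambda_i \le L$, gradient noise of variance `sigma2` per coordinate, and textbook tunings (step $1/L$ for gradient descent; Polyak's heavy-ball parameters; Nesterov with step $1/L$ and momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$). These tunings and all function names are assumptions, not necessarily what the paper analyzes. Per eigenvalue, the two-step recursion is written as $z_{k+1} = A z_k + b\,w_k$, and the steady-state covariance solves the discrete Lyapunov equation $P = A P A^\top + \sigma^2 b b^\top$, solved here by fixed-point iteration.

```python
import math


def steady_state_variance(a, b, sigma2, tol=1e-15, max_iter=200000):
    """Return P[0][0], where P = A P A^T + sigma2 * b b^T (2x2, A assumed stable)."""
    q = [[sigma2 * b[i] * b[j] for j in range(2)] for i in range(2)]
    p = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(max_iter):
        ap = [[sum(a[i][k] * p[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
        new = [[sum(ap[i][k] * a[j][k] for k in range(2)) + q[i][j] for j in range(2)]
               for i in range(2)]
        change = max(abs(new[i][j] - p[i][j]) for i in range(2) for j in range(2))
        p = new
        if change < tol:
            break
    return p[0][0]


def iterate_variance(method, lam, L, m, sigma2=1.0):
    """Steady-state variance of the coordinate with curvature lam under noisy gradients."""
    kappa = L / m
    if method == "gd":
        alpha = 1.0 / L
        a = [[1.0 - alpha * lam, 0.0], [1.0, 0.0]]
    elif method == "heavy_ball":
        alpha = 4.0 / (math.sqrt(L) + math.sqrt(m)) ** 2
        beta = ((math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)) ** 2
        a = [[1.0 + beta - alpha * lam, -beta], [1.0, 0.0]]
    elif method == "nesterov":
        alpha = 1.0 / L
        beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
        c = 1.0 - alpha * lam
        a = [[(1.0 + beta) * c, -beta * c], [1.0, 0.0]]
    else:
        raise ValueError(f"unknown method: {method}")
    # The noise w_k enters through the gradient, so it is scaled by -alpha.
    return steady_state_variance(a, [-alpha, 0.0], sigma2)


eigenvalues = [1.0, 10.0, 100.0]  # hypothetical spectrum with m = 1, L = 100
for name in ("gd", "heavy_ball", "nesterov"):
    mse = sum(iterate_variance(name, lam, L=100.0, m=1.0) for lam in eigenvalues)
    print(name, mse)
```

    Summing the per-eigenvalue variances gives the total steady-state mean-squared error; sweeping `L` is one way to explore how noise amplification depends on the condition number. For gradient descent the result can be checked against the closed form $\alpha^2\sigma^2 / (1 - (1-\alpha\lambda)^2)$.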

  45. arXiv:1904.09775  [pdf, other]

    cs.LG cs.AI stat.ML

    Training generative networks using random discriminators

    Authors: Babak Barazandeh, Meisam Razaviyayn, Maziar Sanjabi

    Abstract: In recent years, Generative Adversarial Networks (GANs) have drawn a lot of attention for learning the underlying distribution of data in various applications. Despite their wide applicability, training GANs is notoriously difficult. This difficulty is due to the min-max nature of the resulting optimization problem and the lack of proper tools for solving general (non-convex, non-concave) min-max…

    Submitted 22 April, 2019; originally announced April 2019.

  46. arXiv:1904.06784  [pdf, ps, other]

    math.OC

    A Trust Region Method for Finding Second-Order Stationarity in Linearly Constrained Non-Convex Optimization

    Authors: Maher Nouiehed, Meisam Razaviyayn

    Abstract: Motivated by the TRACE algorithm [Curtis et al. 2017], we propose a trust region algorithm for finding second-order stationary points of a linearly constrained non-convex optimization problem. We show the convergence of the proposed algorithm to $(\varepsilon_g, \varepsilon_H)$-second-order stationary points in $\widetilde{\mathcal{O}}(\max\{\varepsilon_g^{-3/2}, \varepsilon_H^{-3}\})$ iterations. This iteration complexity is achieved for general…

    Submitted 14 April, 2019; originally announced April 2019.

  47. arXiv:1902.08297  [pdf, other]

    math.OC cs.LG stat.ML

    Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods

    Authors: Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D. Lee, Meisam Razaviyayn

    Abstract: Recent applications that arise in machine learning have sparked significant interest in solving min-max saddle point games. This problem has been extensively studied in the convex-concave regime for which a global equilibrium solution can be computed efficiently. In this paper, we study the problem in the non-convex regime and show that an $\varepsilon$-first-order stationary point of the game can b…

    Submitted 30 October, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

  48. arXiv:1812.08918  [pdf, other]

    cs.AR

    Computational RAM to Accelerate String Matching at Scale

    Authors: Zamshed I. Chowdhury, S. Karen Khatamifard, Zhengyang Zhao, Masoud Zabihi, Salonik Resch, Meisam Razaviyayn, Jian-Ping Wang, Sachin Sapatnekar, Ulya R. Karpuzcu

    Abstract: Traditional Von Neumann computing is falling apart in the era of exploding data volumes as the overhead of data transfer becomes forbidding. Instead, it is more energy-efficient to fuse compute capability with memory where the data reside. This is particularly critical for pattern matching, a key computational step in large-scale data analytics, which involves repetitive search over very large dat…

    Submitted 20 December, 2018; originally announced December 2018.

  49. arXiv:1812.02878  [pdf, ps, other]

    math.OC cs.GT cs.LG

    Solving Non-Convex Non-Concave Min-Max Games Under Polyak-Łojasiewicz Condition

    Authors: Maziar Sanjabi, Meisam Razaviyayn, Jason D. Lee

    Abstract: In this short note, we consider the problem of solving a min-max zero-sum game. This problem has been extensively studied in the convex-concave regime where the global solution can be computed efficiently. Recently, there have also been developments for finding the first-order stationary points of the game when one player's objective is concave or (weakly) concave. This work focuses on the…

    Submitted 6 December, 2018; originally announced December 2018.
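    The note's algorithm is cut off in the abstract, so the following is only a generic illustration under stated assumptions: a multi-step gradient descent ascent loop (several ascent steps on the maximizing variable, then one descent step on the minimizing variable) applied to a toy game whose inner maximization is not concave but satisfies the Polyak-Łojasiewicz (PL) condition. The inner term uses $h(z) = z^2 + 3\sin^2 z$, a standard example of a nonconvex function that satisfies PL. The objective `toy_pl`, its gradients, the step sizes, and the loop counts are hypothetical.

```python
import math


def toy_pl(x, y):
    """Hypothetical game: not concave in y, but -f(x, .) is PL in y; cos(x) is nonconvex in x."""
    z = y - x
    return -(z * z + 3.0 * math.sin(z) ** 2) + math.cos(x)


def grad_x(x, y):
    z = y - x
    return 2.0 * z + 3.0 * math.sin(2.0 * z) - math.sin(x)


def grad_y(x, y):
    z = y - x
    return -2.0 * z - 3.0 * math.sin(2.0 * z)


def multistep_gda(x, y, eta_x=0.1, eta_y=1.0 / 8.0, inner_steps=20, outer_steps=300):
    for _ in range(outer_steps):
        # Approximately solve max_y f(x, y) by gradient ascent. The step 1/8 is 1/L for
        # the inner problem, since |d^2 f / dy^2| = |2 + 6 cos(2z)| <= 8.
        for _ in range(inner_steps):
            y += eta_y * grad_y(x, y)
        # One gradient descent step on x at the approximate inner maximizer.
        x -= eta_x * grad_x(x, y)
    return x, y


x_star, y_star = multistep_gda(2.0, 0.0)
# max_y f(x, y) = cos(x), attained at y = x, so from x = 2 the outer
# minimization of cos(x) should settle near x = pi.
```

    Under PL, plain gradient ascent with step $1/L$ still converges linearly to the global maximizer in `y`, which is what lets the outer loop treat `y` as an approximate best response even though the inner problem is not concave.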

  50. arXiv:1810.05251  [pdf, other]

    math.OC

    A Linearly Convergent Doubly Stochastic Gauss-Seidel Algorithm for Solving Linear Equations and A Certain Class of Over-Parameterized Optimization Problems

    Authors: Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, Zhi-Quan Luo

    Abstract: Consider the classical problem of solving a general linear system of equations $Ax=b$. It is well known that the (successively over-relaxed) Gauss-Seidel scheme and many of its variants may not converge when $A$ is neither diagonally dominant nor symmetric positive definite. Can we have a linearly convergent G-S type algorithm that works for any $A$? In this paper we answer this question aff…

    Submitted 13 May, 2019; v1 submitted 11 October, 2018; originally announced October 2018.
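    For reference, the classical Gauss-Seidel sweep that the abstract starts from is sketched below; this is not the paper's doubly stochastic algorithm. It converges on a strictly diagonally dominant system but can diverge on a matrix that is neither diagonally dominant nor symmetric positive definite. A standard workaround, shown only for contrast and again not the authors' approach, is to run Gauss-Seidel on the normal equations $A^\top A x = A^\top b$, whose matrix is symmetric positive definite whenever $A$ is nonsingular (at the cost of squaring the condition number). The example matrices and helper names are hypothetical.

```python
import math


def gauss_seidel(a, b, sweeps=100):
    """Classical Gauss-Seidel: update x[i] in place using the newest values of x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x


def residual_norm(a, x, b):
    return math.sqrt(sum((sum(a[i][j] * x[j] for j in range(len(x))) - b[i]) ** 2
                         for i in range(len(b))))


def normal_equations(a, b):
    """Form A^T A and A^T b."""
    rows, cols = len(a), len(a[0])
    ata = [[sum(a[k][i] * a[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    atb = [sum(a[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return ata, atb


# Strictly diagonally dominant: Gauss-Seidel converges.
a_dd = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
b_dd = [1.0, 2.0, 3.0]
x_dd = gauss_seidel(a_dd, b_dd)

# Neither diagonally dominant nor symmetric: the error grows by a factor of 6 per sweep.
a_bad = [[1.0, 3.0], [2.0, 1.0]]
b_bad = [4.0, 3.0]  # exact solution is x = [1, 1]
x_bad = gauss_seidel(a_bad, b_bad, sweeps=20)

# Gauss-Seidel on the normal equations recovers the same solution.
ata, atb = normal_equations(a_bad, b_bad)
x_ne = gauss_seidel(ata, atb)
```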