Search | arXiv e-print repository

Fair Supervised Learning Through Constraints on Smooth Nonconvex Unfairness-Measure Surrogates

Authors: Zahra Khatti, Daniel P. Robinson, Frank E. Curtis

Abstract: A new strategy for fair supervised machine learning is proposed. The main advantages of the proposed strategy as compared to others in the literature are as follows. (a) We introduce a new smooth nonconvex surrogate to approximate the Heaviside functions involved in discontinuous unfairness measures. The surrogate is based on smoothing methods from the optimization literature, and is new for the f… ▽ More A new strategy for fair supervised machine learning is proposed. The main advantages of the proposed strategy as compared to others in the literature are as follows. (a) We introduce a new smooth nonconvex surrogate to approximate the Heaviside functions involved in discontinuous unfairness measures. The surrogate is based on smoothing methods from the optimization literature, and is new for the fair supervised learning literature. The surrogate is a tight approximation which ensures the trained prediction models are fair, as opposed to other (e.g., convex) surrogates that can fail to lead to a fair prediction model in practice. (b) Rather than rely on regularizers (that lead to optimization problems that are difficult to solve) and corresponding regularization parameters (that can be expensive to tune), we propose a strategy that employs hard constraints so that specific tolerances for unfairness can be enforced without the complications associated with the use of regularization. (c)~Our proposed strategy readily allows for constraints on multiple (potentially conflicting) unfairness measures at the same time. Multiple measures can be considered with a regularization approach, but at the cost of having even more difficult optimization problems to solve and further expense for tuning. By contrast, through hard constraints, our strategy leads to optimization models that can be solved tractably with minimal tuning. △ Less

Submitted 21 May, 2025; originally announced May 2025.

Report number: Lehigh ISE Technical Report 25T-007

arXiv:2409.09532 [pdf, other]

Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning

Authors: Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson

Abstract: In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed enviro… ▽ More In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed environments, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute algorithmic quantities (e.g., aggregation weights), which leads to a potential leakage of client information. To address these challenges, we propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios without the need to pass information between clients and server iteratively. In the first stage, for each client, we use its local dataset to obtain a synthetic dataset by solving a bilevel optimization problem that aims to ensure that the ultimate global model yields fair predictions. In the second stage, we apply a method with differential privacy guarantees to the synthetic dataset from the first stage to obtain a second synthetic data. We then pass each client's second-stage synthetic dataset to the server, the collection of which is used to train the server model using conventional machine learning techniques (that no longer need to take fairness metrics or privacy into account). Thus, we eliminate the need to handle fairness-specific aggregation weights while preserving client privacy. Our approach requires only a single communication between the clients and the server (thus making it communication cost-effective), maintains data privacy, and promotes fairness. We present empirical evidence to demonstrate the advantages of our approach. △ Less

Submitted 23 January, 2025; v1 submitted 14 September, 2024; originally announced September 2024.

ACM Class: G.1.6; I.2.0; C.2.4; K.4.1

arXiv:2408.16186 [pdf, other]

Single-Loop Deterministic and Stochastic Interior-Point Algorithms for Nonlinearly Constrained Optimization

Authors: Frank E. Curtis, Xin Jiang, Qi Wang

Abstract: An interior-point algorithm framework is proposed, analyzed, and tested for solving nonlinearly constrained continuous optimization problems. The main setting of interest is when the objective and constraint functions may be nonlinear and/or nonconvex, and when constraint values and derivatives are tractable to compute, but objective function values and derivatives can only be estimated. The algor… ▽ More An interior-point algorithm framework is proposed, analyzed, and tested for solving nonlinearly constrained continuous optimization problems. The main setting of interest is when the objective and constraint functions may be nonlinear and/or nonconvex, and when constraint values and derivatives are tractable to compute, but objective function values and derivatives can only be estimated. The algorithm is intended primarily for a setting that is similar for stochastic-gradient methods for unconstrained optimization, namely, the setting when stochastic-gradient estimates are available and employed in place of gradients of the objective, and when no objective function values (nor estimates of them) are employed. This is achieved by the interior-point framework having a single-loop structure rather than the nested-loop structure that is typical of contemporary interior-point methods. For completeness, convergence guarantees for the framework are provided both for deterministic and stochastic settings. Numerical experiments show that the algorithm yields good performance on a large set of test problems. △ Less

Submitted 28 August, 2024; originally announced August 2024.

Report number: Lehigh ISE Technical Report 24T-008

arXiv:2308.03687 [pdf, other]

Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization

Authors: Frank E. Curtis, Xin Jiang, Qi Wang

Abstract: Stochastic sequential quadratic optimization (SQP) methods for solving continuous optimization problems with nonlinear equality constraints have attracted attention recently, such as for solving large-scale data-fitting problems subject to nonconvex constraints. However, for a recently proposed subclass of such methods that is built on the popular stochastic-gradient methodology from the unconstra… ▽ More Stochastic sequential quadratic optimization (SQP) methods for solving continuous optimization problems with nonlinear equality constraints have attracted attention recently, such as for solving large-scale data-fitting problems subject to nonconvex constraints. However, for a recently proposed subclass of such methods that is built on the popular stochastic-gradient methodology from the unconstrained setting, convergence guarantees have been limited to the asymptotic convergence of the expected value of a stationarity measure to zero. This is in contrast to the unconstrained setting in which almost-sure convergence guarantees (of the gradient of the objective to zero) can be proved for stochastic-gradient-based methods. In this paper, new almost-sure convergence guarantees for the primal iterates, Lagrange multipliers, and stationarity measures generated by a stochastic SQP algorithm in this subclass of methods are proved. It is shown that the error in the Lagrange multipliers can be bounded by the distance of the primal iterate to a primal stationary point plus the error in the latest stochastic gradient estimate. It is further shown that, subject to certain assumptions, this latter error can be made to vanish by employing a running average of the Lagrange multipliers that are computed during the run of the algorithm. The results of numerical experiments are provided to demonstrate the proved theoretical guarantees. △ Less

Submitted 7 August, 2023; originally announced August 2023.

Report number: Lehigh ISE Technical Report 23T-019 MSC Class: 49M05; 49M37; 62L20; 65K05; 68W20; 90C26; 90C30; 90C55

arXiv:2304.14907 [pdf, other]

A Stochastic-Gradient-based Interior-Point Algorithm for Solving Smooth Bound-Constrained Optimization Problems

Authors: Frank E. Curtis, Vyacheslav Kungurtsev, Daniel P. Robinson, Qi Wang

Abstract: A stochastic-gradient-based interior-point algorithm for minimizing a continuously differentiable objective function (that may be nonconvex) subject to bound constraints is presented, analyzed, and demonstrated through experimental results. The algorithm is unique from other interior-point methods for solving smooth nonconvex optimization problems since the search directions are computed using sto… ▽ More A stochastic-gradient-based interior-point algorithm for minimizing a continuously differentiable objective function (that may be nonconvex) subject to bound constraints is presented, analyzed, and demonstrated through experimental results. The algorithm is unique from other interior-point methods for solving smooth nonconvex optimization problems since the search directions are computed using stochastic gradient estimates. It is also unique in its use of inner neighborhoods of the feasible region -- defined by a positive and vanishing neighborhood-parameter sequence -- in which the iterates are forced to remain. It is shown that with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings. The results of numerical experiments show that in both settings the algorithm can outperform projection-based methods. △ Less

Submitted 13 March, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

arXiv:2007.10525 [pdf, other]

Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization

Authors: Albert Berahas, Frank E. Curtis, Daniel P. Robinson, Baoyu Zhou

Abstract: Sequential quadratic optimization algorithms are proposed for solving smooth nonlinear optimization problems with equality constraints. The main focus is an algorithm proposed for the case when the constraint functions are deterministic, and constraint function and derivative values can be computed explicitly, but the objective function is stochastic. It is assumed in this setting that it is intra… ▽ More Sequential quadratic optimization algorithms are proposed for solving smooth nonlinear optimization problems with equality constraints. The main focus is an algorithm proposed for the case when the constraint functions are deterministic, and constraint function and derivative values can be computed explicitly, but the objective function is stochastic. It is assumed in this setting that it is intractable to compute objective function and derivative values explicitly, although one can compute stochastic function and gradient estimates. As a starting point for this stochastic setting, an algorithm is proposed for the deterministic setting that is modeled after a state-of-the-art line-search SQP algorithm, but uses a stepsize selection scheme based on Lipschitz constants (or adaptively estimated Lipschitz constants) in place of the line search. This sets the stage for the proposed algorithm for the stochastic setting, for which it is assumed that line searches would be intractable. Under reasonable assumptions, convergence (resp.,~convergence in expectation) from remote starting points is proved for the proposed deterministic (resp.,~stochastic) algorithm. The results of numerical experiments demonstrate the practical performance of our proposed techniques. △ Less

Submitted 20 July, 2020; originally announced July 2020.

Report number: Lehigh ISE Technical Report 20T-012

arXiv:2001.06699 [pdf, other]

Adaptive Stochastic Optimization

Authors: Frank E. Curtis, Katya Scheinberg

Abstract: Optimization lies at the heart of machine learning and signal processing. Contemporary approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the… ▽ More Optimization lies at the heart of machine learning and signal processing. Contemporary approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the potential to offer significant computational savings when training large-scale systems. △ Less

Submitted 18 January, 2020; originally announced January 2020.

arXiv:1711.05305 [pdf, ps, other]

An Accelerated Communication-Efficient Primal-Dual Optimization Framework for Structured Machine Learning

Authors: Chenxin Ma, Martin Jaggi, Frank E. Curtis, Nathan Srebro, Martin Takáč

Abstract: Distributed optimization algorithms are essential for training machine learning models on very large-scale datasets. However, they often suffer from communication bottlenecks. Confronting this issue, a communication-efficient primal-dual coordinate ascent framework (CoCoA) and its improved variant CoCoA+ have been proposed, achieving a convergence rate of $\mathcal{O}(1/t)$ for solving empirical r… ▽ More Distributed optimization algorithms are essential for training machine learning models on very large-scale datasets. However, they often suffer from communication bottlenecks. Confronting this issue, a communication-efficient primal-dual coordinate ascent framework (CoCoA) and its improved variant CoCoA+ have been proposed, achieving a convergence rate of $\mathcal{O}(1/t)$ for solving empirical risk minimization problems with Lipschitz continuous losses. In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of $\mathcal{O}(1/t^2)$ in terms of reducing suboptimality. The analysis of this rate is also notable in that the convergence rate bounds involve constants that, except in extreme cases, are significantly reduced compared to those previously provided for CoCoA+. The results of numerical experiments are provided to show that acceleration can lead to significant performance gains. △ Less

Submitted 14 November, 2017; originally announced November 2017.

arXiv:1707.00337 [pdf, ps, other]

Complexity Analysis of a Trust Funnel Algorithm for Equality Constrained Optimization

Authors: Frank E. Curtis, Daniel P. Robinson, Mohammadreza Samadi

Abstract: A method is proposed for solving equality constrained nonlinear optimization problems involving twice continuously differentiable functions. The method employs a trust funnel approach consisting of two phases: a first phase to locate an $ε$-feasible point and a second phase to seek optimality while maintaining at least $ε$-feasibility. A two-phase approach of this kind based on a cubic regularizat… ▽ More A method is proposed for solving equality constrained nonlinear optimization problems involving twice continuously differentiable functions. The method employs a trust funnel approach consisting of two phases: a first phase to locate an $ε$-feasible point and a second phase to seek optimality while maintaining at least $ε$-feasibility. A two-phase approach of this kind based on a cubic regularization methodology was recently proposed along with a supporting worst-case iteration complexity analysis. Unfortunately, however, in that approach, the objective function is completely ignored in the first phase when $ε$-feasibility is sought. The main contribution of the method proposed in this paper is that the same worst-case iteration complexity is achieved, but with a first phase that also accounts for improvements in the objective function. As such, the method typically requires fewer iterations in the second phase, as the results of numerical experiments demonstrate. △ Less

Submitted 2 July, 2017; originally announced July 2017.

Report number: 16T-013 (ISE Department, Lehigh University, Bethlehem, PA, USA)

arXiv:1706.10207 [pdf, other]

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning

Authors: Frank E. Curtis, Katya Scheinberg

Abstract: The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a super… ▽ More The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a supervised learning problem and show how it leads to various optimization problems, depending on the context and underlying assumptions. We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing stochastic methods, and second-order methods. Finally, we discuss how these approaches can be employed to the training of deep neural networks, emphasizing the difficulties that arise from the complex, nonconvex structure of these models. △ Less

Submitted 30 June, 2017; originally announced June 2017.

arXiv:1606.04838 [pdf, other]

Optimization Methods for Large-Scale Machine Learning

Authors: Léon Bottou, Frank E. Curtis, Jorge Nocedal

Abstract: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine… ▽ More This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations. △ Less

Submitted 8 February, 2018; v1 submitted 15 June, 2016; originally announced June 2016.

arXiv:1508.02452 [pdf, other]

Primal-Dual Active-Set Methods for Isotonic Regression and Trend Filtering

Authors: Zheng Han, Frank E. Curtis

Abstract: Isotonic regression (IR) is a non-parametric calibration method used in supervised learning. For performing large-scale IR, we propose a primal-dual active-set (PDAS) algorithm which, in contrast to the state-of-the-art Pool Adjacent Violators (PAV) algorithm, can be parallized and is easily warm-started thus well-suited in the online settings. We prove that, like the PAV algorithm, our PDAS algor… ▽ More Isotonic regression (IR) is a non-parametric calibration method used in supervised learning. For performing large-scale IR, we propose a primal-dual active-set (PDAS) algorithm which, in contrast to the state-of-the-art Pool Adjacent Violators (PAV) algorithm, can be parallized and is easily warm-started thus well-suited in the online settings. We prove that, like the PAV algorithm, our PDAS algorithm for IR is convergent and has a work complexity of O(n), though our numerical experiments suggest that our PDAS algorithm is often faster than PAV. In addition, we propose PDAS variants (with safeguarding to ensure convergence) for solving related trend filtering (TF) problems, providing the results of experiments to illustrate their effectiveness. △ Less

Submitted 3 April, 2016; v1 submitted 10 August, 2015; originally announced August 2015.

Showing 1–12 of 12 results for author: Curtis, F E