Showing 1–37 of 37 results for author: Lacoste-Julien, S

Searching in archive math.
  1. arXiv:2505.20628

    cs.LG math.OC

    Position: Adopt Constraints Over Penalties in Deep Learning

    Authors: Juan Ramirez, Meraj Hashemizadeh, Simon Lacoste-Julien

    Abstract: Recent efforts toward developing trustworthy AI systems with accountability guarantees have led to a growing reliance on machine learning formulations that incorporate external requirements, or constraints. These requirements are often enforced through penalization--adding fixed-weight terms to the task loss. We argue that this approach is ill-suited, and that tailored constrained optimization met…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Code available at https://github.com/merajhashemi/constraints-vs-penalties

  2. arXiv:2406.04558

    cs.LG math.OC

    On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

    Authors: Motahareh Sohrabi, Juan Ramirez, Tianyue H. Zhang, Simon Lacoste-Julien, Jose Gallego-Posada

    Abstract: Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Published at ICML 2024. Code available at https://github.com/motahareh-sohrabi/nuPI
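    The abstract above describes solving constrained problems through their min-max Lagrangian with gradient descent-ascent (GDA). A minimal sketch of that GDA baseline only, not the PI-controller update the paper proposes: the toy problem (minimize (x - 2)^2 subject to x - 1 <= 0, KKT point x* = 1, lambda* = 2) and the function name are assumptions for illustration. The multiplier step is projected onto lambda >= 0. This well-conditioned toy happens to converge, so it shows the update rule rather than the oscillations the abstract warns about.

```python
def gda_lagrangian(eta=0.1, steps=2000):
    """Projected gradient descent-ascent on the Lagrangian of a toy problem.

    Toy problem (assumed for illustration):
        minimize (x - 2)**2  subject to  g(x) = x - 1 <= 0,
    with Lagrangian L(x, lam) = (x - 2)**2 + lam * (x - 1) and KKT point (1, 2).
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam   # dL/dx
        grad_lam = x - 1.0               # dL/dlam, i.e. the constraint value g(x)
        # Simultaneous update: descent on x, ascent on lam projected onto lam >= 0.
        x, lam = x - eta * grad_x, max(0.0, lam + eta * grad_lam)
    return x, lam


x_star, lam_star = gda_lagrangian()
print(f"x = {x_star:.6f}, lambda = {lam_star:.6f}")
```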

  3. arXiv:2205.04583

    math.OC

    Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

    Authors: Antonio Orvieto, Simon Lacoste-Julien, Nicolas Loizou

    Abstract: Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent (SGD) with stochastic Polyak stepsize (SPS). The proposed SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when th…

    Submitted 16 February, 2024; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted at NeurIPS 2022. v4: tiny mistake in the main proof (result unchanged) is now fixed; v5: confusing typo fixed

  4. arXiv:2203.04940

    cs.LG cs.AI cs.DM math.OC

    Data-Efficient Structured Pruning via Submodular Optimization

    Authors: Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien

    Abstract: Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled data-efficient structured pruning method based on…

    Submitted 10 February, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

  5. arXiv:2111.06826

    stat.ML cs.LG math.ST

    Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

    Authors: Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

    Abstract: We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime.…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 9 pages and 3 figures + Appendix

  6. arXiv:2107.00052

    cs.LG cs.GT math.OC stat.ML

    Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

    Authors: Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used success…

    Submitted 4 November, 2021; v1 submitted 30 June, 2021; originally announced July 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  7. arXiv:2102.09645

    cs.LG math.OC stat.ML

    SVRG Meets AdaGrad: Painless Variance Reduction

    Authors: Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step…

    Submitted 2 November, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  8. arXiv:2011.03351

    math.OC

    Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets

    Authors: Thomas Kerdreux, Lewis Liu, Simon Lacoste-Julien, Damien Scieur

    Abstract: It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as the d…

    Submitted 6 November, 2020; originally announced November 2020.

  9. Machine Learning in Airline Crew Pairing to Construct Initial Clusters for Dynamic Constraint Aggregation

    Authors: Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

    Abstract: The crew pairing problem (CPP) is generally modelled as a set partitioning problem where the flights have to be partitioned into pairings. A pairing is a sequence of flight legs separated by connection time and rest periods that starts and ends at the same base. Because of the extensive list of complex rules and regulations, determining whether a sequence of flights constitutes a feasible pairing ca…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: First publication in the "Cahiers du GERAD" series in February 2020. Submitted to EURO Journal on Transportation and Logistics on January 17, 2020 and available online on September 2, 2020

    Journal ref: EURO Journal on Transportation and Logistics, 100020 (2020)

  10. arXiv:2009.12501

    cs.LG math.OC stat.ML

    Flight-connection Prediction for Airline Crew Scheduling to Construct Initial Clusters for OR Optimizer

    Authors: Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

    Abstract: We present a case study of using machine learning classification algorithms to initialize a large-scale commercial solver (GENCOL) based on column generation in the context of the airline crew pairing problem, where small savings of as little as 1% translate to increasing annual revenue by dozens of millions of dollars in a large airline. Under the imitation learning framework, we focus on the pro…

    Submitted 2 March, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

    Comments: First publication in the "Cahiers du GERAD" series in April 2019

    Report number: G-2019-26

  11. arXiv:2007.04202

    cs.LG cs.GT math.OC stat.ML

    Stochastic Hamiltonian Gradient Methods for Smooth Games

    Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using t…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning

  12. arXiv:2006.06835

    cs.LG math.OC stat.ML

    Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)

    Authors: Sharan Vaswani, Issam Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only ap…

    Submitted 18 February, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  13. arXiv:2002.10542

    math.OC cs.LG stat.ML

    Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

    Authors: Nicolas Loizou, Sharan Vaswani, Issam Laradji, Simon Lacoste-Julien

    Abstract: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting th…

    Submitted 22 March, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021
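    A minimal sketch of SGD with a stochastic Polyak step-size on a least-squares problem. The capped rule gamma_k = min{(f_i(x_k) - f_i^*) / (c * ||grad f_i(x_k)||^2), gamma_max} is the form assumed here, since the abstract above is truncated before stating it; the function name, c = 0.5 and the cap are illustrative. The data are an assumption chosen to interpolate (b = A @ x_true), so every per-sample optimum f_i^* is 0, the kind of setting in which the abstract says this value is readily available. With c = 0.5 the step reduces to 1/||a_i||^2.

```python
import numpy as np


def sgd_sps(A, b, c=0.5, gamma_max=10.0, epochs=200, seed=0):
    """SGD with a capped stochastic Polyak step-size (sketch).

    Loss (assumed): f_i(x) = 0.5 * (a_i^T x - b_i)**2, with b = A @ x_true so
    that every f_i has optimal value f_i^* = 0 (interpolation).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs * n):
        i = rng.integers(n)
        r = A[i] @ x - b[i]
        loss_i = 0.5 * r**2           # f_i(x)
        grad_i = r * A[i]             # gradient of f_i at x
        g2 = grad_i @ grad_i
        if g2 == 0.0:
            continue                  # x already minimizes f_i
        f_star_i = 0.0                # assumed known under interpolation
        step = min((loss_i - f_star_i) / (c * g2), gamma_max)
        x = x - step * grad_i
    return x
```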

  14. arXiv:2001.00602

    cs.LG math.OC stat.ML

    Accelerating Smooth Games by Manipulating Spectral Shapes

    Authors: Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges…

    Submitted 9 March, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 34 pages

    MSC Class: G.1.6; I.2.6 ACM Class: G.1.6; I.2.6

  15. arXiv:1910.04920

    cs.LG math.OC stat.ML

    Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

    Authors: Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien

    Abstract: We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient…

    Submitted 22 March, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: AISTATS, 2020

  16. arXiv:1906.05945

    cs.LG math.OC stat.ML

    A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Games

    Authors: Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using variants of the gradient method (GD). Prime examples are extragradient (EG), the optimistic gradient method (OG) and consensus optimization (CO), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full benefits of theses…

    Submitted 7 July, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 39 pages. Minor modification regarding prior work in comparison to the AISTATS Proceedings

    ACM Class: G.1.6; I.2.6
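    The abstract above notes that extragradient (EG) converges linearly on bilinear games where the standard gradient method fails. A minimal sketch on the toy game min_x max_y x*y (an assumption for illustration, Nash equilibrium at (0, 0)): simultaneous gradient descent-ascent multiplies the distance to the equilibrium by sqrt(1 + eta^2) each step, while EG, which evaluates the gradient at an extrapolated point before updating, shrinks it by sqrt(1 - eta^2 + eta^4) for 0 < eta < 1. The function name is hypothetical.

```python
import math


def distance_after(method, eta=0.1, steps=1000, x=1.0, y=1.0):
    """Run GDA or EG on f(x, y) = x * y (x minimizes, y maximizes).

    Returns the Euclidean distance of the final iterate to the equilibrium (0, 0).
    """
    for _ in range(steps):
        if method == "gda":
            # grad_x f = y, grad_y f = x; simultaneous descent-ascent.
            x, y = x - eta * y, y + eta * x
        elif method == "eg":
            # Extrapolation step, then update using gradients at the extrapolated point.
            x_half, y_half = x - eta * y, y + eta * x
            x, y = x - eta * y_half, y + eta * x_half
        else:
            raise ValueError(f"unknown method: {method}")
    return math.hypot(x, y)


print("GDA distance:", distance_after("gda"))
print("EG distance: ", distance_after("eg"))
```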

  17. arXiv:1905.09997

    cs.LG math.OC stat.ML

    Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

    Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien

    Abstract: Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques t…

    Submitted 4 June, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added a citation to the related work of Paul Tseng, and citations to methods that had previously explored line-searches for deep learning empirically
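    A minimal sketch of SGD with a backtracking line search evaluated on the sampled loss only: starting from eta_max, the step is halved until f_i(x - eta*g) <= f_i(x) - c*eta*||g||^2. This stochastic Armijo condition is the form assumed here, since the abstract above is truncated before giving details. The consistent least-squares problem stands in for an over-parameterized model that interpolates the data; the function name, beta = 0.5 and the cap of 50 halvings are illustrative choices.

```python
import numpy as np


def sgd_armijo(A, b, c=0.5, eta_max=1.0, beta=0.5, epochs=100, seed=0):
    """SGD with a stochastic Armijo backtracking line search (sketch).

    Loss (assumed): f_i(x) = 0.5 * (a_i^T x - b_i)**2 for each row a_i of A.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)

    def loss(i, z):
        return 0.5 * (A[i] @ z - b[i]) ** 2

    for _ in range(epochs * n):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]      # gradient of the sampled f_i
        g2 = g @ g
        f_x = loss(i, x)
        eta = eta_max
        # Shrink eta until the Armijo condition holds on the sampled f_i
        # (at most 50 halvings, an illustrative safeguard).
        for _ in range(50):
            if loss(i, x - eta * g) <= f_x - c * eta * g2:
                break
            eta *= beta
        x = x - eta * g
    return x
```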

  18. arXiv:1904.13262

    cs.LG math.OC stat.ML

    Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

    Authors: Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

    Abstract: When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces biases that will lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered as an implicit regularization for the train…

    Submitted 5 December, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: 19 pages, to appear in NeurIPS 2019 proceedings

  19. arXiv:1904.08598

    stat.ML cs.LG math.OC

    Reducing Noise in GAN Training with Variance Reduced Extragradient

    Authors: Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien

    Abstract: We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while the batch version converges. We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm, which for a large class of games improves upon the previous co…

    Submitted 25 June, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: latest NeurIPS'19 version

  20. arXiv:1901.07935   

    cs.LG math.OC stat.ML

    Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

    Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi

    Abstract: This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict tactical solutions to a given operational problem. In this context, the tactical solution is less detailed than the operational one but it has to be computed in very short time and under imperfect information. The problem is of importa…

    Submitted 1 March, 2021; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: Same as arXiv:1807.11876, added by mistake

    Journal ref: INFORMS Journal on Computing 34(1):227-242, 2021

  21. arXiv:1804.03176

    math.OC cs.LG stat.ML

    Frank-Wolfe Splitting via Augmented Lagrangian Method

    Authors: Gauthier Gidel, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set,…

    Submitted 9 April, 2018; originally announced April 2018.

    Comments: Appears in: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018). 30 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  22. arXiv:1802.10551

    cs.LG math.OC stat.ML

    A Variational Inequality Perspective on Generative Adversarial Networks

    Authors: Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train. One common way to tackle this issue has been to propose new formulations of the GAN objective. Yet, surprisingly few studies have looked at optimization methods designed for this adversarial training. In this work, we cast GAN optimization probl…

    Submitted 28 August, 2020; v1 submitted 28 February, 2018; originally announced February 2018.

    Comments: Appears in: Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). Minor modifications with respect to the ICLR version (First paragraph of page 2 and section 3.3): New reference [Popov 1980] and discussion with regards to the novelty of extrapolation from the past. 38 pages

    ACM Class: I.2.6; G.1.6

  23. arXiv:1801.03749

    math.OC cs.LG stat.ML

    Improved asynchronous parallel optimization analysis for stochastic incremental methods

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: As datasets continue to increase in size and multi-core computer architectures are developed, asynchronous parallel optimization algorithms become more and more essential to the field of Machine Learning. Unfortunately, conducting the theoretical analysis of asynchronous methods is difficult, notably due to the introduction of delay and inconsistency in inherently sequential algorithms. Handling thes…

    Submitted 21 March, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: 67 pages, published in JMLR, can be found online at http://jmlr.org/papers/v19/17-650.html. arXiv admin note: substantial text overlap with arXiv:1606.04809

  24. arXiv:1707.06468

    math.OC cs.LG stat.ML

    Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

    Authors: Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

    Abstract: Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as…

    Submitted 5 November, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

    Journal ref: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  25. arXiv:1610.07797

    math.OC cs.LG stat.ML

    Frank-Wolfe Algorithms for Saddle Point Problems

    Authors: Gauthier Gidel, Tony Jebara, Simon Lacoste-Julien

    Abstract: We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30-year-old conjecture. We also…

    Submitted 3 March, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017). 39 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  26. arXiv:1607.00345

    math.OC cs.LG math.NA stat.ML

    Convergence Rate of Frank-Wolfe for Non-Convex Objectives

    Authors: Simon Lacoste-Julien

    Abstract: We give a simple proof that the Frank-Wolfe algorithm obtains a stationary point at a rate of $O(1/\sqrt{t})$ on non-convex objectives with a Lipschitz continuous gradient. Our analysis is affine invariant and is the first, to the best of our knowledge, giving a similar rate to what was already proven for projected gradient methods (though on slightly different measures of stationarity).

    Submitted 1 July, 2016; originally announced July 2016.

    Comments: 6 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6
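    A minimal sketch of Frank-Wolfe on a non-convex problem, tracking the Frank-Wolfe gap g_t = <grad f(x_t), x_t - s_t> as the stationarity measure whose O(1/sqrt(t)) rate the abstract above refers to. The objective f(x) = 0.5 x^T Q x with an indefinite Q, the probability simplex as the domain, the "short step" gamma = min(g_t / (L ||d_t||^2), 1) and the function name are assumptions for illustration. On the simplex the linear minimization oracle simply returns the vertex with the smallest gradient coordinate, and every iterate stays a convex combination of vertices.

```python
import numpy as np


def frank_wolfe_simplex(Q, x0, iters=1000):
    """Frank-Wolfe on f(x) = 0.5 * x^T Q x over the probability simplex.

    Q may be indefinite, so f can be non-convex. Returns the final iterate and
    the list of FW gaps g_t = <grad f(x_t), x_t - s_t>.
    """
    L = np.linalg.norm(Q, 2)             # Lipschitz constant of grad f (spectral norm)
    x = np.asarray(x0, dtype=float).copy()
    gaps = []
    for _ in range(iters):
        grad = Q @ x
        s = np.zeros_like(x)
        s[np.argmin(grad)] = 1.0         # linear minimization oracle on the simplex
        d = s - x                        # FW direction
        gap = -grad @ d                  # FW gap, >= 0 on the simplex
        gaps.append(gap)
        dd = d @ d
        if gap <= 0.0 or dd == 0.0:
            break                        # x is already a stationary point
        gamma = min(gap / (L * dd), 1.0) # assumed "short step" rule
        x = x + gamma * d
    return x, gaps


Q = np.array([[1.0, 2.0, 0.0], [2.0, -1.0, 0.5], [0.0, 0.5, -2.0]])
x_final, gaps = frank_wolfe_simplex(Q, np.ones(3) / 3)
print("smallest FW gap:", min(gaps))
```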

  27. arXiv:1606.04809

    math.OC cs.LG stat.ML

    ASAGA: Asynchronous Parallel SAGA

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently introduce…

    Submitted 8 November, 2017; v1 submitted 15 June, 2016; originally announced June 2016.

    Comments: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), 37 pages

  28. arXiv:1605.09346

    cs.LG math.OC stat.ML

    Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs

    Authors: Anton Osokin, Jean-Baptiste Alayrac, Isabella Lukasewitz, Puneet K. Dokania, Simon Lacoste-Julien

    Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the bl…

    Submitted 30 May, 2016; originally announced May 2016.

    Comments: Appears in Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). 31 pages

    MSC Class: 90C52; 90C90; 90C06; 68T05 ACM Class: G.1.6; I.2.6

  29. arXiv:1511.05932

    math.OC cs.LG stat.ML

    On the Global Linear Convergence of Frank-Wolfe Optimization Variants

    Authors: Simon Lacoste-Julien, Martin Jaggi

    Abstract: The Frank-Wolfe (FW) optimization algorithm has lately re-gained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies at the boundary. A simple less-known fix is to add the possibility to take 'away steps' during optimization, an ope…

    Submitted 18 November, 2015; originally announced November 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 26 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  30. arXiv:1511.02124

    stat.ML cs.LG math.OC

    Barrier Frank-Wolfe for Marginal Inference

    Authors: Rahul G. Krishnan, Simon Lacoste-Julien, David Sontag

    Abstract: We introduce a globally-convergent algorithm for optimizing the tree-reweighted (TRW) variational objective over the marginal polytope. The algorithm is based on the conditional gradient method (Frank-Wolfe) and moves pseudomarginals within the marginal polytope through repeated maximum a posteriori (MAP) calls. This modular structure enables us to leverage black-box MAP solvers (both exact and ap…

    Submitted 25 November, 2015; v1 submitted 6 November, 2015; originally announced November 2015.

    Comments: 25 pages, 12 figures, To appear in Neural Information Processing Systems (NIPS) 2015, Corrected reference and cleaned up bibliography

  31. arXiv:1506.03662

    cs.LG math.OC stat.ML

    Variance Reduced Stochastic Gradient Descent with Neighbors

    Authors: Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

    Abstract: Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per data point corrections in me…

    Submitted 26 February, 2016; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages

    MSC Class: 90C06; 90C25; 68T05 ACM Class: G.1.6; I.2.6

  32. arXiv:1408.3304

    cs.CV math.OC

    On Pairwise Costs for Network Flow Multi-Object Tracking

    Authors: Visesh Chari, Simon Lacoste-Julien, Ivan Laptev, Josef Sivic

    Abstract: Multi-object tracking has been recently approached with the min-cost network flow optimization techniques. Such methods simultaneously resolve multiple object tracks in a video and enable modeling of dependencies among tracks. Min-cost network flow methods also fit well within the "tracking-by-detection" paradigm where object trajectories are obtained by connecting per-frame outputs of an object d…

    Submitted 5 May, 2015; v1 submitted 14 August, 2014; originally announced August 2014.

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5537-5545

  33. arXiv:1407.0202

    cs.LG math.OC stat.ML

    SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

    Authors: Aaron Defazio, Francis Bach, Simon Lacoste-Julien

    Abstract: In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA…

    Submitted 16 December, 2014; v1 submitted 1 July, 2014; originally announced July 2014.

    Comments: Advances In Neural Information Processing Systems, Nov 2014, Montreal, Canada
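    A minimal sketch of the SAGA update for a least-squares finite sum f(x) = (1/n) sum_i 0.5 (a_i^T x - b_i)^2, leaving out the proximal operator the abstract mentions. Each step uses grad f_j(x) - (stored gradient of f_j) + (mean of all stored gradients) as the search direction, then overwrites the stored gradient for j. The constant step 1/(3 L_max), with L_max the largest per-sample smoothness constant, along with the data and function name, are assumptions for illustration. Because the noisy system in the test has no exact solution, plain constant-step SGD would keep fluctuating, whereas this variance-reduced update is expected to reach the least-squares solution.

```python
import numpy as np


def saga_least_squares(A, b, epochs=300, seed=0):
    """SAGA (without the proximal step) on f(x) = mean_i 0.5 * (a_i^T x - b_i)**2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    step = 1.0 / (3.0 * np.max(np.sum(A**2, axis=1)))   # 1 / (3 * L_max), assumed
    x = np.zeros(d)
    # Table of the most recently evaluated gradient of each f_i, and their mean.
    table = (A @ x - b)[:, None] * A
    mean_grad = table.mean(axis=0)
    for _ in range(epochs * n):
        j = rng.integers(n)
        g_new = (A[j] @ x - b[j]) * A[j]                 # fresh gradient of f_j
        # Unbiased, variance-reduced direction.
        x = x - step * (g_new - table[j] + mean_grad)
        # Update the running mean before overwriting the stored gradient.
        mean_grad = mean_grad + (g_new - table[j]) / n
        table[j] = g_new
    return x
```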

  34. arXiv:1312.7864

    math.OC

    An Affine Invariant Linear Convergence Analysis for Frank-Wolfe Algorithms

    Authors: Simon Lacoste-Julien, Martin Jaggi

    Abstract: We study the linear convergence of variants of the Frank-Wolfe algorithms for some classes of strongly convex problems, using only affine-invariant quantities. As in Guelat & Marcotte (1986), we show the linear convergence of the standard Frank-Wolfe algorithm when the solution is in the interior of the domain, but with affine invariant constants. We also show the linear convergence of the away-st…

    Submitted 3 January, 2014; v1 submitted 30 December, 2013; originally announced December 2013.

    Comments: appeared at the NIPS 2013 Workshop on Greedy Algorithms, Frank-Wolfe and Friends (v2: added acknowledgements)

    MSC Class: 90C25 ACM Class: G.1.6

    Journal ref: NIPS 2013 Workshop on Greedy Algorithms, Frank-Wolfe and Friends

  35. arXiv:1212.2002

    cs.LG math.OC stat.ML

    A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

    Authors: Simon Lacoste-Julien, Mark Schmidt, Francis Bach

    Abstract: In this note, we present a new averaging technique for the projected stochastic subgradient method. By using a weighted average with a weight of t+1 for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation. The new scheme is compared empirically to existing techniques, with similar performance behavior.

    Submitted 20 December, 2012; v1 submitted 10 December, 2012; originally announced December 2012.

    Comments: 8 pages, 6 figures. Changes with previous version: Added reference to concurrently submitted work arXiv:1212.1824v1; clarifications added; typos corrected; title changed to 'subgradient method' as 'subgradient descent' is a misnomer

    MSC Class: 90C15; 68T05; 65K10 ACM Class: G.1.6; I.2.6
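    The averaging scheme in the abstract above gives iterate w_t the weight t + 1, so the average after step t is (2 / ((t + 1)(t + 2))) * sum_{k<=t} (k + 1) w_k. It can be kept incrementally as w_bar_t = (1 - rho_t) w_bar_{t-1} + rho_t w_t with rho_t = 2 / (t + 2). A minimal sketch, assuming a toy strongly convex stochastic objective f(w) = E[(mu/2)||w - z||^2] with z ~ N(m, I), a box constraint handled by projection (clipping), and the step size 2 / (mu (t + 1)); the objective, step rule and function name are illustrative assumptions. The constrained minimizer of this toy objective is clip(m).

```python
import numpy as np


def projected_sgd_weighted_avg(m, mu=1.0, radius=1.0, iters=20000, seed=0):
    """Projected stochastic gradient with (t + 1)-weighted iterate averaging.

    Toy objective (assumed): f(w) = E[(mu / 2) * ||w - z||^2] with z ~ N(m, I),
    constrained to the box [-radius, radius]^d; its minimizer is clip(m).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros_like(m, dtype=float)
    w_bar = w.copy()
    for t in range(iters):
        # Running weighted average: weight (t + 1) on w_t.
        rho = 2.0 / (t + 2)
        w_bar = (1.0 - rho) * w_bar + rho * w
        z = m + rng.standard_normal(m.shape)          # noisy sample, E[z] = m
        g = mu * (w - z)                              # stochastic gradient
        step = 2.0 / (mu * (t + 1))                   # assumed step-size rule
        w = np.clip(w - step * g, -radius, radius)    # Euclidean projection onto the box
    return w_bar


print(projected_sgd_weighted_avg(np.array([0.3, -2.0, 5.0])))
```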

  36. arXiv:1207.4747

    cs.LG math.OC stat.ML

    Block-Coordinate Frank-Wolfe Optimization for Structural SVMs

    Authors: Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, Patrick Pletscher

    Abstract: We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online alg…

    Submitted 14 January, 2013; v1 submitted 19 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the 30th International Conference on Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap proofs; re-drew all plots; 2) Changed convention for Cf definition; 3) Added weighted averaging experiments + convergence results; 4) Clarified main text and relationship with appendix

    MSC Class: 90C52; 90C90; 90C06; 68T05 ACM Class: G.1.6; I.2.6

  37. arXiv:1203.4523

    cs.LG math.OC stat.ML

    On the Equivalence between Herding and Conditional Gradient Algorithms

    Authors: Francis Bach, Simon Lacoste-Julien, Guillaume Obozinski

    Abstract: We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm--namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We st…

    Submitted 11 September, 2012; v1 submitted 20 March, 2012; originally announced March 2012.

    Journal ref: ICML 2012 International Conference on Machine Learning, Edinburgh, United Kingdom (2012)