Showing 1–21 of 21 results for author: McWilliams, B

  1. arXiv:2206.04993  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium

    Authors: Ian Gemp, Charlie Chen, Brian McWilliams

    Abstract: The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components and others. Despite this, most general solvers are prohibitively expensive whe…

    Submitted 25 April, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Published in ICLR 2023 (JAX code available as part of github.com/deepmind/eigengame)

  2. arXiv:2201.05119  [pdf, other]

    cs.CV cs.LG stat.ML

    Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

    Authors: Nenad Tomasev, Ioana Bica, Brian McWilliams, Lars Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

    Abstract: Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we include additional inductive biases into self-supervised learning.…

    Submitted 3 November, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  3. arXiv:2102.04152  [pdf, other]

    stat.ML cs.AI cs.LG

    EigenGame Unloaded: When playing games is better than optimizing

    Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

    Abstract: We build on the recently proposed EigenGame that views eigendecomposition as a competitive game. EigenGame's updates are biased if computed using minibatches of data, which hinders convergence and more sophisticated parallelism in the stochastic setting. In this work, we propose an unbiased stochastic update that is asymptotically equivalent to EigenGame, enjoys greater parallelism allowing comput…

    Submitted 22 March, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Published in ICLR '22

  4. arXiv:2010.07922  [pdf, other]

    cs.LG cs.CV stat.ML

    Representation Learning via Invariant Causal Mechanisms

    Authors: Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell

    Abstract: Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representa…

    Submitted 15 October, 2020; originally announced October 2020.

  5. arXiv:2010.00554  [pdf, other]

    cs.LG stat.ML

    EigenGame: PCA as a Nash Equilibrium

    Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

    Abstract: We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient based updates. The resulting algorithm -- which combines elements from Oja's rule with a generalized Gram-Schmidt orthogon…

    Submitted 16 March, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2021

  6. arXiv:1901.05061  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Spectrogram Feature Losses for Music Source Separation

    Authors: Abhimanyu Sahai, Romann Weber, Brian McWilliams

    Abstract: In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is in demonstrating that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement…

    Submitted 26 June, 2019; v1 submitted 15 January, 2019; originally announced January 2019.

    Comments: Accepted for presentation at the 27th European Signal Processing Conference (EUSIPCO 2019)

    MSC Class: 62; 68 ACM Class: I.2.6; H.5.5

  7. arXiv:1808.03856  [pdf, other]

    cs.LG cs.GR stat.ML

    Neural Importance Sampling

    Authors: Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, Jan Novák

    Abstract: We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the modeling power of individual coupling lay…

    Submitted 3 September, 2019; v1 submitted 11 August, 2018; originally announced August 2018.

    Comments: 19 pages, 15 figures. Accepted for publication in ACM Transactions on Graphics; presented at SIGGRAPH 2019

  8. arXiv:1709.05418  [pdf, other]

    cs.LG cs.GR stat.ML

    Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks

    Authors: Simon Kallweit, Thomas Müller, Brian McWilliams, Markus Gross, Jan Novák

    Abstract: We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds---e.g. the characteristic silver lining and the "whiteness" of the inner body---challenging for methods based solely on Monte Carlo integrat…

    Submitted 15 September, 2017; originally announced September 2017.

    Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2017)

  9. arXiv:1703.00403  [pdf, other]

    stat.ML cs.CR cs.DC cs.LG

    Preserving Differential Privacy Between Features in Distributed Estimation

    Authors: Christina Heinze-Deml, Brian McWilliams, Nicolai Meinshausen

    Abstract: Privacy is crucial in many applications of machine learning. Legal, ethical and societal issues restrict the sharing of sensitive data making it difficult to learn from datasets that are partitioned between many parties. One important instance of such a distributed setting arises when information about each record in the dataset is held by different data owners (the design matrix is "vertically-pa…

    Submitted 27 June, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

    Journal ref: Stat 7 (1), 2018

  10. arXiv:1702.08591  [pdf, other]

    cs.NE cs.LG stat.ML

    The Shattered Gradients Problem: If resnets are the answer, then what is the question?

    Authors: David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, Brian McWilliams

    Abstract: A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. Although the problem has largely been overcome via carefully constructed initializations and batch normalization, architectures incorporating skip-connections such as highway and resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch…

    Submitted 6 June, 2018; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: ICML 2017, final version

    Journal ref: PMLR volume 70 (2017)

  11. arXiv:1611.06652  [pdf, other]

    stat.ML cs.LG

    Scalable Adaptive Stochastic Optimization Using Random Projections

    Authors: Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen

    Abstract: Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain bet…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: To appear in Advances in Neural Information Processing Systems 29 (NIPS 2016)

  12. arXiv:1611.02345  [pdf, other]

    cs.LG cs.NE stat.ML

    Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

    Authors: David Balduzzi, Brian McWilliams, Tony Butler-Yeoman

    Abstract: Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex; standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets, which furthermore match…

    Submitted 6 June, 2018; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: ICML 2017, final version

    Journal ref: PMLR volume 70, 2017

  13. arXiv:1506.03662  [pdf, other]

    cs.LG math.OC stat.ML

    Variance Reduced Stochastic Gradient Descent with Neighbors

    Authors: Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

    Abstract: Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per data point corrections in me…

    Submitted 26 February, 2016; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages

    MSC Class: 90C06; 90C25; 68T05 ACM Class: G.1.6; I.2.6

  14. arXiv:1506.02554  [pdf, other]

    stat.ML cs.DC cs.LG

    DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

    Authors: Christina Heinze, Brian McWilliams, Nicolai Meinshausen

    Abstract: We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependences between features available to different workers. We show that DUAL-LOCO ha…

    Submitted 8 January, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: 13 pages

    Journal ref: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 51, 2016, 12 pages

  15. arXiv:1406.3469  [pdf, other]

    stat.ML

    LOCO: Distributing Ridge Regression with Random Projections

    Authors: Christina Heinze, Brian McWilliams, Nicolai Meinshausen, Gabriel Krummenacher

    Abstract: We propose LOCO, an algorithm for large-scale ridge regression which distributes the features across workers on a cluster. Important dependencies between variables are preserved using structured random projections which are cheap to compute and must only be communicated once. We show that LOCO obtains a solution which is close to the exact ridge regression solution in the fixed design setting. We…

    Submitted 8 June, 2015; v1 submitted 13 June, 2014; originally announced June 2014.

    Comments: 37 pages

  16. arXiv:1406.3175  [pdf, other]

    stat.ML

    Fast and Robust Least Squares Estimation in Corrupted Linear Models

    Authors: Brian McWilliams, Gabriel Krummenacher, Mario Lucic, Joachim M. Buhmann

    Abstract: Subsampling methods have been recently proposed to speed up least squares estimation in large scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations as shown in this paper. This property of influence -- for whi…

    Submitted 19 June, 2014; v1 submitted 12 June, 2014; originally announced June 2014.

  17. arXiv:1306.5554  [pdf, ps, other]

    stat.ML cs.LG

    Correlated random features for fast semi-supervised learning

    Authors: Brian McWilliams, David Balduzzi, Joachim M. Buhmann

    Abstract: This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It…

    Submitted 5 November, 2013; v1 submitted 24 June, 2013; originally announced June 2013.

    Comments: 15 pages, 3 figures, 6 tables

  18. A PRESS statistic for two-block partial least squares regression

    Authors: Brian McWilliams, Giovanni Montana

    Abstract: Predictive modelling of multivariate data where both the covariates and responses are high-dimensional is becoming an increasingly popular task in many data mining applications. Partial Least Squares (PLS) regression often turns out to be a useful model in these situations since it performs dimensionality reduction by assuming the existence of a small number of latent factors that may explain the…

    Submitted 23 February, 2013; originally announced February 2013.

    Journal ref: Workshop on Computational Intelligence (UKCI), 2010 UK, pp.1-6, 8-10 Sept. 2010

  19. arXiv:1203.1065  [pdf, ps, other]

    stat.ML

    Subspace clustering of high-dimensional data: a predictive approach

    Authors: Brian McWilliams, Giovanni Montana

    Abstract: In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be approximated well by a linear subspace es…

    Submitted 5 March, 2012; originally announced March 2012.

  20. arXiv:1202.0825  [pdf, other]

    stat.ML

    Multi-view predictive partitioning in high dimensions

    Authors: Brian McWilliams, Giovanni Montana

    Abstract: Many modern data mining applications are concerned with the analysis of datasets in which the observations are described by paired high-dimensional vectorial representations or "views". Some typical examples can be found in web mining and genomics applications. In this article we present an algorithm for data clustering with multiple views, Multi-View Predictive Partitioning (MVPP), which relies o…

    Submitted 2 February, 2012; originally announced February 2012.

    Comments: 31 pages, 12 figures

  21. arXiv:0902.1323  [pdf, ps, other]

    stat.ML stat.ME

    Sparse partial least squares for on-line variable selection in multivariate data streams

    Authors: Brian McWilliams, Giovanni Montana

    Abstract: In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition w…

    Submitted 8 February, 2009; originally announced February 2009.

    Comments: 26 pages, 6 figures, submitted