
Showing 1–20 of 20 results for author: Webb, R

Searching in archive cs.
  1. arXiv:2502.08606

    cs.LG cs.AI cs.CL stat.ML

    Distillation Scaling Laws

    Authors: Dan Busbridge, Amitis Shidani, Floris Weers, Jason Ramapuram, Etai Littwin, Russ Webb

    Abstract: We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings reduce the risks associated with using distillation at scale; compute allocation for both the teacher and student models can now be done to maximize student performance. We provide compute-optimal distillation recipes for when 1…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 67 pages, 54 figures, 13 tables

  2. arXiv:2409.04431

    cs.LG

    Theory, Analysis, and Best Practices for Sigmoid Self-Attention

    Authors: Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, Russ Webb

    Abstract: Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sigmoi…

    Submitted 21 January, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

  3. arXiv:2403.05490

    cs.LG cs.AI cs.CV cs.IT stat.ML

    Poly-View Contrastive Learning

    Authors: Amitis Shidani, Devon Hjelm, Jason Ramapuram, Russ Webb, Eeshan Gunesh Dhekane, Dan Busbridge

    Abstract: Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g., by augmentations) or observed. We investigate matching when there are more than two related views, which we call poly-view tasks, and derive new representation learning objectives using information maximization and sufficient statistics. We show that with unlimit…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024. 42 pages, 7 figures, 3 tables, loss pseudocode included in the appendix

  4. arXiv:2312.03213

    cs.LG stat.ML

    Bootstrap Your Own Variance

    Authors: Polina Turishcheva, Jason Ramapuram, Sinead Williamson, Dan Busbridge, Eeshan Dhekane, Russ Webb

    Abstract: Understanding model uncertainty is important for many applications. We propose Bootstrap Your Own Variance (BYOV), combining Bootstrap Your Own Latent (BYOL), a negative-free Self-Supervised Learning (SSL) algorithm, with Bayes by Backprop (BBB), a Bayesian method for estimating model posteriors. We find that the learned predictive standard deviation of BYOV vs. a supervised BBB model is well captured by a Gauss…

    Submitted 5 December, 2023; originally announced December 2023.

    Journal ref: NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

  5. arXiv:2307.13813

    stat.ML cs.AI cs.LG

    How to Scale Your EMA

    Authors: Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb

    Abstract: Preserving training dynamics across batch sizes is an important tool for practical machine learning, as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functio…

    Submitted 7 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Spotlight at NeurIPS 2023, 53 pages, 32 figures, 17 tables

  6. arXiv:2210.16365

    cs.LG

    Elastic Weight Consolidation Improves the Robustness of Self-Supervised Learning Methods under Transfer

    Authors: Andrius Ovsianas, Jason Ramapuram, Dan Busbridge, Eeshan Gunesh Dhekane, Russ Webb

    Abstract: Self-supervised representation learning (SSL) methods provide an effective label-free initial condition for fine-tuning on downstream tasks. However, in many realistic scenarios, the downstream task might be biased with respect to the target label distribution. This in turn moves the learned fine-tuned model posterior away from the initial (label) bias-free self-supervised model posterior. In thi…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022 Workshop: Self-Supervised Learning - Theory and Practice

  7. arXiv:2203.16490

    eess.IV cs.CV

    Foveation-based Deep Video Compression without Motion Search

    Authors: Meixu Chen, Richard Webb, Alan C. Bovik

    Abstract: The much larger file sizes, different storage formats, and immersive viewing conditions of VR pose significant challenges to acquiring, transmitting, compressing, and displaying high-quality VR content. At the same time, the potential of deep learning to advance video compression has driven a significant research effort. Because of the hig…

    Submitted 30 March, 2022; originally announced March 2022.

  8. arXiv:2110.00552

    cs.LG

    Stochastic Contrastive Learning

    Authors: Jason Ramapuram, Dan Busbridge, Xavier Suau, Russ Webb

    Abstract: While state-of-the-art contrastive Self-Supervised Learning (SSL) models produce results competitive with their supervised counterparts, they lack the ability to infer latent variables. In contrast, prescribed latent variable (LV) models enable attributing uncertainty, inducing task-specific compression, and, more generally, allow for more interpretable representations. In this work, we introduce LV app…

    Submitted 30 November, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted to the 2nd Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2021), Sydney, Australia

  9. arXiv:2110.00538

    cs.LG

    Evaluating the fairness of fine-tuning strategies in self-supervised learning

    Authors: Jason Ramapuram, Dan Busbridge, Russ Webb

    Abstract: In this work, we examine how fine-tuning impacts the fairness of contrastive Self-Supervised Learning (SSL) models. Our findings indicate that Batch Normalization (BN) statistics play a crucial role, and that updating only the BN statistics of a pre-trained SSL backbone improves its downstream fairness (36% worst subgroup, 25% mean subgroup gap). This procedure is competitive with supervised learni…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted to BayLearn 2021

  10. arXiv:2110.00528

    cs.CV cs.LG stat.ML

    Do Self-Supervised and Supervised Methods Learn Similar Visual Representations?

    Authors: Tom George Grigg, Dan Busbridge, Jason Ramapuram, Russ Webb

    Abstract: Despite the success of a number of recent techniques for visual self-supervised deep learning, there has been limited investigation into the representations that are ultimately learned. By leveraging recent advances in the comparison of neural representations, we explore in this direction by comparing a contrastive self-supervised algorithm to supervision for simple image data in a common architec…

    Submitted 2 December, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted to the 2nd Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2021), Sydney, Australia. Fixed typos, added acknowledgements. 5 pages + 2 pages of appendices, 5 figures, 1 table

  11. FOVQA: Blind Foveated Video Quality Assessment

    Authors: Yize Jin, Anjul Patney, Richard Webb, Alan Bovik

    Abstract: Previous blind or No Reference (NR) video quality assessment (VQA) models largely rely on features drawn from natural scene statistics (NSS), under the assumption that image statistics are stationary in the spatial domain. Several of these models are quite successful on standard pictures. However, in Virtual Reality (VR) applications, foveated video compression is regaining attention, and…

    Submitted 24 June, 2021; originally announced June 2021.

  12. arXiv:2011.02523

    cs.CV cs.GR

    Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

    Authors: Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, Joshua M. Susskind

    Abstract: For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images…

    Submitted 17 August, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted for publication at the International Conference on Computer Vision (ICCV) 2021

  13. arXiv:1912.08444

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Relational Mimic for Visual Adversarial Imitation Learning

    Authors: Lionel Blondé, Yichuan Charlie Tang, Jian Zhang, Russ Webb

    Abstract: In this work, we introduce a new method for imitation learning from video demonstrations. Our method, Relational Mimic (RM), improves on previous visual imitation learning methods by combining generative adversarial networks and relational learning. RM is flexible and can be used in conjunction with other recent advances in generative adversarial imitation learning to better address the need for m…

    Submitted 18 December, 2019; originally announced December 2019.

  14. arXiv:1905.03658

    cs.LG cs.AI cs.CV stat.ML

    Improving Discrete Latent Representations With Differentiable Approximation Bridges

    Authors: Jason Ramapuram, Russ Webb

    Abstract: Modern neural network training relies on piecewise (sub-)differentiable functions in order to use backpropagation to update model parameters. In this work, we introduce a novel method to allow simple non-differentiable functions at intermediary layers of deep neural networks. We do so by training with a differentiable approximation bridge (DAB) neural network which approximates the non-differenti…

    Submitted 25 October, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

  15. arXiv:1904.01664

    cs.HC cs.AI cs.CL

    Mirroring to Build Trust in Digital Assistants

    Authors: Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff

    Abstract: We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end we conducted a user study where subjects interacted with a digital assistant that responded in a way t…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: Preprint

  16. arXiv:1812.03170

    cs.CV cs.LG stat.ML

    Variational Saccading: Efficient Inference for Large Resolution Images

    Authors: Jason Ramapuram, Maurits Diephuis, Frantzeska Lavda, Russ Webb, Alexandros Kalousis

    Abstract: Image classification with deep neural networks is typically restricted to images of small dimensionality such as 224 x 224 in ResNet models [24]. This limitation excludes the 4000 x 3000 dimensional images that are taken by modern smartphone cameras and smart devices. In this work, we aim to mitigate the prohibitive inferential and memory costs of operating in such large dimensional spaces. To sam…

    Submitted 6 September, 2019; v1 submitted 8 December, 2018; originally announced December 2018.

    Comments: Published at BMVC 2019 and the NIPS 2018 Bayesian Deep Learning Workshop

  17. arXiv:1807.00126

    cs.LG stat.ML

    A New Benchmark and Progress Toward Improved Weakly Supervised Learning

    Authors: Jason Ramapuram, Russ Webb

    Abstract: Knowledge Matters: Importance of Prior Information for Optimization [7], by Gulcehre et al., sought to establish the limits of current black-box deep learning techniques by posing problems that are difficult to learn without engineering knowledge into the model or training procedure. In our work, we completely solve the previous Knowledge Matters problem using a generic model, pose a more diffi…

    Submitted 18 September, 2018; v1 submitted 30 June, 2018; originally announced July 2018.

  18. arXiv:1612.07828

    cs.CV cs.LG cs.NE

    Learning from Simulated and Unsupervised Images through Adversarial Training

    Authors: Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb

    Abstract: With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a mod…

    Submitted 19 July, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

    Comments: Accepted at CVPR 2017 for oral presentation

  19. arXiv:1605.03514

    math.GT cs.CG

    Applications of fast triangulation simplification

    Authors: Mark C. Bell, Richard C. H. Webb

    Abstract: We describe a new algorithm to compute the geometric intersection number between two curves, given as edge vectors on an ideal triangulation. Most importantly, this algorithm runs in polynomial time in the bit-size of the two edge vectors. In its simplest instances, this algorithm works by finding the minimal position of the two curves. We achieve this by phrasing the problem as a collection of…

    Submitted 11 May, 2016; originally announced May 2016.

    Comments: 10 pages, 5 figures

  20. arXiv:1509.04315

    cs.PL

    Implementing a teleo-reactive programming system

    Authors: Robert Webb

    Abstract: This thesis explores the teleo-reactive programming paradigm for controlling autonomous agents, such as robots. Teleo-reactive programming provides a robust, opportunistic method for goal-directed programming that continuously reacts to the sensed environment. In particular, the TR and TeleoR systems are investigated. They inform the design of a teleo-reactive programming system in Python, for…

    Submitted 14 September, 2015; originally announced September 2015.
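Entry 2 (Theory, Analysis, and Best Practices for Sigmoid Self-Attention) describes attention as a map from each sequence element to a weighted sum of values, with the weights usually given by a softmax over query-key dot products, and mentions sigmoid as an alternative. The pure-Python sketch below contrasts the two weightings. It assumes the common 1/sqrt(d) score scaling and only illustrates the idea stated in the abstract; it is not the paper's formulation, and any further details of the authors' sigmoid variant are cut off in the excerpt and not reproduced. The helper names (`dot`, `softmax`, `sigmoid`, `attention`) are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention(queries, keys, values, kind="softmax"):
    """Map each query to a weighted sum of the value vectors.

    kind="softmax": weights are the softmax of scaled query-key dot products,
    so they sum to 1 for each query.
    kind="sigmoid": each scaled score is squashed independently into (0, 1),
    so the weights need not sum to 1.
    """
    d = len(keys[0])  # assumed: scores are scaled by 1/sqrt(d)
    outputs = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        if kind == "softmax":
            weights = softmax(scores)
        elif kind == "sigmoid":
            weights = [sigmoid(s) for s in scores]
        else:
            raise ValueError(f"unknown attention kind: {kind}")
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

With softmax, attending over identical values returns that value, because the weights for each query sum to 1. With sigmoid, each weight is independent, so the sum of the weights (and hence the output scale) grows with the number of keys.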
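Entry 4 (Bootstrap Your Own Variance) combines BYOL with Bayes by Backprop (BBB). In the standard BBB parameterization, each weight has a mean `mu` and an unconstrained scale parameter `rho`, and weights are sampled as w = mu + softplus(rho) * eps with eps drawn from a standard normal, so gradients can reach `mu` and `rho` through the sample. The minimal sketch below shows only that sampling step. The KL term of the BBB objective and the way BYOV attaches BBB to BYOL are not shown, and the function names are hypothetical.

```python
import math
import random

def softplus(x):
    """log(1 + exp(x)), arranged so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sample_weights(mu, rho, rng):
    """Draw one weight vector w = mu + softplus(rho) * eps, eps ~ N(0, 1).

    The noise eps is independent of (mu, rho), which is what lets gradients
    of a loss computed from w flow back to mu and rho (reparameterization).
    softplus keeps the standard deviation positive for any real rho.
    """
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]
```

A very negative `rho` makes the standard deviation nearly zero, so the sampled weight collapses to `mu`; `rho = 0` gives a standard deviation of log 2.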
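Entry 5 (How to Scale Your EMA) states the SGD linear scaling rule, which scales the learning rate in proportion to the batch size, and introduces the model EMA. The sketch below shows both pieces in their standard form: the linear rule and one exponential-moving-average update of a parameter list. The abstract is truncated before the paper's own rule for adjusting the EMA momentum with batch size, so that rule is deliberately not shown; the function names are hypothetical.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule for SGD: learning rate proportional to batch size."""
    kappa = new_batch_size / base_batch_size  # batch-size scaling factor
    return base_lr * kappa

def ema_update(ema_params, params, momentum):
    """One standard model-EMA step: ema <- momentum * ema + (1 - momentum) * params.

    A momentum close to 1 makes the EMA copy change slowly, averaging the
    trajectory of the trained parameters over many steps.
    """
    return [momentum * e + (1.0 - momentum) * p for e, p in zip(ema_params, params)]
```

For example, moving from a batch size of 256 to 1024 at a base learning rate of 0.1 gives `scale_learning_rate(0.1, 256, 1024)`, a scaled learning rate of 0.4.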
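Entry 9 (Evaluating the fairness of fine-tuning strategies in self-supervised learning) reports that updating only the Batch Normalization (BN) statistics of a pre-trained SSL backbone improves downstream fairness. One plausible reading, assumed here rather than taken from the paper, is that the per-feature running mean and variance are re-estimated on downstream data while every learned weight, including BN's own scale and shift, stays frozen. The sketch uses a PyTorch-style momentum update and, for simplicity, the biased batch variance (some frameworks use the unbiased estimate for the running variance). All names are hypothetical.

```python
def batch_stats(batch):
    """Per-feature mean and biased variance of a batch of feature vectors."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(dims)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(dims)]
    return means, variances

def update_bn_stats(running_mean, running_var, batch, momentum=0.1):
    """Move the running statistics toward this batch's statistics.

    Only the statistics change; gamma, beta, and all other weights are untouched.
    """
    means, variances = batch_stats(batch)
    new_mean = [(1 - momentum) * rm + momentum * m for rm, m in zip(running_mean, means)]
    new_var = [(1 - momentum) * rv + momentum * v for rv, v in zip(running_var, variances)]
    return new_mean, new_var

def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Normalize one feature vector with fixed statistics and a fixed affine map."""
    return [g * (xi - m) / (v + eps) ** 0.5 + b
            for xi, m, v, g, b in zip(x, mean, var, gamma, beta)]
```

Calling `update_bn_stats` repeatedly over downstream batches, then using the result in `batch_norm`, recalibrates the normalization without any gradient step.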
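Entry 16 (Variational Saccading) contrasts small classifier inputs with the 4000 x 3000 images taken by smartphone cameras. Assuming the standard 224 x 224 ResNet input resolution, a quick pixel count gives a sense of that gap:

```latex
\frac{4000 \times 3000}{224 \times 224} = \frac{12\,000\,000}{50\,176} \approx 239
```

That is, a full-resolution photo holds roughly 239 times as many pixels as a standard ResNet input.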
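Entry 20 (Implementing a teleo-reactive programming system) describes teleo-reactive (TR) programming as goal-directed control that continuously reacts to the sensed environment. In Nilsson's formulation, a TR program is an ordered list of condition-action rules, and on each cycle the agent runs the action of the first rule whose condition holds in the current state. The minimal Python sketch below illustrates only that rule-selection loop on a made-up one-dimensional goal. It is not the thesis's system and omits details of TR and TeleoR such as durative actions; `Rule`, `select_action`, `go_to_ten`, and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]

@dataclass
class Rule:
    """A guarded rule: when `condition` holds for the sensed state, `action` is chosen."""
    condition: Callable[[State], bool]
    action: str

def select_action(program: List[Rule], state: State) -> str:
    """Return the action of the first rule, top to bottom, whose condition holds."""
    for rule in program:
        if rule.condition(state):
            return rule.action
    raise RuntimeError("no rule fired; end the program with an always-true rule")

# Hypothetical goal: reach position 10. The goal rule comes first, fallbacks after.
go_to_ten = [
    Rule(lambda s: s["pos"] == 10, "stop"),        # goal achieved
    Rule(lambda s: s["pos"] < 10, "move_forward"),
    Rule(lambda s: True, "move_back"),             # default rule
]

def run(program: List[Rule], pos: int, max_steps: int = 20):
    """Re-sense and re-select every cycle, so a changed state can fire a different rule."""
    trace = []
    for _ in range(max_steps):
        action = select_action(program, {"pos": pos})
        trace.append(action)
        if action == "stop":
            break
        pos += 1 if action == "move_forward" else -1
    return pos, trace
```

Starting at position 7, `run(go_to_ten, 7)` moves forward three times and then stops at 10. Ordering the rules from the goal downward is what makes the program opportunistic: if the agent is pushed closer to the goal, a higher rule simply fires sooner.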