Showing 1–7 of 7 results for author: Ravi, H

Searching in archive cs.

  1. arXiv:2506.16679  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions

    Authors: Manuel Brack, Sudeep Katakol, Felix Friedrich, Patrick Schramowski, Hareesh Ravi, Kristian Kersting, Ajinkya Kale

    Abstract: Training data is at the core of any successful text-to-image models. The quality and descriptiveness of image text are crucial to a model's performance. Given the noisiness and inconsistency in web-scraped datasets, recent works shifted towards synthetic training captions. While this setup is generally believed to produce more capable models, current literature does not provide any insights into i…

    Submitted 19 June, 2025; originally announced June 2025.

  2. arXiv:2411.01350  [pdf, other]

    cs.LG stat.ML

    The Implicit Bias of Gradient Descent on Separable Multiclass Data

    Authors: Hrithik Ravi, Clayton Scott, Daniel Soudry, Yutong Wang

    Abstract: Implicit bias describes the phenomenon where optimization-based training algorithms, without explicit regularization, show a preference for simple estimators even when more complex estimators have equal objective values. Multiple works have developed the theory of implicit bias for binary classification under the assumption that the loss satisfies an exponential tail property. However, there is a…

    Submitted 6 November, 2024; v1 submitted 2 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024

  3. arXiv:2302.14368  [pdf, other]

    cs.CV cs.AI cs.GR

    Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods

    Authors: Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale

    Abstract: As Diffusion Models have shown promising performance, a lot of efforts have been made to improve the controllability of Diffusion Models. However, how to train Diffusion Models to have the disentangled latent spaces and how to naturally incorporate the disentangled conditions during the sampling process have been underexplored. In this paper, we present a training framework for feature disentangle…

    Submitted 1 April, 2025; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: ECCV 2024; Code will be opened after a patent application is granted

  4. arXiv:2302.11710  [pdf, other]

    cs.CV

    Controlled and Conditional Text to Image Generation with Diffusion Prior

    Authors: Pranav Aggarwal, Hareesh Ravi, Naveen Marri, Sachin Kelkar, Fengbin Chen, Vinh Khuc, Midhun Harikumar, Ritiz Tambi, Sudharshan Reddy Kakumanu, Purvak Lapsiya, Alvin Ghouas, Sarah Saber, Malavika Ramprasad, Baldo Faieta, Ajinkya Kale

    Abstract: Denoising Diffusion models have shown remarkable performance in generating diverse, high quality images from text. Numerous techniques have been proposed on top of or in alignment with models like Stable Diffusion and Imagen that generate images directly from text. A lesser explored approach is DALLE-2's two step process comprising a Diffusion Prior that generates a CLIP image embedding from text…

    Submitted 1 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

  5. arXiv:2302.07979  [pdf, other]

    cs.CV

    PRedItOR: Text Guided Image Editing with Diffusion Prior

    Authors: Hareesh Ravi, Sachin Kelkar, Midhun Harikumar, Ajinkya Kale

    Abstract: Diffusion models have shown remarkable capabilities in generating high quality and creative images conditioned on text. An interesting application of such models is structure preserving text guided image editing. Existing approaches rely on text conditioned diffusion models such as Stable Diffusion or Imagen and require compute intensive optimization of text embeddings or fine-tuning the model wei…

    Submitted 20 March, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

  6. arXiv:2109.11047  [pdf, other]

    cs.CV

    Cross-Modal Coherence for Text-to-Image Retrieval

    Authors: Malihe Alikhani, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, Matthew Stone

    Abstract: Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling it could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for text-to…

    Submitted 15 April, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: This paper is published in AAAI-2022

  7. arXiv:2010.04366  [pdf, other]

    cs.SI cs.AI cs.LG

    GitEvolve: Predicting the Evolution of GitHub Repositories

    Authors: Honglu Zhou, Hareesh Ravi, Carlos M. Muniz, Vahid Azizi, Linda Ness, Gerard de Melo, Mubbasir Kapadia

    Abstract: Software development is becoming increasingly open and collaborative with the advent of platforms such as GitHub. Given its crucial role, there is a need to better understand and model the dynamics of GitHub as a social platform. Previous work has mostly considered the dynamics of traditional social networking sites like Twitter and Facebook. We propose GitEvolve, a system to predict the evolution…

    Submitted 9 October, 2020; originally announced October 2020.