
Showing 1–50 of 202 results for author: Ye, J C

Searching in archive cs.
  1. arXiv:2504.17077

    physics.optics cs.AI physics.comp-ph

    Physics-guided and fabrication-aware inverse design of photonic devices using diffusion models

    Authors: Dongjin Seo, Soobin Um, Sangbin Lee, Jong Chul Ye, Haejun Chung

    Abstract: Designing free-form photonic devices is fundamentally challenging due to the vast number of possible geometries and the complex requirements of fabrication constraints. Traditional inverse-design approaches, whether driven by human intuition, global optimization, or adjoint-based gradient methods, often involve intricate binarization and filtering steps, while recent deep learning strategies deman…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 25 pages, 7 Figures

  2. arXiv:2504.01689

    cs.CV

    InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems

    Authors: Noam Elata, Hyungjin Chung, Jong Chul Ye, Tomer Michaeli, Michael Elad

    Abstract: Diffusion models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality posterior-sampling-based solutions. Despite significant advances, a fundamental trade-off persists regarding the way conditional synthesis is employed: training-based methods achieve high-quality results, while zero-shot approaches trade this quality for flexibility. This work introduces a…

    Submitted 2 April, 2025; originally announced April 2025.

  3. arXiv:2503.22622

    cs.CV

    Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model

    Authors: Jangho Park, Taesung Kwon, Jong Chul Ye

    Abstract: Recently, multi-view or 4D video generation has emerged as a significant research topic. Nonetheless, recent approaches to 4D generation still struggle with fundamental limitations, as they primarily rely on harnessing multiple video diffusion models with additional training or compute-intensive training of a full 4D diffusion model with limited real-world 4D data and large computational costs. To…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: project page: https://zero4dvid.github.io/

  4. arXiv:2503.15056

    cs.CV

    Single-Step Bidirectional Unpaired Image Translation Using Implicit Bridge Consistency Distillation

    Authors: Suhyeon Lee, Kwanyoung Kim, Jong Chul Ye

    Abstract: Unpaired image-to-image translation has seen significant progress since the introduction of CycleGAN. However, methods based on diffusion models or Schrödinger bridges have yet to be widely adopted in real-world applications due to their iterative sampling nature. To address this challenge, we propose a novel framework, Implicit Bridge Consistency Distillation (IBCD), which enables single-step bid…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 25 pages, 16 figures

  5. arXiv:2503.09151

    cs.CV cs.AI

    Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

    Authors: Hyeonho Jeong, Suhyeon Lee, Jong Chul Ye

    Abstract: We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Unlike mainstream approaches that train multi-view video diffusion models on large-scale 4D datasets, our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors. In essence, Reangle-A-…

    Submitted 17 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Project page: https://hyeonho99.github.io/reangle-a-video/

  6. arXiv:2503.08250

    cs.CV cs.AI cs.LG

    Aligning Text to Image in Diffusion Models is Easier Than You Think

    Authors: Jaa-Yeon Lee, Byunghee Cha, Jeongsol Kim, Jong Chul Ye

    Abstract: While recent advancements in generative modeling have significantly improved text-image alignment, some residual misalignment between text and image representations still remains. Although many approaches have attempted to address this issue by fine-tuning models using various reward models, etc., we revisit the challenge from the perspective of representation alignment, an approach that has gained…

    Submitted 21 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  7. arXiv:2503.08136

    cs.CV cs.AI cs.LG

    FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems

    Authors: Jeongsol Kim, Bryan Sangwoo Kim, Jong Chul Ye

    Abstract: Flow matching is a recent state-of-the-art framework for generative modeling based on ordinary differential equations (ODEs). While closely related to diffusion models, it provides a more general perspective on generative modeling. Although inverse problem solving has been extensively explored using diffusion models, it has not been rigorously examined within the broader context of flow models. Th…

    Submitted 11 March, 2025; originally announced March 2025.

  8. arXiv:2502.19429

    q-bio.GN cs.LG

    scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders

    Authors: Gyutaek Oh, Baekgyu Choi, Seyoung Jin, Inkyung Jung, Jong Chul Ye

    Abstract: Single-nucleus RNA sequencing (snRNA-seq) has significantly advanced our understanding of the disease etiology of neurodegenerative disorders. However, the low quality of specimens derived from postmortem brain tissues, combined with the high variability caused by disease heterogeneity, makes it challenging to integrate snRNA-seq data from multiple sources for precise analyses. To address these ch…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 41 pages, 12 figures

  9. arXiv:2502.06516

    cs.LG cs.AI cs.CV stat.ML

    Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation

    Authors: Soobin Um, Beomsu Kim, Jong Chul Ye

    Abstract: Minority samples are underrepresented instances located in low-density regions of a data manifold, and are valuable in many generative AI applications, such as data augmentation, creative content generation, etc. Unfortunately, existing diffusion-based minority generators often rely on computationally expensive guidance dedicated to minority generation. To address this, here we present a simple y…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 29 pages, 11 figures

  10. arXiv:2501.04284

    cs.CV cs.LG

    ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning

    Authors: Hyungjin Chung, Dohun Lee, Zihui Wu, Byung-Hoon Kim, Katherine L. Bouman, Jong Chul Ye

    Abstract: Compressed sensing MRI seeks to accelerate MRI acquisition processes by sampling fewer k-space measurements and then reconstructing the missing data algorithmically. The success of these approaches often relies on strong priors or learned statistical models. While recent diffusion model-based priors have shown great potential, previous methods typically ignore clinically available metadata (e.g. p…

    Submitted 8 January, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

    Comments: 29 pages, 9 figures. Code is available at https://github.com/DoHunLee1/ContextMRI

  11. arXiv:2412.13558

    eess.IV cs.CL cs.CV cs.LG

    Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation

    Authors: Changsun Lee, Sangjoon Park, Cheong-Il Shin, Woo Hee Choi, Hyun Jeong Park, Jeong Eun Lee, Jong Chul Ye

    Abstract: Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexities and data scarcity. Although a few recent VLMs specialized for 3D medical imaging have emerged, all are limited to learning a volumetric representation of a 3D medical image as a set of sub-volumetric feat…

    Submitted 18 December, 2024; originally announced December 2024.

  12. arXiv:2412.08871

    cs.CV cs.AI

    Inference-Time Diffusion Model Distillation

    Authors: Geon Yeong Park, Sang Wan Lee, Jong Chul Ye

    Abstract: Diffusion distillation models effectively accelerate reverse sampling by compressing the process into fewer steps. However, these models still exhibit a performance gap compared to their pre-trained diffusion model counterparts, exacerbated by distribution shifts and accumulated errors during multi-step sampling. To address this, we introduce Distillation++, a novel inference-time distillation fra…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: Code: https://github.com/geonyeong-park/inference_distillation

  13. arXiv:2412.06016

    cs.CV cs.AI cs.LG

    Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

    Authors: Hyeonho Jeong, Chun-Hao Paul Huang, Jong Chul Ye, Niloy Mitra, Duygu Ceylan

    Abstract: While recent foundational video generators produce visually rich output, they still struggle with appearance drift, where objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this is because there is no explicit supervision in terms of spatial tracking at the feature level. We propose Track4Gen, a spatially aware video generator that comb…

    Submitted 7 April, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

    Comments: CVPR 2025, Project page: hyeonho99.github.io/track4gen

  14. arXiv:2412.00156

    cs.CV cs.AI cs.LG stat.ML

    VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models

    Authors: Taesung Kwon, Jong Chul Ye

    Abstract: In this paper, we propose a novel framework for solving high-definition video inverse problems using latent image diffusion models. Building on recent advancements in spatio-temporal optimization for video inverse problems using image diffusion models, our approach leverages latent-space diffusion models to achieve enhanced video quality and resolution. To address the high computational demands of…

    Submitted 6 March, 2025; v1 submitted 29 November, 2024; originally announced December 2024.

    Comments: Project page: https://vision-xl.github.io/

  15. arXiv:2411.17077

    cs.LG cs.AI cs.CV

    Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts

    Authors: Jinho Chang, Hyungjin Chung, Jong Chul Ye

    Abstract: As Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment, many applications use a negated CFG term to filter out unwanted features from samples. However, simply negating CFG guidance creates an inverted probability distribution, often distorting samples away from the marginal distribution. Inspired by recent advances in conditi…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 14 pages, 8 figures

  16. arXiv:2411.17041

    cs.CV cs.AI cs.LG

    Free²Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models

    Authors: Jaemin Kim, Bryan S Kim, Jong Chul Ye

    Abstract: Diffusion models have achieved impressive results in generative tasks like text-to-image (T2I) and text-to-video (T2V) synthesis. However, achieving accurate text alignment in T2V generation remains challenging due to the complex temporal dependency across frames. Existing reinforcement learning (RL)-based approaches to enhance text alignment often require differentiable reward functions or are co…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 15 pages

  17. arXiv:2411.15540

    cs.CV cs.AI cs.LG eess.IV

    Optical-Flow Guided Prompt Optimization for Coherent Video Generation

    Authors: Hyelin Nam, Jaemin Kim, Dohun Lee, Jong Chul Ye

    Abstract: While text-to-video diffusion models have made significant strides, many still face challenges in generating videos with temporal consistency. Within diffusion frameworks, guidance techniques have proven effective in enhancing output quality during inference; however, applying these methods to video diffusion models introduces additional complexity of handling computations across entire sequences.…

    Submitted 23 March, 2025; v1 submitted 23 November, 2024; originally announced November 2024.

    Comments: CVPR 2025 (poster); project page: https://motionprompt.github.io/

  18. arXiv:2411.15490

    cs.CV cs.LG eess.IV

    Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

    Authors: Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye

    Abstract: Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contai…

    Submitted 23 November, 2024; originally announced November 2024.

  19. arXiv:2411.15265

    cs.CV cs.LG

    Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI

    Authors: Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye

    Abstract: Gradient-based methods are a prototypical family of explainability techniques, especially for image-based models. Nonetheless, they have several shortcomings in that they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, leading to explanations that are not actually faithful to the model and do not align…

    Submitted 22 November, 2024; originally announced November 2024.

    Comments: 19 pages, 5 figures

  20. arXiv:2411.14863

    cs.CV cs.AI cs.LG

    Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation

    Authors: Jeongsol Kim, Beomsu Kim, Jong Chul Ye

    Abstract: Diffusion models (DMs), which enable both image generation from noise and inversion from data, have inspired powerful unpaired image-to-image (I2I) translation algorithms. However, they often require a larger number of neural function evaluations (NFEs), limiting their practical applicability. In this paper, we tackle this problem with Schrodinger Bridges (SBs), which are stochastic differential e…

    Submitted 22 November, 2024; originally announced November 2024.

  21. arXiv:2410.07838

    cs.CV cs.AI cs.LG

    Minority-Focused Text-to-Image Generation via Prompt Optimization

    Authors: Soobin Um, Jong Chul Ye

    Abstract: We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretr…

    Submitted 4 April, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: CVPR 2025 (Oral), 21 pages, 10 figures

  22. arXiv:2410.07815

    cs.LG cs.CV

    Simple ReFlow: Improved Techniques for Fast Flow Models

    Authors: Beomsu Kim, Yu-Guan Hsieh, Michal Klein, Marco Cuturi, Jong Chul Ye, Bahjat Kawar, James Thornton

    Abstract: Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps, which slows inference and limits applicability to time-critical tasks. The ReFlow procedure can accelerate sampling by straightening generation trajectories. However, ReFlow is an iterative procedure, typically requiring training on simulated data, and results in reduced sample quali…

    Submitted 10 October, 2024; originally announced October 2024.

  23. arXiv:2410.05651

    cs.CV cs.AI cs.LG

    ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

    Authors: Serin Yang, Taesung Kwon, Jong Chul Ye

    Abstract: Recent progress in large-scale text-to-video (T2V) and image-to-video (I2V) diffusion models has greatly enhanced video generation, especially in terms of keyframe interpolation. However, current image-to-video diffusion models, while powerful in generating videos from a single conditioning frame, need adaptation for two-frame (start & end) conditioned generation, which is essential for effective…

    Submitted 1 March, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: ICLR 2025; Project page: https://vibidsampler.github.io/

  24. arXiv:2410.05591

    cs.CV

    TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: Despite significant advancements in customizing text-to-image and video generation models, generating images and videos that effectively integrate multiple personalized concepts remains a challenging task. To address this, we present TweedieMix, a novel method for composing customized diffusion models during the inference phase. By analyzing the properties of reverse diffusion sampling, our approa…

    Submitted 3 March, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Github Page: https://github.com/KwonGihyun/TweedieMix

  25. arXiv:2410.04721

    cs.LG cs.CV

    ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction

    Authors: Hyungjin Chung, Dohun Lee, Jong Chul Ye

    Abstract: Autoregressive models (ARMs) and diffusion models (DMs) represent two leading paradigms in generative modeling, each excelling in distinct areas: ARMs in global context modeling and long-sequence generation, and DMs in generating high-quality local contexts, especially for continuous data such as images and short videos. However, ARMs often suffer from exponential error accumulation over long sequ…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: 25 pages, 10 figures. Project page: https://acdc2025.github.io/

  26. arXiv:2410.04364

    cs.CV cs.AI cs.LG

    VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

    Authors: Dohun Lee, Bryan S Kim, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-to-image (T2I) diffusion models have revolutionized visual content creation, but extending these capabilities to text-to-video (T2V) generation remains a challenge, particularly in preserving temporal consistency. Existing methods that aim to improve consistency often cause trade-offs such as reduced imaging quality and impractical computational time. To address these issues, we introduce Vide…

    Submitted 8 December, 2024; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: 26 pages, 19 figures, Project Page: https://dohunlee1.github.io/videoguide.github.io/

  27. arXiv:2410.00083

    cs.LG cs.AI cs.CV

    A Survey on Diffusion Models for Inverse Problems

    Authors: Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, Mauricio Delbracio

    Abstract: Diffusion models have become increasingly popular for generative modeling due to their ability to generate high-quality samples. This has unlocked exciting new possibilities for solving inverse problems, especially in image restoration and reconstruction, by treating diffusion models as unsupervised priors. This survey provides a comprehensive overview of methods that utilize pre-trained diffusion…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Work in progress. 38 pages

  28. arXiv:2410.00046

    eess.IV cs.CV cs.LG

    Mixture of Multicenter Experts in Multimodal Generative AI for Advanced Radiotherapy Target Delineation

    Authors: Yujin Oh, Sangjoon Park, Xiang Li, Wang Yi, Jonathan Paly, Jason Efstathiou, Annie Chan, Jun Won Kim, Hwa Kyung Byun, Ik Jae Lee, Jaeho Cho, Chan Woo Wee, Peng Shu, Peilong Wang, Nathan Yu, Jason Holmes, Jong Chul Ye, Quanzheng Li, Wei Liu, Woong Sub Koom, Jin Sung Kim, Kyungsang Kim

    Abstract: Clinical experts employ diverse philosophies and strategies in patient care, influenced by regional patient populations. However, existing medical artificial intelligence (AI) models are often trained on data distributions that disproportionately reflect highly prevalent patterns, reinforcing biases and overlooking the diverse expertise of clinicians. To overcome this limitation, we introduce the…

    Submitted 26 October, 2024; v1 submitted 27 September, 2024; originally announced October 2024.

    Comments: 39 pages

  29. arXiv:2409.12377

    eess.IV cs.CV

    Fundus image enhancement through direct diffusion bridges

    Authors: Sehui Kim, Hyungjin Chung, Se Hie Park, Eui-Sang Chung, Kayoung Yi, Jong Chul Ye

    Abstract: We propose FD3, a fundus image enhancement method based on direct diffusion bridges, which can cope with a wide range of complex degradations, including haze, blur, noise, and shadow. We first propose a synthetic forward model through a human feedback loop with board-certified ophthalmologists for maximal quality improvement of low-quality in-vivo images. Using the proposed forward model, we train…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: Published at IEEE JBHI. 12 pages, 10 figures. Code and Data: https://github.com/heeheee888/FD3

  30. arXiv:2409.02574

    cs.CV cs.AI stat.ML

    Solving Video Inverse Problems Using Image Diffusion Models

    Authors: Taesung Kwon, Jong Chul Ye

    Abstract: Recently, diffusion model-based inverse problem solvers (DIS) have emerged as state-of-the-art approaches for addressing inverse problems, including image super-resolution, deblurring, inpainting, etc. However, their application to video inverse problems arising from spatio-temporal degradation remains largely unexplored due to the challenges in training video diffusion models. To address this iss…

    Submitted 27 February, 2025; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: ICLR 2025; 25 pages, 17 figures

  31. arXiv:2407.17907

    cs.CV cs.LG

    Amortized Posterior Sampling with Diffusion Prior Distillation

    Authors: Abbas Mammadov, Hyungjin Chung, Jong Chul Ye

    Abstract: We propose a variational inference approach to sample from the posterior distribution for solving inverse problems. From a pre-trained diffusion model, our approach trains a conditional flow model to minimize the divergence between the proposal variational distribution and the posterior distribution implicitly defined through the diffusion model. Once trained, the flow model is capable of sampling…

    Submitted 25 July, 2024; originally announced July 2024.

  32. arXiv:2407.11555

    cs.CV cs.AI cs.LG

    Self-Guided Generation of Minority Samples Using Diffusion Models

    Authors: Soobin Um, Jong Chul Ye

    Abstract: We present a novel approach for generating minority samples that live on low-density regions of a data manifold. Our framework is built upon diffusion models, leveraging the principle of guided sampling that incorporates an arbitrary energy-based guidance during inference time. The key defining feature of our sampler lies in its self-contained nature, i.e., implementable solely with a pretra…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  33. arXiv:2407.11244

    cs.LG

    (Deep) Generative Geodesics

    Authors: Beomsu Kim, Michael Puthawala, Jong Chul Ye, Emanuele Sansone

    Abstract: In this work, we propose to study the global geometrical properties of generative models. We introduce a new Riemannian metric to assess the similarity between any two data points. Importantly, our metric is agnostic to the parametrization of the generative model and requires only the evaluation of its data likelihood. Moreover, the metric leads to the conceptual definition of generative distances…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures

  34. arXiv:2407.10641

    cs.CV cs.LG

    Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems

    Authors: Hyungjin Chung, Jong Chul Ye

    Abstract: Recent inverse problem solvers that leverage generative diffusion priors have garnered significant attention due to their exceptional quality. However, adaptation of the prior is necessary when there exists a discrepancy between the training and testing distributions. In this work, we propose deep diffusion image prior (DDIP), which generalizes the recent adaptation method of SCD by introducing a…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, 25 pages, 8 figures

  35. arXiv:2406.08070

    cs.CV cs.AI cs.LG

    CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

    Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are…

    Submitted 12 September, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 25 pages, 21 figures. Project Page: https://cfgpp-diffusion.github.io/

  36. arXiv:2405.17829

    cs.LG cs.AI

    LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space

    Authors: Jinho Chang, Jong Chul Ye

    Abstract: With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, we present a novel latent diffusion model…

    Submitted 3 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2405.17720

    cs.CV cs.AI cs.LG

    MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding

    Authors: Inhwa Han, Jaayeon Lee, Jong Chul Ye

    Abstract: Research efforts for visual decoding from fMRI signals have attracted considerable attention in the research community. Still, multi-subject fMRI decoding with one model has been considered intractable due to the drastic variations in fMRI signals between subjects and even within the same subject across different trials. To address current limitations in multi-subject brain decoding, here we introduce…

    Submitted 6 October, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  38. arXiv:2405.16823

    cs.CV cs.AI

    Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

    Authors: Gihyun Kwon, Jangho Park, Jong Chul Ye

    Abstract: While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing methods of single-image editing with self-attention injection and video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approache…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://unifyediting.github.io/

  39. arXiv:2404.03913

    cs.CV cs.AI cs.LG

    Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

    Authors: Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

    Abstract: While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with t…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  40. arXiv:2403.15249

    cs.CV cs.AI cs.LG

    Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

    Authors: Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee, Jong Chul Ye

    Abstract: The evolution of diffusion models has greatly impacted video generation and understanding. Particularly, text-to-video diffusion models (VDMs) have significantly facilitated the customization of input video with target appearance, motion, etc. Despite these advances, challenges persist in accurately distilling motion information from video frames. While existing works leverage the consecutive fram…

    Submitted 19 December, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: AAAI 2025, Project page: https://geonyeong-park.github.io/spectral-motion-alignment/

  41. arXiv:2403.14183

    cs.CV cs.AI cs.LG stat.ML

    OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation

    Authors: Kwanyoung Kim, Yujin Oh, Jong Chul Ye

    Abstract: The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, leveraging pre-trained CLIP knowledge to closely align text embeddings with pixel embeddings still has limitations in existing approaches. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed…

    Submitted 11 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ECCV 2024; 23 pages, 8 tables, 8 figures; Project Page: https://cubeyoung.github.io/OTSeg_project/

  42. arXiv:2403.13551

    cs.CV cs.LG

    Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing

    Authors: Hangeol Chang, Jinho Chang, Jong Chul Ye

    Abstract: Despite recent advancements in text-to-image diffusion models facilitating various image editing techniques, complex text prompts often lead to an oversight of some requests due to a bottleneck in processing text information. To tackle this challenge, we present Ground-A-Score, a simple yet powerful model-agnostic image editing method by incorporating grounding during score distillation. This appr…

    Submitted 20 March, 2024; originally announced March 2024.

  43. arXiv:2403.12510

    cs.CV cs.AI cs.LG

    Generalized Consistency Trajectory Models for Image Manipulation

    Authors: Beomsu Kim, Jaemin Kim, Jeongsol Kim, Jong Chul Ye

    Abstract: Diffusion models (DMs) excel in unconditional generation, as well as on applications such as image editing and restoration. The success of DMs lies in the iterative nature of diffusion: diffusion breaks down the complex process of mapping noise to data into a sequence of simple denoising tasks. Moreover, we are able to exert fine-grained control over the generation process by injecting guidance te…

    Submitted 10 October, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  44. arXiv:2403.12002

    cs.CV cs.AI

    DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

    Authors: Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video…

    Submitted 15 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024, Project page: https://hyeonho99.github.io/dreammotion/

  45. arXiv:2403.11415

    cs.CV cs.AI cs.LG

    DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation

    Authors: Jeongsol Kim, Geon Yeong Park, Jong Chul Ye

    Abstract: Reverse sampling and score-distillation have emerged as main workhorses in recent years for image manipulation using latent diffusion models (LDMs). While reverse diffusion sampling often requires adjustments of LDM architecture or feature engineering, score distillation offers a simple yet powerful model-agnostic approach, but it is often prone to mode-collapsing. To address these limitations and…

    Submitted 23 September, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: ECCV 2024

  46. arXiv:2403.06275

    cs.CV cs.AI cs.LG physics.med-ph

    UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation

    Authors: Kwanyoung Kim, Jaa-Yeon Lee, Jong Chul Ye

    Abstract: Nakagami imaging holds promise for visualizing and quantifying tissue scattering in ultrasound waves, with potential applications in tumor diagnosis and fat fraction estimation which are challenging to discern by conventional ultrasound B-mode images. Existing methods struggle with optimal window size selection and suffer from estimator instability, leading to degraded resolution images. To addres…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  47. arXiv:2402.08601

    cs.CV

    Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing

    Authors: Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jong Chul Ye

    Abstract: Text-guided non-rigid editing involves complex edits for input images, such as changing motion or compositions within their surroundings. Since it requires manipulating the input structure, existing methods often struggle with preserving object identity and background, particularly when combined with Stable Diffusion. In this work, we propose a training-free approach for non-rigid editing with Sta…

    Submitted 16 October, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: This manuscript has been submitted to Pattern Recognition Letters

  48. arXiv:2402.02407

    cs.LG cs.CV cs.NE

    Defining Neural Network Architecture through Polytope Structures of Dataset

    Authors: Sangmin Lee, Abbas Mammadov, Jong Chul Ye

    Abstract: Current theoretical and empirical research in neural networks suggests that complex datasets require large network architectures for thorough classification, yet the precise nature of this relationship remains unclear. This paper tackles this issue by defining upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question. We also delve in…

    Submitted 30 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  49. arXiv:2312.08223

    cs.CV

    Patch-wise Graph Contrastive Learning for Image Translation

    Authors: Chanyong Jung, Gihyun Kwon, Jong Chul Ye

    Abstract: Recently, patch-wise contrastive learning is drawing attention for the image translation by exploring the semantic correspondence between the input and output images. To further explore the patch-wise topology for high-level semantic understanding, here we exploit the graph neural network to capture the topology-aware features. Specifically, we construct the graph based on the patch-wise similarit…

    Submitted 19 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  50. arXiv:2312.03013

    eess.IV cs.AI cs.CV cs.LG

    Breast Ultrasound Report Generation using LangChain

    Authors: Jaeyoung Huh, Hyun Jeong Park, Jong Chul Ye

    Abstract: Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multipl…

    Submitted 4 December, 2023; originally announced December 2023.