Showing 1–18 of 18 results for author: Um, S

  1. arXiv:2504.17077  [pdf, other]

    physics.optics cs.AI physics.comp-ph

    Physics-guided and fabrication-aware inverse design of photonic devices using diffusion models

    Authors: Dongjin Seo, Soobin Um, Sangbin Lee, Jong Chul Ye, Haejun Chung

    Abstract: Designing free-form photonic devices is fundamentally challenging due to the vast number of possible geometries and the complex requirements of fabrication constraints. Traditional inverse-design approaches--whether driven by human intuition, global optimization, or adjoint-based gradient methods--often involve intricate binarization and filtering steps, while recent deep learning strategies deman…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 25 pages, 7 Figures

  2. arXiv:2504.01861  [pdf, other]

    cs.RO cs.LG

    Corner-Grasp: Multi-Action Grasp Detection and Active Gripper Adaptation for Grasping in Cluttered Environments

    Authors: Yeong Gwang Son, Seunghwan Um, Juyong Hong, Tat Hieu Bui, Hyouk Ryeol Choi

    Abstract: Robotic grasping is an essential capability, playing a critical role in enabling robots to physically interact with their surroundings. Despite extensive research, challenges remain due to the diverse shapes and properties of target objects, inaccuracies in sensing, and potential collisions with the environment. In this work, we propose a method for effectively grasping in cluttered bin-picking en…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 11 pages, 14 figures

  3. arXiv:2502.06516  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation

    Authors: Soobin Um, Beomsu Kim, Jong Chul Ye

    Abstract: Minority samples are underrepresented instances located in low-density regions of a data manifold, and are valuable in many generative AI applications, such as data augmentation, creative content generation, etc. Unfortunately, existing diffusion-based minority generators often rely on computationally expensive guidance dedicated for minority generation. To address this, here we present a simple y…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 29 pages, 11 figures

  4. arXiv:2501.02504  [pdf, other]

    cs.CV cs.AI

    Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection

    Authors: Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

    Abstract: The goal of video moment retrieval and highlight detection is to identify specific segments and highlights based on a given text query. With the rapid growth of video content and the overlap between these tasks, recent works have addressed both simultaneously. However, they still struggle to fully capture the overall video context, making it challenging to determine which words are most relevant.…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: Accepted at AAAI 2025

  5. arXiv:2410.07838  [pdf, other]

    cs.CV cs.AI cs.LG

    Minority-Focused Text-to-Image Generation via Prompt Optimization

    Authors: Soobin Um, Jong Chul Ye

    Abstract: We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretr…

    Submitted 4 April, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: CVPR 2025 (Oral), 21 pages, 10 figures

  6. arXiv:2409.17285  [pdf, other]

    cs.SD cs.AI eess.AS

    SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

    Authors: Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Robust recognition systems require speech data recorded in varied acoustic environments with diffe…

    Submitted 15 April, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: IEEE OJSP. Official document lives at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10839331

  7. arXiv:2409.08711  [pdf, other]

    eess.AS cs.AI

    Text-To-Speech Synthesis In The Wild

    Authors: Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: Text-to-speech (TTS) systems are traditionally trained using modest databases of studio-quality, prompted or read speech collected in benign acoustic environments such as anechoic rooms. The recent literature nonetheless shows efforts to train TTS systems using data collected in the wild. While this approach allows for the use of massive quantities of natural speech, until now, there are no common…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 5 pages, submitted to ICASSP 2025 as a conference paper

  8. arXiv:2407.11555  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Guided Generation of Minority Samples Using Diffusion Models

    Authors: Soobin Um, Jong Chul Ye

    Abstract: We present a novel approach for generating minority samples that live on low-density regions of a data manifold. Our framework is built upon diffusion models, leveraging the principle of guided sampling that incorporates an arbitrary energy-based guidance during inference time. The key defining feature of our sampler lies in its self-contained nature, i.e., implementable solely with a pretra…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2403.17420  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

    Authors: Dongjin Kim, Sung Jin Um, Sangmin Lee, Jung Uk Kim

    Abstract: The goal of the multi-sound source localization task is to localize sound sources from the mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their reliance on prior information about the number of objects to be separated. In this paper, to overcome this limitation, we present a novel multi-sound source localizati…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  10. arXiv:2308.06087  [pdf, other]

    cs.MM cs.AI cs.CV

    Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization

    Authors: Sung Jin Um, Dongjin Kim, Jung Uk Kim

    Abstract: The objective of the sound source localization task is to enable machines to detect the location of sound-making objects within a visual scene. While the audio modality provides spatial cues to locate the sound source, existing approaches only use audio as an auxiliary role to compare spatial regions of the visual modality. Humans, on the other hand, utilize both audio and visual modalities as spa…

    Submitted 17 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Camera-Ready, ACM MM 2023

  11. arXiv:2301.12334  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Don't Play Favorites: Minority Guidance for Diffusion Models

    Authors: Soobin Um, Suhyeon Lee, Jong Chul Ye

    Abstract: We explore the problem of generating minority samples using diffusion models. The minority samples are instances that lie on low-density regions of a data manifold. Generating a sufficient number of such minority instances is important, since they often contain some unique attributes of the data. However, the conventional generation process of the diffusion models mostly yields majority samples (t…

    Submitted 26 February, 2024; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: ICLR 2024

  12. arXiv:2205.04104  [pdf, other]

    eess.AS cs.AI

    ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

    Authors: Sangshin Oh, Seyun Um, Hong-Goo Kang

    Abstract: The Gumbel-softmax distribution, or Concrete distribution, is often used to relax the discrete characteristics of a categorical distribution and enable back-propagation through differentiable reparameterization. Although it reliably yields low variance gradients, it still relies on a stochastic sampling process for optimization. In this work, we present a relaxed categorical analytic bound (ReCAB)…

    Submitted 9 May, 2022; originally announced May 2022.

  13. Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

    Authors: Hyungchan Yoon, Seyun Um, Changwhan Kim, Hong-Goo Kang

    Abstract: To simplify the generation process, several text-to-speech (TTS) systems implicitly learn intermediate latent representations instead of relying on predefined features (e.g., mel-spectrogram). However, their generation quality is unsatisfactory as these representations lack speech variances. In this paper, we improve TTS performance by adding prosody embeddings to the latent representations…

    Submitted 28 August, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: INTERSPEECH 2023

    MSC Class: 68T07 (Primary) 68T50; 68T99 (Secondary) ACM Class: I.2.7; I.2.6

  14. arXiv:2204.01387  [pdf, other]

    eess.AS cs.CR cs.LG cs.SD

    Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

    Authors: Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim

    Abstract: Recent advances in sophisticated synthetic speech generated from text-to-speech (TTS) or voice conversion (VC) systems cause threats to the existing automatic speaker verification (ASV) systems. Since such synthetic speech is generated from diverse algorithms, generalization ability with using limited training data is indispensable for a robust anti-spoofing system. In this work, we propose a tran…

    Submitted 14 December, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  15. arXiv:2202.03601  [pdf]

    cs.AR

    Two-Step Spike Encoding Scheme and Architecture for Highly Sparse Spiking-Neural-Network

    Authors: Sangyeob Kim, Sangjin Kim, Soyeon Um, Soyeon Kim, Hoi-Jun Yoo

    Abstract: This paper proposes a two-step spike encoding scheme, which consists of the source encoding and the process encoding for a high energy-efficient spiking-neural-network (SNN) acceleration. The eigen-train generation and its superposition generate spike trains which show high accuracy with low spike ratio. Sparsity boosting (SB) and spike generation skipping (SGS) reduce the amount of operations for…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: 5 pages, 10 figures

  16. arXiv:2107.12003  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

    Authors: Se-Yun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang

    Abstract: In this paper, we propose a multi-speaker face-to-speech waveform generation model that also works for unseen speaker conditions. Using a generative adversarial network (GAN) with linguistic and speaker characteristic features as auxiliary conditions, our method directly converts face images into speech waveforms under an end-to-end training framework. The linguistic features are extracted from li…

    Submitted 15 March, 2023; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: 5 pages (including references), 1 figure

  17. arXiv:1911.01635  [pdf, other]

    eess.AS cs.SD

    Emotional speech synthesis with rich and granularized control

    Authors: Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

    Abstract: This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to determine embedding vectors representing the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm to the embedding vectors that can minimize the distance to the target emotion…

    Submitted 5 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Submitted to ICASSP 2020

  18. arXiv:1704.05231  [pdf, other]

    cs.CV

    Fast 2-D Complex Gabor Filter with Kernel Decomposition

    Authors: Suhyuk Um, Jaeyoon Kim, Dongbo Min

    Abstract: 2-D complex Gabor filtering has found numerous applications in the fields of computer vision and image processing. Especially, in some applications, it is often needed to compute 2-D complex Gabor filter bank consisting of the 2-D complex Gabor filtering outputs at multiple orientations and frequencies. Although several approaches for fast 2-D complex Gabor filtering have been proposed, they prima…

    Submitted 18 April, 2017; originally announced April 2017.