Showing 1–18 of 18 results for author: Heo, J

  1. arXiv:2506.01234  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Fourier-Modulated Implicit Neural Representation for Multispectral Satellite Image Compression

    Authors: Woojin Cho, Steve Andreas Immanuel, Junhyuk Heo, Darongsae Kwon

    Abstract: Multispectral satellite images play a vital role in agriculture, fisheries, and environmental monitoring. However, their high dimensionality, large data volumes, and diverse spatial resolutions across multiple channels pose significant challenges for data compression and analysis. This paper presents ImpliSat, a unified framework specifically designed to address these challenges through efficient…

    Submitted 11 June, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to IGARSS 2025 (Oral)

  2. arXiv:2505.16798  [pdf, ps, other]

    eess.AS cs.AI

    SEED: Speaker Embedding Enhancement Diffusion Model

    Authors: KiHyun Nam, Jungwoo Heo, Jee-weon Jung, Gangin Park, Chaeyoung Jung, Ha-Jin Yu, Joon Son Chung

    Abstract: A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a pre-trained speaker recognition model and generates refined embeddings. For training, our approach progressively adds Gaussian noise to both clean and noisy speaker e…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025. The official code can be found at https://github.com/kaistmm/seed-pytorch

  3. arXiv:2503.03785  [pdf, other]

    eess.IV cs.LG

    Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model

    Authors: Steve Andreas Immanuel, Woojin Cho, Junhyuk Heo, Darongsae Kwon

    Abstract: Limited data is a common problem in remote sensing due to the high cost of obtaining annotated samples. In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples. However, this often necessitates specialized model architectures or complex training strategies. Instead, we propose a simple approa…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Accepted to ICLRW 2025 (Oral)

  4. arXiv:2502.08179  [pdf, other]

    eess.SP

    Can TDD Be Employed in LEO SatCom Systems? Challenges and Potential Approaches

    Authors: Hyunwoo Lee, Ian P. Roberts, Jehyun Heo, Joohyun Son, Hanwoong Kim, Yunseo Lee, Daesik Hong

    Abstract: Frequency-division duplexing (FDD) remains the de facto standard in modern low Earth orbit (LEO) satellite communication (SatCom) systems, such as SpaceX's Starlink, OneWeb, and Amazon's Project Kuiper. While time-division duplexing (TDD) is often regarded as superior in today's terrestrial networks, its viability in future LEO SatCom systems remains unclear. This article details how the long prop…

    Submitted 12 February, 2025; originally announced February 2025.

  5. arXiv:2412.00124  [pdf, other]

    cs.CV eess.IV

    Auto-Encoded Supervision for Perceptual Image Super-Resolution

    Authors: MinKyu Lee, Sangeek Hyun, Woojin Jun, Jae-Pil Heo

    Abstract: This work tackles the fidelity objective in perceptual super-resolution (SR). Specifically, we address the shortcomings of pixel-level $L_\text{p}$ loss ($\mathcal{L}_\text{pix}$) in the GAN-based SR framework. Since $\mathcal{L}_\text{pix}$ is known to have a trade-off relationship with perceptual quality, prior methods often multiply it by a small scale factor or utilize low-pass filters. However, this w…

    Submitted 11 April, 2025; v1 submitted 28 November, 2024; originally announced December 2024.

    Comments: Codes are available at https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution

  6. arXiv:2406.07103  [pdf, other]

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  7. arXiv:2405.05426  [pdf, other]

    eess.SY

    ATLS: Automated Trailer Loading for Surface Vessels

    Authors: Amer Abughaida, Meet Gandhi, Jun Heo, Vaishnav Tadiparthi, Yosuke Sakamoto, Joohyun Woo, Sangjae Bae

    Abstract: Automated docking technologies for marine boats have received growing attention in the literature. This paper contributes to the literature by proposing a mathematical framework that automates "trailer loading" in the presence of wind disturbances, which is unexplored despite its importance to boat owners. The comprehensive pipeline of localization, system identification, and trajectory o…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: To be presented at IEEE Intelligent Vehicles Symposium (IV 2024)

  8. arXiv:2309.08320  [pdf, other]

    eess.AS cs.SD

    Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

    Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

    Abstract: Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV…

    Submitted 13 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, accepted for ICASSP 2024

  9. arXiv:2309.08208  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

    Authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu

    Abstract: Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

  10. arXiv:2309.04549  [pdf, other]

    cs.CV cs.DC cs.MM eess.IV

    Poster: Making Edge-assisted LiDAR Perceptions Robust to Lossy Point Cloud Compression

    Authors: Jin Heo, Gregorie Phillips, Per-Erik Brodin, Ada Gavrilovska

    Abstract: Real-time light detection and ranging (LiDAR) perceptions, e.g., 3D object detection and simultaneous localization and mapping, are computationally intensive for resource-limited mobile devices and are often offloaded to the edge. Offloading LiDAR perceptions requires compressing the raw sensor data, and lossy compression is used to reduce the data volume efficiently. Lossy compression degrades t…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: extended abstract of 2 pages, 2 figures, 1 table

  11. arXiv:2307.10628  [pdf, other]

    eess.AS cs.SD

    PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

    Authors: Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu

    Abstract: Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environ…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 1 table, accepted to CKAIA2023 as a conference paper

  12. arXiv:2305.17394  [pdf, other]

    eess.AS cs.SD

    One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

    Authors: Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu

    Abstract: The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, there is a computational cost hurdle in employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task. Consequently, these methods coul…

    Submitted 7 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ISCA INTERSPEECH 2023

  13. arXiv:2211.02227  [pdf, other]

    eess.AS cs.SD

    Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

    Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

    Abstract: The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performances through fine-tuning for downstream tasks. Nevertheless, re-training all the parameters of these massive mod…

    Submitted 1 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures

  14. arXiv:2211.01599  [pdf, other]

    eess.AS cs.SD

    Convolution channel separation and frequency sub-bands aggregation for music genre classification

    Authors: Jungwoo Heo, Hyun-seo Shin, Ju-ho Kim, Chan-yeong Lim, Ha-Jin Yu

    Abstract: In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that…

    Submitted 3 November, 2022; originally announced November 2022.

  15. Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

    Authors: Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin

    Abstract: The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge is designed and held to promote development of systems that can perform ASV considering spoofing attacks by integrating AS…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference paper

    Journal ref: Proc. Interspeech 2022

  16. arXiv:2206.13044  [pdf, other]

    eess.AS cs.SD

    Extended U-Net for Speaker Verification in Noisy Environments

    Authors: Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu

    Abstract: Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility. Various studies have used separate pretrained enhancement models as the front-end module of the SV system in noisy environments, and these methods effectively remove noises. However, the denoising process of independent enhancement models n…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 Interspeech as a conference paper

  17. arXiv:2112.12343  [pdf, other]

    cs.SD eess.AS

    Graph attentive feature aggregation for text-independent speaker verification

    Authors: Hye-jin Shim, Jungwoo Heo, Jae-han Park, Ga-hui Lee, Ha-Jin Yu

    Abstract: The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pairwise relationships. For this purpose, we propose a novel graph attentive feature aggregation module by interpreting each frame-level feature as a node of a graph. The inter-relationship between all possible pairs of features, typically exploited indirectly, can be dir…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 5 pages, 1 figure, 6 tables, submitted to ICASSP 2022

  18. arXiv:2112.07935  [pdf, other]

    eess.AS

    RawNeXt: Speaker verification system for variable-duration utterances with deep layer aggregation and extended dynamic scaling policies

    Authors: Ju-ho Kim, Hye-jin Shim, Jungwoo Heo, Ha-Jin Yu

    Abstract: Despite achieving satisfactory performance in speaker verification using deep neural networks, variable-duration utterances remain a challenge that threatens the robustness of systems. To deal with this issue, we propose a speaker verification system called RawNeXt that can handle input raw waveforms of arbitrary length by employing the following two components: (1) A deep layer aggregation strate…

    Submitted 27 June, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 ICASSP as a conference paper