
Showing 1–16 of 16 results for author: Hwang, S J

Searching in archive eess.

  1. arXiv:2505.12233

    eess.IV cs.CV

    PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning

    Authors: Yeonkyung Lee, Woojung Han, Youngjun Jun, Hyeonmin Kim, Jungkyung Cho, Seong Jae Hwang

    Abstract: Retinal foundation models have significantly advanced retinal image analysis by leveraging self-supervised learning to reduce dependence on labeled data while achieving strong generalization. Many recent approaches enhance retinal image understanding using report supervision, but obtaining clinical reports is often costly and challenging. In contrast, metadata (e.g., age, gender) is widely availab…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025 early accept

  2. arXiv:2503.18642

    eess.IV cs.CV

    Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata Integration

    Authors: Taejin Jeong, Joohyeok Kim, Jaehoon Joo, Yeonwoo Jung, Hyeonmin Kim, Seong Jae Hwang

    Abstract: Glaucoma is an incurable ophthalmic disease that damages the optic nerve, leads to vision loss, and ranks among the leading causes of blindness worldwide. Diagnosing glaucoma typically involves fundus photography, optical coherence tomography (OCT), and visual field testing. However, the high cost of OCT often leads to reliance on fundus photography and visual field testing, both of which exhibit…

    Submitted 24 March, 2025; originally announced March 2025.

  3. arXiv:2503.10055

    cs.CV eess.IV

    Fourier Decomposition for Explicit Representation of 3D Point Cloud Attributes

    Authors: Donghyun Kim, Hyunah Ko, Chanyoung Kim, Seong Jae Hwang

    Abstract: While 3D point clouds are widely utilized across various vision applications, their irregular and sparse nature makes them challenging to handle. In response, numerous encoding approaches have been proposed to capture the rich semantic information of point clouds. Yet, a critical limitation persists: a lack of consideration for colored point clouds, which are more capable 3D representations as they…

    Submitted 13 March, 2025; originally announced March 2025.

  4. arXiv:2407.07517

    eess.IV cs.CV

    Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction

    Authors: Yumin Kim, Gayoon Choi, Seong Jae Hwang

    Abstract: Reducing scan time in Positron Emission Tomography (PET) imaging while maintaining high-quality images is crucial for minimizing patient discomfort and radiation exposure. Due to the limited size of datasets and distribution discrepancy across scanners in medical imaging, fine-tuning in a parameter-efficient and effective manner is on the rise. Motivated by the potential of Parameter-Efficient Fin…

    Submitted 10 July, 2024; originally announced July 2024.

  5. arXiv:2407.05059

    eess.IV cs.CV

    Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model

    Authors: Kyobin Choo, Youngjun Jun, Mijin Yun, Seong Jae Hwang

    Abstract: In neuroimaging, brain CT is generally a more cost-effective and accessible imaging option than MRI. Nevertheless, CT exhibits inferior soft-tissue contrast and higher noise levels, yielding less precise structural clarity. In response, leveraging more readily available CT to construct its counterpart MRI, namely, medical image-to-image translation (I2I), serves as a promising solution. Part…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, Early accepted at Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024

    ACM Class: I.4.5; I.4.9; J.3

  6. arXiv:2305.13831

    cs.SD cs.CL eess.AS

    ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

    Authors: Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

    Abstract: Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech. Existing approaches, however, only aim to produce emotional TTS for seen speakers during training, without consideration of the generalization to unseen speakers. In this paper, we propose ZET-Speech, a zero-shot adaptive emotion-controllab…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  7. arXiv:2303.01105

    eess.IV cs.CV cs.LG

    Evidence-empowered Transfer Learning for Alzheimer's Disease

    Authors: Kai Tzu-iunn Ong, Hana Kim, Minjin Kim, Jinseong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, Jinyoung Yeo

    Abstract: Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we pres…

    Submitted 17 April, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023. The authorship was changed from co-first authors to a single first author, which was authorized by the adviser/corresponding author Jinyoung Yeo (Apr 18th, 2023)

  8. arXiv:2211.09383

    eess.AS cs.AI cs.SD

    Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

    Authors: Minki Kang, Dongchan Min, Sung Ju Hwang

    Abstract: There has been significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to advances in neural generative modeling. However, existing methods for any-speaker adaptive TTS have achieved unsatisfactory performance due to their suboptimal accuracy in mimicking the target speakers' styles. In this work, we present Grad-StyleSpeech, which is an any-speaker adapt…

    Submitted 13 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

  9. arXiv:2208.10922

    cs.CV cs.LG eess.AS eess.IV

    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

    Authors: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

    Abstract: We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given…

    Submitted 15 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

  10. arXiv:2111.08988

    cs.GR cs.LG eess.IV eess.SP

    LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks

    Authors: Berivan Isik, Philip A. Chou, Sung Jin Hwang, Nick Johnston, George Toderici

    Abstract: We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 30 pages, 29 figures

  11. arXiv:2106.03153

    eess.AS cs.CL cs.LG cs.SD

    Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation

    Authors: Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

    Abstract: With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech from only a few short audio samples of the given speaker. However, existing methods either require fine-tuning the model or achieve low adaptation quality witho…

    Submitted 16 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by ICML 2021

  12. arXiv:2102.13147

    cs.CV cs.LG eess.IV

    Multi-Domain Learning by Meta-Learning: Taking Optimal Steps in Multi-Domain Loss Landscapes by Inner-Loop Learning

    Authors: Anthony Sicilia, Xingchen Zhao, Davneet Minhas, Erin O'Connor, Howard Aizenstein, William Klunk, Dana Tudorascu, Seong Jae Hwang

    Abstract: We consider a model-agnostic solution to the problem of Multi-Domain Learning (MDL) for multi-modal applications. Many existing MDL techniques are model-dependent solutions which explicitly require nontrivial architectural changes to construct domain-specific modules. Thus, properly applying these MDL techniques for new problems with well-established models, e.g. U-Net for semantic segmentation, m…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: IEEE International Symposium on Biomedical Imaging 2021

  13. arXiv:2007.03034

    cs.IT eess.IV

    Nonlinear Transform Coding

    Authors: Johannes Ballé, Philip A. Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, George Toderici

    Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate–distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate–distortion performance of NTC with the…

    Submitted 23 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: 17 pages, 14 figures. Accepted for publication in IEEE Journal of Selected Topics in Signal Processing

  14. arXiv:2004.02863

    eess.AS cs.LG cs.SD stat.ML

    Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

    Authors: Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

    Abstract: In practical settings, a speaker recognition system needs to identify a speaker given a short utterance, while the enrollment utterance may be relatively long. However, existing speaker recognition models perform poorly with such short utterances. To solve this problem, we introduce a meta-learning framework for imbalance length pairs. Specifically, we use a Prototypical Network and train it with…

    Submitted 10 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020. The codes are available at https://github.com/seongmin-kye/meta-SR

  15. arXiv:1911.12990   

    cs.CV cs.LG eess.IV

    Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization

    Authors: Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

    Abstract: Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployments to resource-limited devices. In order to overcome the nature of transforming continuous activations and weights to discrete ones, a recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] s…

    Submitted 7 September, 2021; v1 submitted 29 November, 2019; originally announced November 2019.

    Comments: New submission with another link

  16. arXiv:1802.01436

    eess.IV cs.IT

    Variational image compression with a scale hyperprior

    Authors: Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston

    Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unl…

    Submitted 1 May, 2018; v1 submitted 31 January, 2018; originally announced February 2018.

    Comments: Accepted as a conference contribution to the International Conference on Learning Representations (ICLR) 2018