Showing 1–50 of 69 results for author: Kim, G

Searching in archive eess.
  1. arXiv:2509.15513 [pdf, ps, other]

    cs.LG cs.RO eess.SY

    KoopCast: Trajectory Forecasting via Koopman Operators

    Authors: Jungjin Lee, Jaeuk Shin, Gihwan Kim, Joonho Han, Insoon Yang

    Abstract: We present KoopCast, a lightweight yet efficient model for trajectory forecasting in general dynamic environments. Our approach leverages Koopman operator theory, which enables a linear representation of nonlinear dynamics by lifting trajectories into a higher-dimensional space. The framework follows a two-stage design: first, a probabilistic neural goal estimator predicts plausible long-term targ…

    Submitted 18 September, 2025; originally announced September 2025.

  2. arXiv:2509.12695 [pdf, ps, other]

    eess.SY

    MAPS: A Mode-Aware Probabilistic Scheduling Framework for LPV-Based Adaptive Control

    Authors: Taehun Kim, Guntae Kim, Cheolmin Jeong, Chang Mook Kang

    Abstract: This paper proposes Mode-Aware Probabilistic Scheduling (MAPS), a novel adaptive control framework tailored for DC motor systems experiencing varying friction. MAPS uniquely integrates an Interacting Multiple Model (IMM) estimator with a Linear Parameter-Varying (LPV) based control strategy, leveraging real-time mode probability estimates to perform probabilistic gain scheduling. A key innovation…

    Submitted 16 September, 2025; originally announced September 2025.

  3. arXiv:2508.20976 [pdf, ps, other]

    cs.SD cs.AI eess.AS

    WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations

    Authors: Jaeyeon Kim, Heeseung Yun, Sang Hoon Woo, Chao-Han Huck Yang, Gunhee Kim

    Abstract: Large audio language models (LALMs) extend language understanding into the auditory domain, yet their ability to perform low-level listening, such as pitch and duration detection, remains underexplored. However, low-level listening is critical for real-world, out-of-distribution tasks where models must reason about unfamiliar sounds based on fine-grained acoustic cues. To address this gap, we intr…

    Submitted 28 August, 2025; originally announced August 2025.

    Comments: Preprint. Project page: https://jaeyeonkim99.github.io/wow_bench/

  4. arXiv:2507.06481 [pdf, ps, other]

    cs.SD eess.AS

    IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer

    Authors: Changheon Han, Yuseop Sim, Hoin Jung, Jiho Lee, Hojun Lee, Yun Seok Kang, Sucheol Woo, Garam Kim, Hyung Wook Park, Martin Byung-Guk Jun

    Abstract: Acoustic signals from industrial machines offer valuable insights for anomaly detection, predictive maintenance, and operational efficiency enhancement. However, existing task-specific, supervised learning methods often scale poorly and fail to generalize across diverse industrial scenarios, whose acoustic characteristics are distinct from general audio. Furthermore, the scarcity of accessible, la…

    Submitted 8 July, 2025; originally announced July 2025.

  5. arXiv:2507.02981 [pdf, ps, other]

    eess.SY

    Determination of Bandwidth of Q-filter in Disturbance Observers to Guarantee Transient and Steady State Performance under Measurement Noise

    Authors: Gaeun Kim, Hyungbo Shim

    Abstract: The Q-filter-based disturbance observer (DOB) is one of the most widely used robust controllers due to its design simplicity. This simplicity arises from the fact that reducing the time constant of the low-pass filters not only ensures robust stability but also enhances nominal performance recovery, that is, the ability to recover the trajectory of the nominal closed-loop system. However, in contrast to a noise-free environment, e…

    Submitted 1 July, 2025; originally announced July 2025.

  6. arXiv:2506.12199 [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ViSAGe: Video-to-Spatial Audio Generation

    Authors: Jaeyeon Kim, Heeseung Yun, Gunhee Kim

    Abstract: Spatial audio is essential for enhancing the immersiveness of audio-visual experiences, yet its production typically demands complex recording systems and specialized expertise. In this work, we address a novel problem of generating first-order ambisonics, a widely used spatial audio format, directly from silent videos. To support this task, we introduce YT-Ambigen, a dataset comprising 102K 5-sec…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: ICLR 2025. Project page: https://jaeyeonkim99.github.io/visage/

  7. arXiv:2505.23085 [pdf, ps, other]

    cs.CV cs.AI eess.IV

    GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion

    Authors: Gwanghyun Kim, Xueting Li, Ye Yuan, Koki Nagano, Tianye Li, Jan Kautz, Se Young Chun, Umar Iqbal

    Abstract: Estimating accurate and temporally consistent 3D human geometry from videos is a challenging problem in computer vision. Existing methods, primarily optimized for single images, often suffer from temporal inconsistencies and fail to capture fine-grained dynamic details. To address these limitations, we present GeoMan, a novel architecture designed to produce accurate and temporally consistent dept…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project page: https://research.nvidia.com/labs/dair/geoman

  8. arXiv:2505.13976 [pdf, ps, other]

    eess.AS cs.SD

    Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection

    Authors: Taewoo Kim, Guisik Kim, Choongsang Cho, Young Han Lee

    Abstract: Recent advances in speech deepfake detection (SDD) have significantly improved artifact-based detection in spoofed speech. However, most models overlook speech naturalness, a crucial cue for distinguishing bona fide speech from spoofed speech. This study proposes naturalness-aware curriculum learning, a novel training framework that leverages speech naturalness to enhance the robustness and gener…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  9. arXiv:2505.07365 [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

    Authors: Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro

    Abstract: We present Task 5 of the DCASE 2025 Challenge: an Audio Question Answering (AQA) benchmark spanning multiple domains of sound understanding. This task defines three QA subsets (Bioacoustics, Temporal Soundscapes, and Complex QA) to test audio-language models on interactive question-answering over diverse acoustic scenes. We describe the dataset composition (from marine mammal calls to soundscapes…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Preprint. DCASE 2025 Audio QA Challenge: https://dcase.community/challenge2025/task-audio-question-answering

  10. arXiv:2502.03502 [pdf, other]

    eess.IV cs.AI cs.GR

    DC-VSR: Spatially and Temporally Consistent Video Super-Resolution with Video Diffusion Prior

    Authors: Janghyeok Han, Gyujin Sim, Geonung Kim, Hyun-seung Lee, Kyuha Choi, Youngseok Han, Sunghyun Cho

    Abstract: Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Achieving successful VSR requires producing realistic HR details and ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion, combined with their tile-bas…

    Submitted 26 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: Equal contributions from first two authors

  11. arXiv:2501.11225 [pdf, other]

    cond-mat.mtrl-sci cs.CV eess.IV

    CNN-based TEM image denoising from first principles

    Authors: Jinwoong Chae, Sungwook Hong, Sungkyu Kim, Sungroh Yoon, Gunn Kim

    Abstract: Transmission electron microscope (TEM) images are often corrupted by noise, hindering their interpretation. To address this issue, we propose a deep learning-based approach using simulated images. Using density functional theory calculations with a set of pseudo-atomic orbital basis sets, we generate highly accurate ground truth images. We introduce four types of noise into these simulations to cr…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: 10 pages and 4 figures

  12. arXiv:2412.01496 [pdf, ps, other]

    cs.CV cs.LG eess.IV stat.ML

    Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets

    Authors: Nicholas Konz, Richard Osuala, Preeti Verma, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Andrew Marshall, Lidia Garrucho, Kaisar Kushibar, Daniel M. Lang, Gene S. Kim, Lars J. Grimm, John M. Lewin, James S. Duncan, Julia A. Schnabel, Oliver Diaz, Karim Lekadir, Maciej A. Mazurowski

    Abstract: Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perc…

    Submitted 6 June, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Codebase for FRD computation: https://github.com/RichardObi/frd-score. Codebase for medical image similarity metric evaluation framework: https://github.com/mazurowski-lab/medical-image-similarity-metrics

  13. arXiv:2407.13564 [pdf, other]

    math.OC eess.SY

    Convergence result for the gradient-push algorithm and its application to boost up the Push-DIging algorithm

    Authors: Hyogi Choi, Woocheol Choi, Gwangil Kim

    Abstract: The gradient-push algorithm is a fundamental algorithm for the distributed optimization problem \begin{equation} \min_{x \in \mathbb{R}^d} f(x) = \sum_{j=1}^n f_j (x), \end{equation} where each local cost $f_j$ is only known to agent $a_j$ for $1 \leq j \leq n$ and the agents are connected by a directed graph. In this paper, we obtain convergence results for the gradient-push algorithm with consta…

    Submitted 18 July, 2024; originally announced July 2024.

  14. arXiv:2407.11365 [pdf, other]

    eess.AS

    Team HYU ASML ROBOVOX SP Cup 2024 System Description

    Authors: Jeong-Hwan Choi, Gaeun Kim, Hee-Jae Lee, Seyun Ahn, Hyun-Soo Kim, Joon-Hyuk Chang

    Abstract: This report describes the submission of the HYU ASML team to the IEEE Signal Processing Cup 2024 (SP Cup 2024). This challenge, titled "ROBOVOX: Far-Field Speaker Recognition by a Mobile Robot," focuses on speaker recognition using a mobile robot in noisy and reverberant conditions. Our solution combines the results of deep residual neural networks and time-delay neural network-based speaker embedding…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Technical report for IEEE Signal Processing Cup 2024, 9 pages

  15. arXiv:2406.16994 [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  16. arXiv:2406.05270 [pdf]

    physics.med-ph cs.CV cs.LG eess.IV

    fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

    Authors: Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

    Abstract: This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will…

    Submitted 7 June, 2024; originally announced June 2024.

  17. arXiv:2406.02562 [pdf, other]

    eess.AS cs.AI cs.CL

    Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

    Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

    Abstract: In recent times, there has been growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model on-device is inefficient, and sometimes limited by computational cost. To tackle the problem, this paper presents the weights separation method to minimize on-device model weights using parameter…

    Submitted 23 April, 2024; originally announced June 2024.

    Comments: Table 2 is revised

    Journal ref: ICASSP 2024 Workshop (HSCMA 2024) paper

  18. arXiv:2405.19380 [pdf, ps, other]

    stat.ML cs.LG eess.SY

    Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

    Authors: Yeoneung Kim, Gihun Kim, Jiwhan Park, Insoon Yang

    Abstract: We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the a…

    Submitted 29 May, 2025; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to be presented at L4DC'25 (Oral)

  19. arXiv:2405.13762 [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

    Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

    Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the a…

    Submitted 22 May, 2024; originally announced May 2024.

    Journal ref: In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

  20. arXiv:2405.11807 [pdf, other]

    cs.HC cs.RO eess.SY

    Dual-sided Peltier Elements for Rapid Thermal Feedback in Wearables

    Authors: Seongjun Kang, Gwangbin Kim, Seokhyun Hwang, Jeongju Park, Ahmed Elsharkawy, SeungJun Kim

    Abstract: This paper introduces a motor-driven Peltier device designed to deliver immediate thermal sensations within extended reality (XR) environments. The system incorporates eight motor-driven Peltier elements, facilitating swift transitions between warm and cool sensations by rotating preheated or cooled elements to opposite sides. A multi-layer structure, comprising aluminum and silicone layers, ensur…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 3 pages, 4 figures, ICRA Wearable Workshop 2024 - 1st Workshop on Advancing Wearable Devices and Applications through Novel Design, Sensing, Actuation, and AI

  21. arXiv:2401.08962 [pdf, other]

    cs.HC cs.LG cs.SD eess.AS

    DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

    Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

    Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain…

    Submitted 16 January, 2024; originally announced January 2024.

  22. arXiv:2312.13313 [pdf, other]

    eess.IV cs.CV

    ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

    Authors: Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho

    Abstract: RAW images are rarely shared, mainly due to their excessive data size compared to their sRGB counterparts obtained by camera ISPs. Learning the forward and inverse processes of camera ISPs has been recently demonstrated, enabling physically-meaningful RAW-level image processing on input sRGB images. However, existing learning-based ISP methods fail to handle the large variations in the ISP processes…

    Submitted 14 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  23. arXiv:2312.05465 [pdf, other]

    cs.LG eess.SY

    On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR

    Authors: Jaeuk Shin, Giho Kim, Howon Lee, Joonho Han, Insoon Yang

    Abstract: Designing a competent meta-reinforcement learning (meta-RL) algorithm in terms of data usage remains a central challenge to be tackled for its successful real-world applications. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. As opposed to the standard model-based approaches to meta-RL, our method e…

    Submitted 8 December, 2023; originally announced December 2023.

  24. arXiv:2310.12574 [pdf]

    eess.IV cs.CV

    A reproducible 3D convolutional neural network with dual attention module (3D-DAM) for Alzheimer's disease classification

    Authors: Gia Minh Hoang, Youngjoo Lee, Jae Gwan Kim

    Abstract: Alzheimer's disease is one of the most common types of neurodegenerative disease, characterized by the accumulation of amyloid-beta plaque and tau tangles. Recently, deep learning approaches have shown promise in Alzheimer's disease diagnosis. In this study, we propose a reproducible model that utilizes a 3D convolutional neural network with a dual attention module for Alzheimer's disease classifi…

    Submitted 2 July, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  25. arXiv:2307.07409 [pdf, other]

    cs.CL cs.AI eess.IV

    KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

    Authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang

    Abstract: In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published at BioNLP workshop @ ACL 2023

  26. arXiv:2306.13361 [pdf, other]

    physics.optics cs.CV eess.IV

    Neural 360$^\circ$ Structured Light with Learned Metasurfaces

    Authors: Eunsue Choi, Gyeongtae Kim, Jooyeong Yun, Yujin Jeon, Junsuk Rho, Seung-Hwan Baek

    Abstract: Structured light has proven instrumental in 3D imaging, LiDAR, and holographic light projection. Metasurfaces, composed of sub-wavelength-sized nanostructures, facilitate 180$^\circ$ field-of-view (FoV) structured light, circumventing the restricted FoV inherent in traditional optics like diffractive optical elements. However, extant metasurface-facilitated structured light exhibits sub-optimal p…

    Submitted 27 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  27. arXiv:2306.04137 [pdf, other]

    cs.MA eess.SY

    Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility

    Authors: Chanyoung Park, Gyu Seon Kim, Soohyun Park, Soyi Jung, Joongheon Kim

    Abstract: The development of urban-air-mobility (UAM) is rapidly progressing with spurs, and the demand for efficient transportation management systems is a rising need due to the multifaceted environmental uncertainties. Thus, this paper proposes a novel air transportation service management algorithm based on multi-agent deep reinforcement learning (MADRL) to address the challenges of multi-UAM cooperatio…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 15 pages, 14 figures

  28. arXiv:2306.00680 [pdf, other]

    cs.SD cs.AI eess.AS

    Encoder-decoder multimodal speaker change detection

    Authors: Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

    Abstract: The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model is bui…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted for presentation at INTERSPEECH 2023

  29. arXiv:2210.08997 [pdf, other]

    cs.CV cs.LG eess.IV

    AIM 2022 Challenge on Instagram Filter Removal: Methods and Results

    Authors: Furkan Kınlı, Sami Menteş, Barış Özcan, Furkan Kıraç, Radu Timofte, Yi Zuo, Zitao Wang, Xiaowen Zhang, Yu Zhu, Chenghua Li, Cong Leng, Jian Cheng, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Tianzhi Ma, Zihan Gao, Wenxin He, Woon-Ha Yeo, Wang-Taek Oh, Young-Il Kim, Han-Cheol Ryu, Gang He, et al. (8 additional authors not shown)

    Abstract: This paper introduces the methods and the results of AIM 2022 challenge on Instagram Filter Removal. Social media filters transform the images by consecutive non-linear operations, and the feature maps of the original content may be interpolated into a different domain. This reduces the overall performance of the recent deep learning strategies. The main goal of this challenge is to produce realis…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 14 pages, 9 figures, Challenge report of AIM 2022 Instagram Filter Removal Challenge in conjunction with ECCV 2022

  30. arXiv:2207.01520 [pdf, other]

    eess.IV cs.CV

    Adaptive GLCM sampling for transformer-based COVID-19 detection on CT

    Authors: Okchul Jung, Dong Un Kang, Gwanghyun Kim, Se Young Chun

    Abstract: The world has suffered from COVID-19 (SARS-CoV-2) for the last two years, causing much damage and change in people's daily lives. Thus, automated detection of COVID-19 utilizing deep learning on chest computed tomography (CT) scans has become promising, as it helps make correct diagnoses efficiently. Recently, a transformer-based COVID-19 detection method on CT was proposed to utilize 3D information in CT vol…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 6 pages

  31. arXiv:2205.12633 [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

    Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, et al. (68 additional authors not shown)

    Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

  32. arXiv:2205.01304 [pdf, other]

    eess.AS cs.SD

    Efficient dynamic filter for robust and low computational feature extraction

    Authors: Donghyeon Kim, Gwantae Kim, Bokyeung Lee, Jeong-gi Kwak, David K. Han, Hanseok Ko

    Abstract: An unseen noise signal, which is not considered during model training, is difficult to anticipate and can lead to performance degradation. Various methods have been investigated to mitigate unseen noise. In our previous work, an Instance-level Dynamic Filter (IDF) and a Pixel Dynamic Filter (PDF) were proposed to extract noise-robust features. However, the performance of the dynamic filter migh…

    Submitted 20 October, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted to SLT 2022

  33. Fetal Brain Tissue Annotation and Segmentation Challenge Results

    Authors: Kelly Payette, Hongwei Li, Priscille de Dumast, Roxane Licandro, Hui Ji, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Hao Liu, Yuchen Pei, Lisheng Wang, Ying Peng, Juanying Xie, Huiquan Zhang, Guiming Dong, Hao Fu, Guotai Wang, ZunHyan Rieu, Donghyeon Kim, Hyun Gi Kim, Davood Karimi, Ali Gholipour, Helena R. Torres, Bruno Oliveira, João L. Vilaça, et al. (33 additional authors not shown)

    Abstract: In-utero fetal MRI is emerging as an important tool in the diagnosis and analysis of the developing human brain. Automatic segmentation of the developing fetal brain is a vital step in the quantitative analysis of prenatal neurodevelopment both in the research and clinical context. However, manual segmentation of cerebral structures is time-consuming and prone to error and inter-observer variabili…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: Results from FeTA Challenge 2021, held at MICCAI; Manuscript submitted

  34. arXiv:2202.06431 [pdf, other]

    eess.IV cs.CV cs.LG

    AI can evolve without labels: self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation

    Authors: Sangjoon Park, Gwanghyun Kim, Yujin Oh, Joon Beom Seo, Sang Min Lee, Jin Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Chang Min Park, Jong Chul Ye

    Abstract: Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, developing a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest x-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially i…

    Submitted 13 February, 2022; originally announced February 2022.

    Comments: 24 pages

  35. arXiv:2201.06735 [pdf]

    eess.SP

    AI Augmented Digital Metal Component

    Authors: Eunhyeok Seo, Hyokyung Sung, Hayeol Kim, Taekyeong Kim, Sangeun Park, Min Sik Lee, Seung Ki Moon, Jung Gi Kim, Hayoung Chung, Seong-Kyum Choi, Ji-hun Yu, Kyung Tae Kim, Seong Jin Park, Namhun Kim, Im Doo Jung

    Abstract: The aim of this work is to propose a new paradigm that imparts intelligence to metal parts through the fusion of metal additive manufacturing and artificial intelligence (AI). Our digital metal part classifies its status through real-time data processing with a convolutional neural network (CNN). The training data for the CNN is collected from a strain gauge embedded in metal parts by laser powder bed fus…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 46 pages

  36. CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

    Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, et al. (15 additional authors not shown)

    Abstract: Domain Adaptation (DA) has recently attracted strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality…

    Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

    Comments: In Medical Image Analysis

  37. arXiv:2111.04028 [pdf, other]

    cs.CV eess.IV

    Style Transfer with Target Feature Palette and Attention Coloring

    Authors: Suhyeon Ha, Guisik Kim, Junseok Kwon

    Abstract: Style transfer has attracted a lot of attention, as it can change a given image into one with splendid artistic styles while preserving the image structure. However, conventional approaches easily lose image details and tend to produce unpleasant artifacts during style transfer. In this paper, to solve these problems, a novel artistic stylization method with target feature palettes is proposed, w…

    Submitted 7 November, 2021; originally announced November 2021.

  38. arXiv:2111.01338 [pdf, other]

    eess.IV cs.AI cs.CV

    Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training

    Authors: Sangjoon Park, Gwanghyun Kim, Jeongsol Kim, Boah Kim, Jong Chul Ye

    Abstract: Federated learning, which shares the weights of the neural network across clients, is gaining attention in the healthcare sector as it enables training on a large corpus of decentralized data while maintaining data privacy. For example, this enables neural network training for COVID-19 diagnosis on chest X-ray (CXR) images without collecting patient CXR data across multiple hospitals. Unfortunatel…

    Submitted 3 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted for NeurIPS 2021

  39. arXiv:2110.03326 [pdf, other]

    cs.CL cs.SD eess.AS

    Back from the future: bidirectional CTC decoding using future information in speech recognition

    Authors: Namkyu Jung, Geonmin Kim, Han-Gyu Kim

    Abstract: In this paper, we propose a simple but effective method to decode the output of a Connectionist Temporal Classification (CTC) model using a bidirectional neural language model. The bidirectional language model uses the future as well as the past information in order to predict the next output in the sequence. The proposed method based on bidirectional beam search takes advantage of the CTC greedy deco…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 2022

  40. arXiv:2110.02791  [pdf, other]

    cs.SD cs.CL eess.AS

    Spell my name: keyword boosted speech recognition

    Authors: Namkyu Jung, Geonmin Kim, Joon Son Chung

    Abstract: Recognition of uncommon words such as names and technical terminology is important to understanding conversations in context. However, the ability to recognise such words remains a challenge in modern automatic speech recognition (ASR) systems. In this paper, we propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords, which in turn enables better reada…

    Submitted 6 October, 2021; originally announced October 2021.

  41. arXiv:2109.09041  [pdf, other]

    cs.RO eess.SY

    Online Distributed Trajectory Planning for Quadrotor Swarm with Feasibility Guarantee using Linear Safe Corridor

    Authors: Jungwon Park, Dabin Kim, Gyeong Chan Kim, Dahyun Oh, H. Jin Kim

    Abstract: This paper presents a new online multi-agent trajectory planning algorithm that is guaranteed to generate safe, dynamically feasible trajectories in a cluttered environment. The proposed algorithm utilizes a linear safe corridor (LSC) to formulate the distributed trajectory optimization problem with only feasible constraints, so it does not resort to slack variables or soft constraints to avoid optim…

    Submitted 3 January, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: 8 pages, RA-L 2022 under review

  42. Infusing model predictive control into meta-reinforcement learning for mobile robots in dynamic environments

    Authors: Jaeuk Shin, Astghik Hakobyan, Mingyu Park, Yeoneung Kim, Gihun Kim, Insoon Yang

    Abstract: The successful operation of mobile robots requires them to adapt rapidly to environmental changes. To develop an adaptive decision-making tool for mobile robots, we propose a novel algorithm that combines meta-reinforcement learning (meta-RL) with model predictive control (MPC). Our method employs an off-policy meta-RL algorithm as a baseline to train a policy using transition samples generated by…

    Submitted 7 July, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in the IEEE Robotics and Automation Letters

    Journal ref: IEEE Robotics and Automation Letters, 2022

  43. arXiv:2104.07235  [pdf, other]

    eess.IV cs.CV cs.LG

    Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification

    Authors: Sangjoon Park, Gwanghyun Kim, Yujin Oh, Joon Beom Seo, Sang Min Lee, Jin Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Jong Chul Ye

    Abstract: Developing a robust algorithm to diagnose and quantify the severity of COVID-19 using chest X-ray (CXR) requires a large number of well-curated COVID-19 datasets, which are difficult to collect under the global COVID-19 pandemic. On the other hand, CXR data with other findings are abundant. This situation is ideally suited for the Vision Transformer (ViT) architecture, where a lot of unlabeled data…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 13 pages

  44. arXiv:2104.06782  [pdf, other]

    cs.CV eess.IV

    Visual Comfort Aware-Reinforcement Learning for Depth Adjustment of Stereoscopic 3D Images

    Authors: Hak Gu Kim, Minho Park, Sangmin Lee, Seongyeop Kim, Yong Man Ro

    Abstract: Depth adjustment aims to enhance the visual experience of stereoscopic 3D (S3D) images, which involves improving both visual comfort and depth perception. For a human expert, the depth adjustment procedure is a sequence of iterative decisions. The human expert iteratively adjusts the depth until satisfied with both the level of visual comfort and the perceived depth. In this work,…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: AAAI 2021

  45. arXiv:2104.06780  [pdf, other]

    cs.CV eess.IV

    Towards a Better Understanding of VR Sickness: Physical Symptom Prediction for VR Contents

    Authors: Hak Gu Kim, Sangmin Lee, Seongyeop Kim, Heoun-taek Lim, Yong Man Ro

    Abstract: We address the black-box issue of VR sickness assessment (VRSA) by evaluating the level of physical symptoms of VR sickness. For VR content inducing similar levels of VR sickness, the physical symptoms can vary depending on the characteristics of the content. Most existing VRSA methods focus on assessing the overall VR sickness score. To better understand VR sickness, it is r…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: AAAI 2021

  46. arXiv:2103.09022  [pdf, other]

    eess.IV cs.CV cs.LG

    Missing Cone Artifacts Removal in ODT using Unsupervised Deep Learning in Projection Domain

    Authors: Hyungjin Chung, Jaeyoung Huh, Geon Kim, Yong Keun Park, Jong Chul Ye

    Abstract: Optical diffraction tomography (ODT) produces a three-dimensional distribution of refractive index (RI) by measuring scattering fields at various angles. Although the RI distribution is highly informative, due to the missing cone problem stemming from the limited-angle acquisition of holograms, reconstructions have very poor resolution along the axial direction compared to the horizontal imagin…

    Submitted 18 July, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: This will appear in IEEE Trans. on Computational Imaging

  47. arXiv:2103.07062  [pdf, other]

    eess.IV cs.CV cs.LG

    Severity Quantification and Lesion Localization of COVID-19 on CXR using Vision Transformer

    Authors: Gwanghyun Kim, Sangjoon Park, Yujin Oh, Joon Beom Seo, Sang Min Lee, Jin Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Jong Chul Ye

    Abstract: Under the global pandemic of COVID-19, building an automated framework that quantifies the severity of COVID-19 and localizes the relevant lesion on chest X-ray images has become increasingly important. Although pixel-level lesion severity labels, e.g. lesion segmentation, can be the ideal target for building a robust model, collecting enough data with such labels is difficult due to time and…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: 8 pages

  48. arXiv:2103.07055  [pdf, other]

    eess.IV cs.CV cs.LG

    Vision Transformer for COVID-19 CXR Diagnosis using Chest X-ray Feature Corpus

    Authors: Sangjoon Park, Gwanghyun Kim, Yujin Oh, Joon Beom Seo, Sang Min Lee, Jin Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Jong Chul Ye

    Abstract: Under the global COVID-19 crisis, developing a robust diagnosis algorithm for COVID-19 using CXR is hampered by the lack of well-curated COVID-19 datasets, although CXR data with other diseases are abundant. This situation is suitable for the vision transformer architecture, which can exploit the abundant unlabeled data using pre-training. However, the direct use of existing vision transformers that use…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: 10 pages

  49. arXiv:2102.08567  [pdf]

    cs.CV cs.AI eess.IV

    Ensemble Transfer Learning of Elastography and B-mode Breast Ultrasound Images

    Authors: Sampa Misra, Seungwan Jeon, Ravi Managuli, Seiyon Lee, Gyuwon Kim, Seungchul Lee, Richard G Barr, Chulhong Kim

    Abstract: Computer-aided detection (CAD) of benign and malignant breast lesions is becoming increasingly essential in breast ultrasound (US) imaging. CAD systems rely on imaging features identified by medical experts for their performance, whereas deep learning (DL) methods automatically extract features from the data. The challenge for DL is the insufficient number of breast US images available to train t…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 17 pages, 10 figures, 6 tables

  50. arXiv:2010.13105  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding

    Authors: Seongbin Kim, Gyuwan Kim, Seongjin Shin, Sangmin Lee

    Abstract: End-to-end approaches open a new way for more accurate and efficient spoken language understanding (SLU) systems by alleviating the drawbacks of traditional pipeline systems. Previous works exploit textual information for an SLU model via pre-training with automatic speech recognition or fine-tuning with knowledge distillation. To utilize textual information more effectively, this work proposes a…

    Submitted 10 June, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: ICASSP 2021; 5 pages, 1 figure