
Showing 1–50 of 198 results for author: Kim, Y

Searching in archive eess.
  1. arXiv:2505.07851  [pdf, other]

    eess.IV cs.AI cs.CV cs.RO

    Pose Estimation for Intra-cardiac Echocardiography Catheter via AI-Based Anatomical Understanding

    Authors: Jaeyoung Huh, Ankur Kapoor, Young-Ho Kim

    Abstract: Intra-cardiac Echocardiography (ICE) plays a crucial role in Electrophysiology (EP) and Structural Heart Disease (SHD) interventions by providing high-resolution, real-time imaging of cardiac structures. However, existing navigation methods rely on electromagnetic (EM) tracking, which is susceptible to interference and position drift, or require manual adjustments based on operator expertise. To o…

    Submitted 7 May, 2025; originally announced May 2025.

  2. arXiv:2505.05518  [pdf, other]

    eess.IV cs.CV cs.RO

    Guidance for Intra-cardiac Echocardiography Manipulation to Maintain Continuous Therapy Device Tip Visibility

    Authors: Jaeyoung Huh, Ankur Kapoor, Young-Ho Kim

    Abstract: Intra-cardiac Echocardiography (ICE) plays a critical role in Electrophysiology (EP) and Structural Heart Disease (SHD) interventions by providing real-time visualization of intracardiac structures. However, maintaining continuous visibility of the therapy device tip remains a challenge due to frequent adjustments required during manual ICE catheter manipulation. To address this, we propose an AI-…

    Submitted 7 May, 2025; originally announced May 2025.

  3. arXiv:2505.01366  [pdf, other]

    eess.SY

    Deep Learning-Enabled System Diagnosis in Microgrids: A Feature-Feedback GAN Approach

    Authors: Swetha Rani Kasimalla, Kuchan Park, Junho Hong, Young-Jin Kim, HyoJong Lee

    Abstract: The increasing integration of inverter-based resources (IBRs) and communication networks has brought both modernization and new vulnerabilities to the power system infrastructure. These vulnerabilities expose the system to internal faults and cyber threats, particularly False Data Injection (FDI) attacks, which can closely mimic real fault scenarios. Hence, this work presents a two-stage fault and…

    Submitted 2 May, 2025; originally announced May 2025.

  4. arXiv:2503.19945  [pdf, other]

    eess.IV cs.AI cs.CV

    Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification

    Authors: Daniel G. P. Petrini, Hae Yong Kim

    Abstract: This study explores open questions in the application of machine learning for breast cancer detection in mammograms. Current approaches often employ a two-stage transfer learning process: first, adapting a backbone model trained on natural images to develop a patch classifier, which is then used to create a single-view whole-image classifier. Additionally, many studies leverage both mammographic v…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 8 pages

  5. arXiv:2503.12907  [pdf, ps, other]

    eess.SP cs.IT

    Robust Deep Joint Source Channel Coding for Task-Oriented Semantic Communications

    Authors: Taewoo Park, Eunhye Hong, Yo-Seb Jeon, Namyoon Lee, Yongjune Kim

    Abstract: Semantic communications based on deep joint source-channel coding (JSCC) aim to improve communication efficiency by transmitting only task-relevant information. However, ensuring robustness to the stochasticity of communication channels remains a key challenge in learning-based JSCC. In this paper, we propose a novel regularization technique for learning-based JSCC to enhance robustness against ch…

    Submitted 17 March, 2025; originally announced March 2025.

  6. arXiv:2503.08214  [pdf, other]

    cs.RO eess.SY

    Safety-Ensured Control Framework for Robotic Endoscopic Task Automation

    Authors: Yitaek Kim, Iñigo Iturrate, Christoffer Sloth, Hansoul Kim

    Abstract: There is growing interest in automating surgical tasks using robotic systems, such as endoscopy for treating gastrointestinal (GI) cancer. However, previous studies have primarily focused on detecting and analyzing objects or robots, with limited attention to ensuring safety, which is critical for clinical applications, where accidents can be caused by unsafe robot motions. In this study, we propo…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: This paper is submitted to IEEE Access

  7. arXiv:2503.02203  [pdf, other]

    eess.SP

    Low Complexity Frequency Domain Nonlinear Self-Interference Cancellation for Flexible Duplex

    Authors: Yonghwi Kim, Kai-Kit Wong, Jianzhong Zhang, Chan-Byoung Chae

    Abstract: Nonlinear self-interference (SI) cancellation is essential for mitigating the impact of transmitter-side nonlinearity on overall SI cancellation performance in flexible duplex systems, including in-band full-duplex (IBFD) and sub-band full-duplex (SBFD). Digital SI cancellation (SIC) must address the nonlinearity in the power amplifier (PA) and the in-phase/quadrature-phase (IQ) imbalance from up/…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 16 pages, 9 figures

  8. arXiv:2502.13986  [pdf, other]

    eess.IV

    Structure-from-Sherds++: Robust Incremental 3D Reassembly of Axially Symmetric Pots from Unordered and Mixed Fragment Collections

    Authors: Seong Jong Yoo, Sisung Liu, Muhammad Zeeshan Arshad, Jinhyeok Kim, Young Min Kim, Yiannis Aloimonos, Cornelia Fermuller, Kyungdon Joo, Jinwook Kim, Je Hyeong Hong

    Abstract: Reassembling multiple axially symmetric pots from fragmentary sherds is crucial for cultural heritage preservation, yet it poses significant challenges due to thin and sharp fracture surfaces that generate numerous false positive matches and hinder large-scale puzzle solving. Existing global approaches, which optimize all potential fragment pairs simultaneously or data-driven models, are prone to…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 24 pages

  9. arXiv:2502.11478  [pdf, other]

    cs.SD cs.LG eess.AS

    TAPS: Throat and Acoustic Paired Speech Dataset for Deep Learning-Based Speech Enhancement

    Authors: Yunsik Kim, Yonghun Song, Yoonyoung Chung

    Abstract: In high-noise environments such as factories, subways, and busy streets, capturing clear speech is challenging due to background noise. Throat microphones provide a solution with their noise-suppressing properties, reducing the noise while recording speech. However, a significant limitation remains: high-frequency information is attenuated as sound waves pass through skin and tissue, reducing spee…

    Submitted 17 February, 2025; originally announced February 2025.

  10. arXiv:2502.09283  [pdf, other]

    eess.SP

    Rate-Splitting Multiple Access for 6G: Prototypes, Experimental Results and Link/System level Simulations

    Authors: Sundar Aditya, Yong Jin Daniel Kim, David Vargas, David Redgate, Onur Dizdar, Neil Bhushan, Xinze Lyu, Sibo Zhang, Stephen Wang, Bruno Clerckx

    Abstract: Rate-Splitting Multiple Access (RSMA) is a powerful and versatile physical layer multiple access technique that generalizes and has better interference management capabilities than 5G-based Space Division Multiple Access (SDMA). It is also a rapidly maturing technology, all of which makes it a natural successor to SDMA in 6G. In this article, we describe RSMA's suitability for 6G by presenting: i)…

    Submitted 17 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: Submitted to the IEEE Communications Standards Magazine December 2025 Special Issue on "Wireless Technologies for 6G and Beyond: Applications, Implementations, and Standardization"

  11. arXiv:2502.00497  [pdf]

    cs.LG eess.SP

    Convolutional Fourier Analysis Network (CFAN): A Unified Time-Frequency Approach for ECG Classification

    Authors: Sam Jeong, Hae Yong Kim

    Abstract: Machine learning has revolutionized biomedical signal analysis, particularly in electrocardiogram (ECG) classification. While convolutional neural networks (CNNs) excel at automatic feature extraction, the optimal integration of time- and frequency-domain information remains unresolved. This study introduces the Convolutional Fourier Analysis Network (CFAN), a novel architecture that unifies time-…

    Submitted 13 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

  12. arXiv:2501.19010  [pdf, other]

    cs.CL cs.SD eess.AS

    DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition

    Authors: Wonjun Lee, Solee Im, Heejin Do, Yunsu Kim, Jungseul Ok, Gary Geunbae Lee

    Abstract: Dysarthric speech recognition often suffers from performance degradation due to the intrinsic diversity of dysarthric severity and extrinsic disparity from normal speech. To bridge these gaps, we propose a Dynamic Phoneme-level Contrastive Learning (DyPCL) method, which leads to obtaining invariant representations across diverse speakers. We decompose the speech utterance into phoneme segments for…

    Submitted 3 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: NAACL 2025 main conference, 9 pages, 1-page appendix

  13. arXiv:2501.10405  [pdf]

    eess.SP physics.data-an physics.ins-det

    Stochastic resonance in Schmitt trigger and its application towards weak signal detection

    Authors: Yoonkang Kim, Donghyeok Seo

    Abstract: This study explores stochastic resonance (SR) in a Schmitt trigger circuit and its application to weak signal detection. SR, a phenomenon where noise synchronizes with weak signals to enhance detectability, was demonstrated using a custom-designed bi-stable Schmitt trigger system. The circuit's bi-stability was validated through hysteresis curve analysis, confirming its suitability for SR studies.…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: 14 pages, 16 figures, 1 table

  14. arXiv:2501.00645  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    SoundBrush: Sound as a Brush for Visual Scene Editing

    Authors: Kim Sung-Bin, Kim Jun-Seong, Junseok Ko, Yewon Kim, Tae-Hyun Oh

    Abstract: We propose SoundBrush, a model that uses sound as a brush to edit and manipulate visual scenes. We extend the generative capabilities of the Latent Diffusion Model (LDM) to incorporate audio information for editing visual scenes. Inspired by existing image-editing works, we frame this task as a supervised learning problem and leverage various off-the-shelf models to construct a sound-paired visual…

    Submitted 31 December, 2024; originally announced January 2025.

    Comments: AAAI 2025

  15. arXiv:2412.06038  [pdf, other]

    eess.SP cs.CV cs.IT

    Vision Transformer-based Semantic Communications With Importance-Aware Quantization

    Authors: Joohyuk Park, Yongjeong Oh, Yongjune Kim, Yo-Seb Jeon

    Abstract: Semantic communications provide significant performance gains over traditional communications by transmitting task-relevant semantic features through wireless channels. However, most existing studies rely on end-to-end (E2E) training of neural-type encoders and decoders to ensure effective transmission of these semantic features. To enable semantic communications without relying on E2E training, t…

    Submitted 8 December, 2024; originally announced December 2024.

  16. arXiv:2412.04591  [pdf, other]

    eess.IV cs.CV

    Aberration Correcting Vision Transformers for High-Fidelity Metalens Imaging

    Authors: Byeonghyeon Lee, Youbin Kim, Yongjae Jo, Hyunsu Kim, Hyemi Park, Yangkyu Kim, Debabrata Mandal, Praneeth Chakravarthula, Inki Kim, Eunbyung Park

    Abstract: Metalens is an emerging optical system with an irreplaceable merit in that it can be manufactured in ultra-thin and compact sizes, which shows great promise in various applications. Despite its advantage in miniaturization, its practicality is constrained by spatially varying aberrations and distortions, which significantly degrade the image quality. Several previous arts have attempted to address…

    Submitted 25 March, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: 22 pages, 22 figures

  17. arXiv:2411.17277  [pdf, other]

    eess.SY

    Minimizing Conservatism in Safety-Critical Control for Input-Delayed Systems via Adaptive Delay Estimation

    Authors: Yitaek Kim, Ersin Das, Jeeseop Kim, Aaron D. Ames, Joel W. Burdick, Christoffer Sloth

    Abstract: Input delays affect systems such as teleoperation and wirelessly autonomous connected vehicles, and may lead to safety violations. One promising way to ensure safety in the presence of delay is to employ control barrier functions (CBFs), and extensions thereof that account for uncertainty: delay adaptive CBFs (DaCBFs). This paper proposes an online adaptive safety control framework for reducing th…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: This paper has been submitted to ECC 2025 for possible publication

  18. arXiv:2411.08761  [pdf, other]

    eess.SY

    AI-Enhanced Inverter Fault and Anomaly Detection System for Distributed Energy Resources in Microgrids

    Authors: Swetha Rani Kasimalla, Kuchan Park, Junho Hong, Young-Jin Kim, HyoJong Lee

    Abstract: The integration of Distributed Energy Resources (DERs) into power distribution systems has made microgrids foundational to grid modernization. These DERs, connected through power electronic inverters, create power electronics dominated grid architecture, introducing unique challenges for fault detection. While external line faults are widely studied, inverter faults remain a critical yet underexpl…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: 5 pages, 2 figures, submitted to 2025 IEEE Power and Energy Society General Meeting (PESGM 2025), Austin, TX

  19. arXiv:2411.07833  [pdf, other]

    cs.RO eess.SY

    Robust Adaptive Safe Robotic Grasping with Tactile Sensing

    Authors: Yitaek Kim, Jeeseop Kim, Albert H. Li, Aaron D. Ames, Christoffer Sloth

    Abstract: Robotic grasping requires safe force interaction to prevent a grasped object from being damaged or slipping out of the hand. In this vein, this paper proposes an integrated framework for grasping with formal safety guarantees based on Control Barrier Functions. We first design contact force and force closure constraints, which are enforced by a safety filter to accomplish safe grasping with finger…

    Submitted 12 November, 2024; originally announced November 2024.

  20. arXiv:2410.23172  [pdf, other]

    eess.SP

    Chernoff fusion of Bernoulli Gaussian max filters

    Authors: Zhijin Chen, Branko Ristic, Du Yong Kim

    Abstract: Statistical dependencies between information sources are rarely known, yet in practical distributed tracking schemes, they must be taken into account in order to prevent track divergences. Chernoff fusion is a well-known and universally accepted method that can address the problem of track fusion when the statistical dependence between the fusing sources is unknown. In this paper we derive the exact…

    Submitted 30 October, 2024; originally announced October 2024.

  21. arXiv:2410.22659  [pdf, ps, other]

    eess.SP cs.CE

    Property Estimation in Geotechnical Databases Using Labeled Random Finite Sets

    Authors: Changbeom Shim, Youngho Kim, Craig Butterworth

    Abstract: The sufficiency of accurate data is a core element in data-centric geotechnics. However, geotechnical datasets are essentially uncertain, whereupon engineers have difficulty with obtaining precise information for making decisions. This challenge is more apparent when the performance of data-driven technologies solely relies on imperfect databases or even when it is sometimes difficult to investiga…

    Submitted 29 October, 2024; originally announced October 2024.

  22. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  23. arXiv:2410.14122  [pdf, other]

    cs.SD cs.AI cs.IR cs.LG eess.AS

    Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation

    Authors: Yonghyun Kim, Alexander Lerch

    Abstract: Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the impact of noisy environments on the system performance remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained o…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted to the Late-Breaking Demo Session of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024

  24. arXiv:2410.00068  [pdf]

    eess.IV cs.LG stat.AP

    Denoising VAE as an Explainable Feature Reduction and Diagnostic Pipeline for Autism Based on Resting state fMRI

    Authors: Xinyuan Zheng, Orren Ravid, Robert A. J. Barry, Yoojean Kim, Qian Wang, Young-geun Kim, Xi Zhu, Xiaofu He

    Abstract: Autism spectrum disorders (ASDs) are developmental conditions characterized by restricted interests and difficulties in communication. The complexity of ASD has resulted in a deficiency of objective diagnostic biomarkers. Deep learning methods have gained recognition for addressing these challenges in neuroimaging analysis, but finding and interpreting such diagnostic biomarkers are still challeng…

    Submitted 27 March, 2025; v1 submitted 30 September, 2024; originally announced October 2024.

    ACM Class: J.3; I.4.9; I.4.10

  25. arXiv:2409.18622  [pdf, other]

    cs.SD eess.AS

    Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech

    Authors: Youngjae Kim, Yejin Jeon, Gary Geunbae Lee

    Abstract: The difficulty of acquiring abundant, high-quality data, especially in multi-lingual contexts, has sparked interest in addressing low-resource scenarios. Moreover, current literature relies on fixed expressions from language IDs, which results in the inadequate learning of language representations, and the failure to generate speech in unseen languages. To address these challenges, we propose a nove…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Findings

  26. arXiv:2409.18282  [pdf]

    eess.IV cs.CV physics.med-ph

    Synthesizing beta-amyloid PET images from T1-weighted Structural MRI: A Preliminary Study

    Authors: Qing Lyu, Jin Young Kim, Jeongchul Kim, Christopher T Whitlow

    Abstract: Beta-amyloid positron emission tomography (Aβ-PET) imaging has become a critical tool in Alzheimer's disease (AD) research and diagnosis, providing insights into the pathological accumulation of amyloid plaques, one of the hallmarks of AD. However, the high cost, limited availability, and exposure to radioactivity restrict the widespread use of Aβ-PET imaging, leading to a scarcity of comprehe…

    Submitted 1 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

  27. arXiv:2409.17451  [pdf, other]

    eess.IV cs.CV

    Study of Subjective and Objective Quality in Super-Resolution Enhanced Broadcast Images on a Novel SR-IQA Dataset

    Authors: Yongrok Kim, Junha Shin, Juhyun Lee, Hyunsuk Ko

    Abstract: To display low-quality broadcast content on high-resolution screens in full-screen format, the application of Super-Resolution (SR), a key consumer technology, is essential. Recently, SR methods have been developed that not only increase resolution while preserving the original image information but also enhance the perceived quality. However, evaluating the quality of SR images generated from low…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  28. arXiv:2409.12476  [pdf, other]

    cs.CL cs.SD eess.AS

    AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost

    Authors: Ahmet Gündüz, Yunsu Kim, Kamer Ali Yuksel, Mohamed Al-Badrashiny, Thiago Castro Ferreira, Hassan Sawaf

    Abstract: We present AutoMode-ASR, a novel framework that effectively integrates multiple ASR systems to enhance the overall transcription quality while optimizing cost. The idea is to train a decision model to select the optimal ASR system for each segment based solely on the audio input before running the systems. We achieve this by ensembling binary classifiers determining the preference between two syst…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: SPECOM 2024 Conference

  29. arXiv:2408.08790  [pdf, other]

    eess.IV cs.AI cs.CV

    A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks

    Authors: Boa Jang, Youngbin Ahn, Eun Kyung Choe, Chang Ki Yoon, Hyuk Jin Choi, Young-Gon Kim

    Abstract: Artificial intelligence applied to retinal images offers significant potential for recognizing signs and symptoms of retinal conditions and expediting the diagnosis of eye diseases and systemic disorders. However, developing generalized artificial intelligence models for medical data often requires a large number of labeled images representing various disease signs, and most models are typically t…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures

  30. arXiv:2408.06065  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    An Investigation Into Explainable Audio Hate Speech Detection

    Authors: Jinmyeong An, Wonjun Lee, Yejin Jeon, Jungseul Ok, Yunsu Kim, Gary Geunbae Lee

    Abstract: Research on hate speech has predominantly revolved around detection and interpretation from textual inputs, leaving verbal content largely unexplored. While there has been limited exploration into hate speech detection within verbal acoustic speech inputs, the aspect of interpretability has been overlooked. Therefore, we introduce a new task of explainable audio hate speech detection. Specifically…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to SIGDIAL 2024

  31. arXiv:2408.03593  [pdf, other]

    eess.AS

    Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting

    Authors: Youkyum Kim, Jaemin Jung, Jihwan Park, Byeong-Yeol Kim, Joon Son Chung

    Abstract: This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment. Since audio data possesses additional acoustic information compared to text, there are discrepancies between these two modalities. To address this challenge, we present ParallelKWS, which utilises self- and cross-attention in a parallel architecture to effectively ca…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  32. arXiv:2407.18505  [pdf, other]

    eess.AS

    VoxSim: A perceptual voice similarity dataset

    Authors: Junseok Ahn, Youkyum Kim, Yeunju Choi, Doyeop Kwak, Ji-Hoon Kim, Seongkyu Mun, Joon Son Chung

    Abstract: This paper introduces VoxSim, a dataset of perceptual voice similarity ratings. Recent efforts to automate the assessment of speech synthesis technologies have primarily focused on predicting mean opinion score of naturalness, leaving speaker voice similarity relatively unexplored due to a lack of extensive training data. To address this, we generate about 41k utterance pairs from the VoxCeleb dat…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: INTERSPEECH 2024. The dataset is available from https://mm.kaist.ac.kr/projects/voxsim/

  33. arXiv:2407.09005  [pdf, other]

    cs.CV cs.AI eess.IV

    Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset

    Authors: Yongjin Kim, Jinbum Park, Sanha Kang, Hanguen Kim

    Abstract: The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light r…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 11 pages, 9 figures, whitepaper

  34. arXiv:2407.07517  [pdf, other]

    eess.IV cs.CV

    Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction

    Authors: Yumin Kim, Gayoon Choi, Seong Jae Hwang

    Abstract: Reducing scan time in Positron Emission Tomography (PET) imaging while maintaining high-quality images is crucial for minimizing patient discomfort and radiation exposure. Due to the limited size of datasets and distribution discrepancy across scanners in medical imaging, fine-tuning in a parameter-efficient and effective manner is on the rise. Motivated by the potential of Parameter-Efficient Fin…

    Submitted 10 July, 2024; originally announced July 2024.

  35. arXiv:2407.05551  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Read, Watch and Scream! Sound Generation from Text and Video

    Authors: Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee

    Abstract: Despite the impressive progress of multimodal generative models, video-to-audio generation still suffers from limited performance and limits the flexibility to prioritize sound synthesis for specific objects within the scene. Conversely, text-to-audio generation methods generate high-quality audio but pose challenges in ensuring comprehensive scene depiction and time-varying control. To tackle the…

    Submitted 26 December, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: AAAI 2025. Project page: https://naver-ai.github.io/rewas

  36. arXiv:2405.19380  [pdf, other]

    stat.ML cs.LG eess.SY

    Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

    Authors: Yeoneung Kim, Gihun Kim, Insoon Yang

    Abstract: We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby acc…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 61 pages, 6 figures

  37. arXiv:2405.13413  [pdf, other]

    cs.IT cs.LG eess.SP

    Boosted Neural Decoders: Achieving Extreme Reliability of LDPC Codes for 6G Networks

    Authors: Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No

    Abstract: Ensuring extremely high reliability in channel coding is essential for 6G networks. The next-generation of ultra-reliable and low-latency communications (xURLLC) scenario within 6G networks requires frame error rate (FER) below $10^{-9}$. However, low-density parity-check (LDPC) codes, the standard in 5G new radio (NR), encounter a challenge known as the error floor phenomenon, which hinders to ac…

    Submitted 14 November, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, 11 figures

  38. arXiv:2405.09193  [pdf, other]

    eess.SY

    Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems

    Authors: Yoo-Bin Bae, Yeong-Ung Kim, Jun-Oh Park, Hyo-Sung Ahn

    Abstract: As multiple and heterogeneous unmanned vehicle systems continue to play an increasingly important role in addressing complex missions in the real world, the need for effective cooperation among unmanned vehicles becomes paramount. The concept of autonomous cooperation, wherein unmanned vehicles cooperate without human intervention or human control, offers promising avenues for enhancing the efficie…

    Submitted 15 May, 2024; originally announced May 2024.

  39. arXiv:2404.15333  [pdf, other]

    eess.SP cs.LG

    EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection

    Authors: JuneYoung Park, Da Young Kim, Yunsoo Kim, Jisu Yoo, Tae Joon Kim

    Abstract: Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormalities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are no…

    Submitted 8 April, 2024; originally announced April 2024.

  40. arXiv:2404.07217  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    Attention-aware Semantic Communications for Collaborative Inference

    Authors: Jiwoong Im, Nayoung Kwon, Taewoo Park, Jiheon Woo, Jaeho Lee, Yongjune Kim

    Abstract: We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. There…

    Submitted 31 May, 2024; v1 submitted 23 February, 2024; originally announced April 2024.

  41. arXiv:2404.02592  [pdf]

    cs.CL cs.SD eess.AS

    Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation

    Authors: Yejin Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and Fas…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  42. arXiv:2404.02477  [pdf, ps, other]

    eess.SP cs.AI

    Enhancing Sum-Rate Performance in Constrained Multicell Networks: A Low-Information Exchange Approach

    Authors: Youjin Kim, Jonggyu Jang, Hyun Jong Yang

    Abstract: Despite the extensive research on massive MIMO systems for 5G telecommunications and beyond, the reality is that many deployed base stations are equipped with a limited number of antennas rather than supporting massive MIMO configurations. Furthermore, while the cell-less network concept, which eliminates cell boundaries, is under investigation, practical deployments often grapple with significant…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 5 pages, 12 figures

  43. arXiv:2404.00559  [pdf, other]

    eess.SY

    Hierarchical Climate Control Strategy for Electric Vehicles with Door-Opening Consideration

    Authors: Sanghyeon Nam, Hyejin Lee, Youngki Kim, Kyoung hyun Kwak, Kyoungseok Han

    Abstract: This study proposes a novel climate control strategy for electric vehicles (EVs) by addressing door-opening interruptions, an overlooked aspect in EV thermal management. We create and validate an EV simulation model that incorporates door-opening scenarios. Three controllers are compared using the simulation model: (i) a hierarchical non-linear model predictive control (NMPC) with a unique coolant…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: This paper, intended for presentation at the IEEE Intelligent Vehicles Symposium (IV) 2024, comprises six pages and includes eight figures

  44. DeRO: Dead Reckoning Based on Radar Odometry With Accelerometers Aided for Robot Localization

    Authors: Hoang Viet Do, Yong Hun Kim, Joo Han Lee, Min Ho Lee, Jin Woo Song

    Abstract: In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mit…

    Submitted 24 November, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 9 pages, 5 figures, 1 table, IROS 2024

    ACM Class: I.2.9

    Journal ref: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8547-8554

  45. arXiv:2403.01256  [pdf]

    eess.SY

    Resilient Microgrid Formation Considering Communication Interruptions

    Authors: Jian Zhong, Chen Chen, Young-Jin Kim, Yuxiong Huang, Mengjie Teng, Yiheng Bian, Zhaohong Bie

    Abstract: Distribution system (DS) communication failures following extreme events often degrade monitoring and control functions, thus preventing the acquisition of complete global DS component state information, on which existing post-disaster DS restoration methods are based. This letter proposes methods of inferring the states of DS components in the case of incomplete component state information. By us…

    Submitted 2 March, 2024; originally announced March 2024.

  46. arXiv:2402.16998  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    What Do Language Models Hear? Probing for Auditory Representations in Language Models

    Authors: Jerry Ngo, Yoon Kim

    Abstract: This work explores whether language models encode meaningfully grounded representations of sounds of objects. We learn a linear probe that retrieves the correct text representation of an object given a snippet of audio related to that object, where the sound representation is given by a pretrained audio model. This probe is trained via a contrastive loss that pushes the language representations an…

    Submitted 16 August, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Journal ref: 2024.acl-long.297

  47. arXiv:2402.06463  [pdf, other]

    eess.IV cs.CV cs.LG

    Cardiac ultrasound simulation for autonomous ultrasound navigation

    Authors: Abdoul Aziz Amadou, Laura Peralta, Paul Dryburgh, Paul Klein, Kaloian Petkov, Richard James Housden, Vivek Singh, Rui Liao, Young-Ho Kim, Florin Christian Ghesu, Tommaso Mansi, Ronak Rajani, Alistair Young, Kawal Rhode

    Abstract: Ultrasound is well-established as an imaging modality for diagnostic and interventional purposes. However, the image quality varies with operator skills as acquiring and interpreting ultrasound images requires extensive training due to the imaging artefacts, the range of acquisition parameters and the variability of patient anatomies. Automating the image acquisition task could improve acquisition…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 24 pages, 10 figures, 5 tables

    ACM Class: I.6.0; I.5.4; J.3

  48. arXiv:2402.05402  [pdf, other]

    cs.NI eess.SP eess.SY

    A State-of-the-art Survey on Full-duplex Network Design

    Authors: Yonghwi Kim, Hyung-Joo Moon, Hanju Yoo, Byoungnam Kim, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: Full-duplex (FD) technology is gaining popularity for integration into a wide range of wireless networks due to its demonstrated potential in recent studies. In contrast to half-duplex (HD) technology, the implementation of FD in networks necessitates considering inter-node interference (INI) from various network perspectives. When deploying FD technology in networks, several critical factors must…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 23 pages, 10 figures, To appear in Proceedings of the IEEE

  49. arXiv:2402.05350  [pdf, other]

    cs.CV eess.IV

    Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

    Authors: Junghun Cha, Ali Haider, Seoyun Yang, Hoeyeong Jin, Subin Yang, A. F. M. Shahab Uddin, Jaehyoung Kim, Soo Ye Kim, Sung-Ho Bae

    Abstract: A significant volume of analog information, i.e., documents and images, have been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such contents is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to AAAI 2024

  50. arXiv:2401.15313  [pdf, other]

    cs.RO cs.CV eess.SY math.OC

    Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

    Authors: Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

    Abstract: In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose es…

    Submitted 4 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: 20 pages, 21 figures

    MSC Class: 93C85; 93E11; 93E24; 90C26; 93E10; 62M20