Showing 1–19 of 19 results for author: Kim, H Y

Searching in archive cs.
  1. arXiv:2503.19945  [pdf, other]

    eess.IV cs.AI cs.CV

    Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification

    Authors: Daniel G. P. Petrini, Hae Yong Kim

    Abstract: This study explores open questions in the application of machine learning for breast cancer detection in mammograms. Current approaches often employ a two-stage transfer learning process: first, adapting a backbone model trained on natural images to develop a patch classifier, which is then used to create a single-view whole-image classifier. Additionally, many studies leverage both mammographic v…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 8 pages

  2. arXiv:2502.00497  [pdf]

    cs.LG eess.SP

    Convolutional Fourier Analysis Network (CFAN): A Unified Time-Frequency Approach for ECG Classification

    Authors: Sam Jeong, Hae Yong Kim

    Abstract: Machine learning has revolutionized biomedical signal analysis, particularly in electrocardiogram (ECG) classification. While convolutional neural networks (CNNs) excel at automatic feature extraction, the optimal integration of time- and frequency-domain information remains unresolved. This study introduces the Convolutional Fourier Analysis Network (CFAN), a novel architecture that unifies time-…

    Submitted 13 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

  3. arXiv:2411.18995  [pdf, other]

    cs.CV

    MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers

    Authors: Jongseong Bae, Susang Kim, Minsu Cho, Ha Young Kim

    Abstract: Active research is currently underway to enhance the efficiency of vision transformers (ViTs). Most studies have focused solely on effective token mixers, overlooking the potential relationship with normalization. To boost diverse feature learning, we propose two components: a normalization module called multi-view normalization (MVN) and a token mixer called multi-view token mixer (MVTM). The MVN…

    Submitted 28 November, 2024; originally announced November 2024.

  4. arXiv:2411.17248  [pdf, other]

    cs.CV

    DiffSLT: Enhancing Diversity in Sign Language Translation via Diffusion Model

    Authors: JiHwan Moon, Jihoon Park, Jungeun Kim, Jongseong Bae, Hyeongwoo Jeon, Ha Young Kim

    Abstract: Sign language translation (SLT) is challenging, as it involves converting sign language videos into natural language. Previous studies have prioritized accuracy over diversity. However, diversity is crucial for handling lexical and syntactic ambiguities in machine translation, suggesting it could similarly benefit SLT. In this work, we propose DiffSLT, a novel gloss-free SLT framework that leverag…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Project page: https://diffslt.github.io/

  5. arXiv:2411.16789  [pdf, other]

    cs.CV cs.CL

    Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

    Authors: Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae, Ha Young Kim

    Abstract: Sign language translation (SLT) is a challenging task that involves translating sign language images into spoken language. For SLT models to perform this task successfully, they must bridge the modality gap and identify subtle variations in sign language components to understand their meanings accurately. To address these challenges, we propose a novel gloss-free SLT framework called Multimodal Si…

    Submitted 25 November, 2024; originally announced November 2024.

  6. arXiv:2411.16129  [pdf, other]

    cs.CV

    Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion

    Authors: Jongseong Bae, Junwoo Ha, Ha Young Kim

    Abstract: Camera-based Semantic Scene Completion (SSC) is gaining attention in the 3D perception field. However, properties such as perspective and occlusion lead to the underestimation of the geometry in distant regions, posing a critical issue for safety-focused autonomous driving systems. To tackle this, we propose ScanSSC, a novel camera-based SSC model composed of a Scan Module and Scan Loss, both des…

    Submitted 25 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Accepted to CVPR 2025

  7. arXiv:2410.01531  [pdf, other]

    cs.LG cs.AI

    TiVaT: A Transformer with a Single Unified Mechanism for Capturing Asynchronous Dependencies in Multivariate Time Series Forecasting

    Authors: Junwoo Ha, Hyukjae Kwon, Sungsoo Kim, Kisu Lee, Seungjae Park, Ha Young Kim

    Abstract: Multivariate time series (MTS) forecasting is vital across various domains but remains challenging due to the need to simultaneously model temporal and inter-variate dependencies. Existing channel-dependent models, where Transformer-based models dominate, process these dependencies separately, limiting their capacity to capture complex interactions such as lead-lag dynamics. To address this issue,…

    Submitted 30 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 15 pages

    MSC Class: I.2.0

  8. arXiv:2407.12514  [pdf, other]

    cs.CL

    On Initializing Transformers with Pre-trained Embeddings

    Authors: Ha Young Kim, Niranjan Balasubramanian, Byungkon Kang

    Abstract: It has become common practice now to use random initialization schemes, rather than the pre-trained embeddings, when training transformer-based models from scratch. Indeed, we find that pre-trained word embeddings from GloVe, and some sub-word embeddings extracted from language models such as T5 and mT5 fare much worse compared to random initialization. This is counter-intuitive given the well-kno…

    Submitted 17 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  9. Characterization of Magnetic Labyrinthine Structures Through Junctions and Terminals Detection Using Template Matching and CNN

    Authors: Vinícius Yu Okubo, Kotaro Shimizu, B. S. Shivaram, Hae Yong Kim

    Abstract: Defects influence diverse properties of materials, shaping their structural, mechanical, and electronic characteristics. Among a variety of materials exhibiting unique defects, magnets exhibit diverse nano- to micro-scale defects and have been intensively studied in materials science. Specifically, defects in magnetic labyrinthine patterns, called junctions and terminals are ubiquitous and serve a…

    Submitted 18 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 12 pages, 7 figures, published in IEEE Access

    Journal ref: IEEE Access, vol. 12, pp. 92419-92430, 2024

  10. arXiv:2309.00372  [pdf, other]

    eess.IV cs.CV

    On the Localization of Ultrasound Image Slices within Point Distribution Models

    Authors: Lennart Bastian, Vincent Bürgin, Ha Young Kim, Alexander Baumann, Benjamin Busam, Mahdi Saleh, Nassir Navab

    Abstract: Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US). Longitudinal nodule tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology. This task, however, imposes a substantial cognitive load on clinicians due to the inherent challenge of maintaining a mental 3D reconstruction of the organ. We thus present a framework for autom…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: ShapeMI Workshop @ MICCAI 2023; 12 pages, 2 figures

  11. arXiv:2308.15791  [pdf, other]

    cs.CV eess.IV

    Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

    Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim

    Abstract: Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In N…

    Submitted 5 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  12. arXiv:2307.01227  [pdf, other]

    cs.LG cs.AI

    ESGCN: Edge Squeeze Attention Graph Convolutional Network for Traffic Flow Forecasting

    Authors: Sangrok Lee, Ha Young Kim

    Abstract: Traffic forecasting is a highly challenging task owing to the dynamical spatio-temporal dependencies of traffic flows. To handle this, we focus on modeling the spatio-temporal dynamics and propose a network termed Edge Squeeze Graph Convolutional Network (ESGCN) to forecast traffic flow in multiple regions. ESGCN consists of two modules: W-module and ES module. W-module is a fully node-wise convol…

    Submitted 12 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: 7 pages, 3 figures

  13. End-to-End Learnable Multi-Scale Feature Compression for VCM

    Authors: Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee, Se Yoon Jeong, Hui Yong Kim

    Abstract: The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machine (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compressio…

    Submitted 8 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 13 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology

  14. arXiv:2304.07515  [pdf, other]

    cs.CV cs.LG

    S3M: Scalable Statistical Shape Modeling through Unsupervised Correspondences

    Authors: Lennart Bastian, Alexander Baumann, Emily Hoppe, Vincent Bürgin, Ha Young Kim, Mahdi Saleh, Benjamin Busam, Nassir Navab

    Abstract: Statistical shape models (SSMs) are an established way to represent the anatomy of a population with various clinically relevant applications. However, they typically require domain expertise, and labor-intensive landmark annotations to construct. We address these shortcomings by proposing an unsupervised method that leverages deep geometric features and functional correspondences to simultaneousl…

    Submitted 24 July, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

    Comments: Accepted at MICCAI 2023. 13 pages, 6 figures

  15. arXiv:2303.02328  [pdf, ps, other]

    cs.CV

    Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization

    Authors: Sangrok Lee, Jongseong Bae, Ha Young Kim

    Abstract: Domain generalization (DG) is a principal task to evaluate the robustness of computer vision models. Many previous studies have used normalization for DG. In normalization, statistics and normalized features are regarded as style and content, respectively. However, it has a content variation problem when removing style because the boundary between content and style is unclear. This study addresses…

    Submitted 15 March, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: 10 pages, 6 figures, Conference on Computer Vision and Pattern Recognition 2023

  16. arXiv:2111.03664  [pdf, other]

    cs.LG eess.AS eess.IV

    Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

    Authors: Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

    Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ the teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teach…

    Submitted 11 August, 2023; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  17. Breast Cancer Diagnosis in Two-View Mammography Using End-to-End Trained EfficientNet-Based Convolutional Network

    Authors: Daniel G. P. Petrini, Carlos Shimizu, Rosimeire A. Roela, Gabriel V. Valente, Maria A. A. K. Folgueira, Hae Yong Kim

    Abstract: Some recent studies have described deep convolutional neural networks to diagnose breast cancer in mammograms with similar or even superior performance to that of human experts. One of the best techniques does two transfer learnings: the first uses a model trained on natural images to create a "patch classifier" that categorizes small subimages; the second uses the patch classifier to scan the who…

    Submitted 3 August, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Updated to published version in IEEE Access

    Journal ref: IEEE Access, vol. 10, pp. 77723-77731, 2022

  18. Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

    Abstract: For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end with…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 7 pages, 3 figures

    Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020

  19. arXiv:1810.10327  [pdf, other]

    cs.CV cs.LG stat.ML

    BshapeNet: Object Detection and Instance Segmentation with Bounding Shape Masks

    Authors: Ba Rom Kang, Ha Young Kim

    Abstract: Recent object detectors use four-coordinate bounding box (bbox) regression to predict object locations. Providing additional information indicating the object positions and coordinates will improve detection performance. Thus, we propose two types of masks: a bbox mask and a bounding shape (bshape) mask, to represent the object's bbox and boundary shape, respectively. For each of these types, we c…

    Submitted 31 July, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: 10 pages, 6 figures