Showing 1–49 of 49 results for author: Hwang, K

Searching in archive cs.
  1. arXiv:2507.04482  [pdf, ps, other]

    cs.CV

    A Training-Free Style-Personalization via Scale-wise Autoregressive Model

    Authors: Kyoungmin Lee, Jihun Park, Jongmin Gim, Wonhyeok Choi, Kyumin Hwang, Jaeyeul Kim, Sunghoon Im

    Abstract: We present a training-free framework for style-personalized image generation that controls content and style information during inference using a scale-wise autoregressive model. Our method employs a three-path design (content, style, and generation), each guided by a corresponding text prompt, enabling flexible and efficient control over image semantics without any additional training. A central c…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: 13 pages, 10 figures

  2. arXiv:2503.22209  [pdf, other]

    cs.CV cs.LG

    Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces

    Authors: Wonhyeok Choi, Kyumin Hwang, Minwoo Choi, Kiljoon Han, Wonjoon Choi, Mingyu Shin, Sunghoon Im

    Abstract: Self-supervised monocular depth estimation (SSMDE) has gained attention in the field of deep learning as it estimates depth without requiring ground truth depth maps. This approach typically uses a photometric consistency loss between a synthesized image, generated from the estimated depth, and the original image, thereby reducing the need for extensive dataset acquisition. However, the convention…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted at AAAI 2025

  3. arXiv:2503.22172  [pdf, other]

    cs.CV

    Concept-Aware LoRA for Domain-Aligned Segmentation Dataset Generation

    Authors: Minho Park, Sunghyun Park, Jungsoo Lee, Hyojin Park, Kyuwoong Hwang, Fatih Porikli, Jaegul Choo, Sungha Choi

    Abstract: This paper addresses the challenge of data scarcity in semantic segmentation by generating datasets through text-to-image (T2I) generation models, reducing image acquisition and labeling costs. Segmentation dataset generation faces two key challenges: 1) aligning generated samples with the target domain and 2) producing informative samples beyond the training data. Fine-tuning T2I models can help…

    Submitted 28 March, 2025; originally announced March 2025.

  4. arXiv:2503.18244  [pdf, other]

    cs.CV

    CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation

    Authors: Jungsoo Lee, Debasmit Das, Munawar Hayat, Sungha Choi, Kyuwoong Hwang, Fatih Porikli

    Abstract: We propose a novel knowledge distillation approach, CustomKD, that effectively leverages large vision foundation models (LVFMs) to enhance the performance of edge models (e.g., MobileNetV3). Despite recent advancements in LVFMs, such as DINOv2 and CLIP, their potential in knowledge distillation for enhancing edge models remains underexplored. While knowledge distillation is a promising approach fo…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  5. arXiv:2503.13915  [pdf, other]

    cs.CV cs.AI

    Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization

    Authors: Dongkwan Lee, Kyomin Hwang, Nojun Kwak

    Abstract: We address the problem of semi-supervised domain generalization (SSDG), where the distributions of train and test data differ, and only a small amount of labeled data along with a larger amount of unlabeled data are available during training. Existing SSDG methods that leverage only the unlabeled samples for which the model's predictions are highly confident (confident-unlabeled samples), limit th…

    Submitted 27 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  6. arXiv:2502.14573  [pdf, other]

    cs.CV cs.LG

    Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining

    Authors: Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im

    Abstract: Self-supervised monocular depth estimation (SSMDE) aims to predict the dense depth map of a monocular image, by learning depth from RGB image sequences, eliminating the need for ground-truth depth labels. Although this approach simplifies data acquisition compared to supervised methods, it struggles with reflective surfaces, as they violate the assumptions of Lambertian reflectance, leading to ina…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted at ICLR 2025

  7. arXiv:2501.06293  [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.GA cs.AI

    LensNet: Enhancing Real-time Microlensing Event Discovery with Recurrent Neural Networks in the Korea Microlensing Telescope Network

    Authors: Javier Viaña, Kyu-Ha Hwang, Zoë de Beurs, Jennifer C. Yee, Andrew Vanderburg, Michael D. Albrow, Sun-Ju Chung, Andrew Gould, Cheongho Han, Youn Kil Jung, Yoon-Hyun Ryu, In-Gu Shin, Yossi Shvartzvald, Hongjing Yang, Weicheng Zang, Sang-Mok Cha, Dong-Jin Kim, Seung-Lee Kim, Chung-Uk Lee, Dong-Joo Lee, Yongseok Lee, Byeong-Gon Park, Richard W. Pogge

    Abstract: Traditional microlensing event vetting methods require highly trained human experts, and the process is both complex and time-consuming. This reliance on manual inspection often leads to inefficiencies and constrains the ability to scale for widespread exoplanet detection, ultimately hindering discovery rates. To address the limits of traditional microlensing event vetting, we have developed LensN…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: 23 pages, 13 figures, Accepted for publication in The Astronomical Journal

    MSC Class: 85-08; ACM Class: J.2

    Journal ref: 2025 AJ

  8. arXiv:2412.11461  [pdf, other]

    cs.LG cs.AI

    Unsupervised Anomaly Detection for Tabular Data Using Noise Evaluation

    Authors: Wei Dai, Kai Hwang, Jicong Fan

    Abstract: Unsupervised anomaly detection (UAD) plays an important role in modern data analytics and it is crucial to provide simple yet effective and guaranteed UAD algorithms for real applications. In this paper, we present a novel UAD method for tabular data by evaluating how much noise is in the data. Specifically, we propose to learn a deep neural network from the clean (normal) training dataset and a n…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: The paper was accepted by AAAI 2025

  9. arXiv:2410.12692  [pdf, other]

    cs.CV cs.LG

    Machine learning approach to brain tumor detection and classification

    Authors: Alice Oh, Inyoung Noh, Jian Choo, Jihoo Lee, Justin Park, Kate Hwang, Sanghyeon Kim, Soo Min Oh

    Abstract: Brain tumor detection and classification are critical tasks in medical image analysis, particularly in early-stage diagnosis, where accurate and timely detection can significantly improve treatment outcomes. In this study, we apply various statistical and machine learning models to detect and classify brain tumors using brain MRI images. We explore a variety of statistical models including linear,…

    Submitted 6 November, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: 7 pages, 2 figures, 2 tables

  10. Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

    Authors: Zhengdao Li, Yong Cao, Kefan Shuai, Yiming Miao, Kai Hwang

    Abstract: Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In respo…

    Submitted 6 July, 2024; originally announced July 2024.

  11. arXiv:2404.15154  [pdf, other]

    cs.CL cs.AI

    Do not think about pink elephant!

    Authors: Kyomin Hwang, Suyoung Kim, JunHoo Lee, Nojun Kwak

    Abstract: Large Models (LMs) have heightened expectations for the potential of general AI as they are akin to human intelligence. This paper shows that recent large models such as Stable Diffusion and DALL-E3 also share the vulnerability of human intelligence, namely the "white bear phenomenon". We investigate the causes of the white bear phenomenon by analyzing their representation space. Based on this ana…

    Submitted 31 October, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted in CVPR 2024 Responsible Generative AI Workshop (ReGenAI)

  12. arXiv:2403.17329  [pdf, ps, other]

    cs.LG cs.AI

    Deep Support Vectors

    Authors: Junhoo Lee, Hyunho Lee, Kyomin Hwang, Nojun Kwak

    Abstract: Deep learning has achieved tremendous success. However, unlike SVMs, which provide direct decision criteria and can be trained with a small dataset, it still has significant weaknesses due to its requirement for massive datasets during training and the black-box characteristics on decision criteria. This paper addresses these issues by identifying support vectors in deep learning models. To this e…

    Submitted 29 June, 2025; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: NeurIPS 2024

  13. arXiv:2403.02496  [pdf]

    cs.CL

    Choose Your Own Adventure: Interactive E-Books to Improve Word Knowledge and Comprehension Skills

    Authors: Stephanie Day, Jin K. Hwang, Tracy Arner, Danielle McNamara, Carol Connor

    Abstract: The purpose of this feasibility study was to examine the potential impact of reading digital interactive e-books on essential skills that support reading comprehension with third-fifth grade students. Students read two e-Books that taught word learning and comprehension monitoring strategies in the service of learning difficult vocabulary and targeted science concepts about hurricanes. We investig…

    Submitted 4 March, 2024; originally announced March 2024.

  14. arXiv:2403.01344  [pdf, other]

    cs.LG cs.CV

    Mitigating the Bias in the Model for Continual Test-Time Adaptation

    Authors: Inseop Chung, Kyomin Hwang, Jayeon Yoo, Nojun Kwak

    Abstract: Continual Test-Time Adaptation (CTA) is a challenging task that aims to adapt a source pre-trained model to continually changing target domains. In the CTA setting, a model does not know when the target domain changes, thus facing a drastic change in the distribution of streaming inputs during the test-time. The key challenge is to keep adapting the model to the continually changing target domains…

    Submitted 2 March, 2024; originally announced March 2024.

  15. arXiv:2311.00737  [pdf]

    cs.LG physics.ins-det physics.med-ph

    Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning

    Authors: Dang Nguyen, Phat K. Huynh, Vinh Duc An Bui, Kee Young Hwang, Nityanand Jain, Chau Nguyen, Le Huu Nhat Minh, Le Van Truong, Xuan Thanh Nguyen, Dinh Hoang Nguyen, Le Tien Dung, Trung Q. Le, Manh-Huong Phan

    Abstract: The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through thre…

    Submitted 1 November, 2023; originally announced November 2023.

  16. arXiv:2309.01961  [pdf, other]

    cs.CV

    NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

    Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

    Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested…

    Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Tech report, project page https://nice.lgresearch.ai/

  17. arXiv:2308.16415  [pdf, other]

    cs.CL eess.AS

    Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

    Authors: Kyuhong Shim, Jinkyu Lee, Simyung Chang, Kyuwoong Hwang

    Abstract: Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to the non-streaming models. To improve the performance of streaming ASR, knowledge distillation (KD) from the non-streaming to streaming model has been studied, mainly focusing on aligning the output token probabilities. In this paper, we propose a layer-to…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted to Interspeech 2023

  18. arXiv:2307.05517  [pdf, other]

    cs.LG

    Adaptive Graph Convolution Networks for Traffic Flow Forecasting

    Authors: Zhengdao Li, Wei Li, Kai Hwang

    Abstract: Traffic flow forecasting is a highly challenging task due to the dynamic spatial-temporal road conditions. Graph neural networks (GNNs) have been widely applied in this task. However, most of these GNNs ignore the effects of time-varying road conditions due to the fixed range of the convolution receptive field. In this paper, we propose a novel Adaptive Graph Convolution Networks (AGC-net) to addres…

    Submitted 7 July, 2023; originally announced July 2023.

  19. arXiv:2301.03169  [pdf, other]

    cs.CV cs.AI

    A Study on the Generality of Neural Network Structures for Monocular Depth Estimation

    Authors: Jinwoo Bae, Kyumin Hwang, Sunghoon Im

    Abstract: Monocular depth estimation has been widely studied, and significant improvements in performance have been recently reported. However, most previous works are evaluated on a few benchmark datasets, such as KITTI datasets, and none of the works provide an in-depth analysis of the generalization performance of monocular depth estimation. In this paper, we deeply investigate the various backbone netwo…

    Submitted 10 December, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

    Comments: Accepted in TPAMI

  20. arXiv:2211.06400  [pdf, other]

    physics.acc-ph cs.LG

    Prior-mean-assisted Bayesian optimization application on FRIB Front-End tunning

    Authors: Kilean Hwang, Tomofumi Maruta, Alexander Plastun, Kei Fukushima, Tong Zhang, Qiang Zhao, Peter Ostroumov, Yue Hao

    Abstract: Bayesian optimization (BO) is often used for accelerator tuning due to its high sample efficiency. However, the computational scalability of training over large data-set can be problematic and the adoption of historical data in a computationally efficient way is not trivial. Here, we exploit a neural network model trained over historical data as a prior mean of BO for FRIB Front-End tuning.

    Submitted 11 November, 2022; originally announced November 2022.

  21. arXiv:2211.01629  [pdf, other]

    cs.CV cs.LG

    Image-based Early Detection System for Wildfires

    Authors: Omkar Ranadive, Jisu Kim, Serin Lee, Youngseo Cha, Heechan Park, Minkook Cho, Young K. Hwang

    Abstract: Wildfires are a disastrous phenomenon which cause damage to land, loss of property, air pollution, and even loss of human life. Due to the warmer and drier conditions created by climate change, more severe and uncontrollable wildfires are expected to occur in the coming years. This could lead to a global wildfire crisis and have dire consequences on our planet. Hence, it has become imperative to u…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Published in Tackling Climate Change with Machine Learning workshop, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  22. arXiv:2205.09185  [pdf, other]

    physics.ins-det cs.LG hep-ex nucl-ex physics.comp-ph

    AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

    Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

    Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…

    Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 16 pages, 18 figures, 2 appendices, 3 tables

  23. LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications

    Authors: Jinhan Xin, Kai Hwang, Zhibin Yu

    Abstract: Spark SQL has been widely deployed in industry but it is challenging to tune its performance. Recent studies try to employ machine learning (ML) to solve this problem, but suffer from two drawbacks. First, it takes a long time (high overhead) to collect training samples. Second, the optimal configuration for one input data size of the same application might not be optimal for others. To address th…

    Submitted 7 November, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: 16 pages, 21 figures, SIGMOD '22. This arxiv version is an extended version of the SIGMOD '22 paper with same title, allowed by conference chairs

  24. arXiv:2202.10612  [pdf, other]

    cs.MA cs.AI

    A Decentralized Communication Framework based on Dual-Level Recurrence for Multi-Agent Reinforcement Learning

    Authors: Jingchen Li, Haobin Shi, Kao-Shing Hwang

    Abstract: We propose a model enabling decentralized multiple agents to share their perception of environment in a fair and adaptive way. In our model, both the current message and historical observation are taken into account, and they are handled in the same recurrent model but in different forms. We present a dual-level recurrent communication framework for multi-agent systems, in which the first recurren…

    Submitted 21 February, 2022; originally announced February 2022.

  25. arXiv:2202.05093  [pdf, other]

    cs.AI cs.DC cs.LG cs.NE eess.SY

    Two-Stage Deep Anomaly Detection with Heterogeneous Time Series Data

    Authors: Kyeong-Joong Jeong, Jin-Duk Park, Kyusoon Hwang, Seong-Lyun Kim, Won-Yong Shin

    Abstract: We introduce a data-driven anomaly detection framework using a manufacturing dataset collected from a factory assembly line. Given heterogeneous time series data consisting of operation cycle signals and sensor signals, we aim at discovering abnormal events. Motivated by our empirical findings that conventional single-stage benchmark approaches may not exhibit satisfactory performance under our ch…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 10 pages, 4 figures, 4 tables; published in the IEEE Access (Please cite our journal version.)

  26. arXiv:2201.05724  [pdf, ps, other]

    cs.DM physics.bio-ph q-bio.BM

    StemP: A fast and deterministic Stem-graph approach for RNA and protein folding prediction

    Authors: Mengyi Tang, Kumbit Hwang, Sung Ha Kang

    Abstract: We propose a new deterministic methodology to predict RNA sequence and protein folding. Is stem enough for structure prediction? The main idea is to consider all possible stem formation in the given sequence. With the stem loop energy and the strength of stem, we explore how to deterministically utilize stem information for RNA sequence and protein folding structure prediction. We use graph notati…

    Submitted 14 January, 2022; originally announced January 2022.

    MSC Class: 92-10 (Primary) 68R99 (Secondary); ACM Class: G.2.3

  27. arXiv:2103.13620  [pdf, other]

    cs.SD cs.AI

    SubSpectral Normalization for Neural Audio Data Processing

    Authors: Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang

    Abstract: Convolutional Neural Networks are widely used in various machine learning domains. In image processing, the features can be obtained by applying 2D convolution to all spatial dimensions of the input. However, in the audio case, frequency domain input like Mel-Spectrogram has different and unique characteristics in the frequency dimension. Thus, there is a need for a method that allows the 2D convo…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: 4 pages, ICASSP '21 accepted

  28. arXiv:2011.01156  [pdf, other]

    cs.LG stat.ML

    SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

    Authors: Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

    Abstract: Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to update the model parameters and should be perturbe…

    Submitted 15 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted at ICASSP 2021

  29. arXiv:2007.10878  [pdf, other]

    cs.LG eess.SP

    DeepNetQoE: Self-adaptive QoE Optimization Framework of Deep Networks

    Authors: Rui Wang, Min Chen, Nadra Guizani, Yong Li, Hamid Gharavi, Kai Hwang

    Abstract: Future advances in deep learning and its impact on the development of artificial intelligence (AI) in all fields depend heavily on data size and computational power. Sacrificing massive computing resources in exchange for better precision rates of the network model is recognized by many researchers. This leads to huge computing consumption and satisfactory results are not always expected when com…

    Submitted 16 July, 2020; originally announced July 2020.

  30. Integrating Deep Learning into CAD/CAE System: Generative Design and Evaluation of 3D Conceptual Wheel

    Authors: Soyoung Yoo, Sunghee Lee, Seongsin Kim, Kwang Hyeon Hwang, Jong Ho Park, Namwoo Kang

    Abstract: Engineering design research integrating artificial intelligence (AI) into computer-aided design (CAD) and computer-aided engineering (CAE) is actively being conducted. This study proposes a deep learning-based CAD/CAE framework in the conceptual design phase that automatically generates 3D CAD designs and evaluates their engineering performance. The proposed framework comprises seven stages: (1) 2…

    Submitted 13 June, 2021; v1 submitted 25 May, 2020; originally announced June 2020.

    Journal ref: Structural and Multidisciplinary Optimization, 64(4), pp. 2725-2747 (2021)

  31. arXiv:2002.03493  [pdf, other]

    cs.DC cs.PF

    AI-oriented Medical Workload Allocation for Hierarchical Cloud/Edge/Device Computing

    Authors: Tianshu Hao, Jianfeng Zhan, Kai Hwang, Wanling Gao, Xu Wen

    Abstract: In a hierarchically-structured cloud/edge/device computing environment, workload allocation can greatly affect the overall system performance. This paper deals with AI-oriented medical workload generated in emergency rooms (ER) or intensive care units (ICU) in metropolitan areas. The goal is to optimize AI-workload allocation to cloud clusters, edge servers, and end devices so that minimum respons…

    Submitted 9 February, 2020; originally announced February 2020.

  32. arXiv:1910.06790  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Weakly Labeled Sound Event Detection Using Tri-training and Adversarial Learning

    Authors: Hyoungwoo Park, Sungrack Yun, Jungyun Eum, Janghoon Cho, Kyuwoong Hwang

    Abstract: This paper considers a semi-supervised learning framework for weakly labeled polyphonic sound event detection problems for the DCASE 2019 challenge's task4 by combining both the tri-training and adversarial learning. The goal of the task4 is to detect onsets and offsets of multiple sound events in a single audio clip. The entire dataset consists of the synthetic data with a strong label (sound eve…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 5 pages, DCASE 2019 Workshop

  33. arXiv:1910.06784  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Acoustic Scene Classification Based on a Large-margin Factorized CNN

    Authors: Janghoon Cho, Sungrack Yun, Hyoungwoo Park, Jungyun Eum, Kyuwoong Hwang

    Abstract: In this paper, we present an acoustic scene classification framework based on a large-margin factorized convolutional neural network (CNN). We adopt the factorized CNN to learn the patterns in the time-frequency domain by factorizing the 2D kernel into two separate 1D kernels. The factorized kernel leads to learn the main component of two patterns: the long-term ambient and short-term event sounds…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 5 pages, DCASE 2019 Workshop

  34. arXiv:1910.05171  [pdf, other]

    cs.LG cs.CL eess.AS stat.ML

    Query-by-example on-device keyword spotting

    Authors: Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, Kyuwoong Hwang

    Abstract: A keyword spotting (KWS) system determines the existence of, usually predefined, keyword in a continuous speech stream. This paper presents a query-by-example on-device KWS system which is user-specific. The proposed system consists of two main steps: query enrollment and testing. In query enrollment step, phonetic posteriors are output by a small-footprint automatic speech recognition model based…

    Submitted 13 January, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

    Comments: IEEE ASRU 2019

  35. arXiv:1910.04500  [pdf, other]

    cs.LG eess.AS stat.ML

    Orthogonality Constrained Multi-Head Attention For Keyword Spotting

    Authors: Mingu Lee, Jinkyu Lee, Hye Jin Jang, Byeonggeun Kim, Wonil Chang, Kyuwoong Hwang

    Abstract: Multi-head attention mechanism is capable of learning various representations from sequential data while paying attention to different subsequences, e.g., word-pieces or syllables in a spoken word. From the subsequences, it retrieves richer information than a single-head attention which only summarizes the whole sequence into one context vector. However, a naive use of the multi-head attention doe…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted to ASRU 2019

  36. arXiv:1909.06326  [pdf, other]

    q-bio.QM cs.CV cs.LG eess.IV physics.med-ph

    Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning

    Authors: Justin D Krogue, Kaiyang V Cheng, Kevin M Hwang, Paul Toogood, Eric G Meinberg, Erik J Geiger, Musa Zaid, Kevin C McGill, Rina Patel, Jae Ho Sohn, Alexandra Wright, Bryan F Darger, Kevin A Padrez, Eugene Ozhinsky, Sharmila Majumdar, Valentina Pedoia

    Abstract: Purpose: Hip fractures are a common cause of morbidity and mortality. Automatic identification and classification of hip fractures using deep learning may improve outcomes by reducing diagnostic errors and decreasing time to operation. Methods: Hip and pelvic radiographs from 1118 studies were reviewed and 3034 hips were labeled via bounding boxes and classified as normal, displaced femoral neck f…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: Presented at Orthopaedic Research Society, Austin, TX, Feb 2, 2019, currently in submission for publication

  37. arXiv:1908.02612  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network

    Authors: Sungrack Yun, Janghoon Cho, Jungyun Eum, Wonil Chang, Kyuwoong Hwang

    Abstract: This paper presents an end-to-end text-independent speaker verification framework by jointly considering the speaker embedding (SE) network and automatic speech recognition (ASR) network. The SE network learns to output an embedding vector which distinguishes the speaker characteristics of the input utterance, while the ASR network learns to recognize the phonetic context of the input. In training…

    Submitted 6 August, 2019; originally announced August 2019.

    Comments: To appear in INTERSPEECH 2019

  38. arXiv:1908.01924  [pdf, ps, other]

    cs.PF cs.DC

    Edge AIBench: Towards Comprehensive End-to-end Edge Computing Benchmarking

    Authors: Tianshu Hao, Yunyou Huang, Xu Wen, Wanling Gao, Fan Zhang, Chen Zheng, Lei Wang, Hainan Ye, Kai Hwang, Zujie Ren, Jianfeng Zhan

    Abstract: In edge computing scenarios, the distribution of data and collaboration of workloads on different layers are serious concerns for performance, privacy, and security issues. So for edge computing benchmarking, we must take an end-to-end view, considering all three layers: client-side devices, edge computing layer, and cloud servers. Unfortunately, the previous work ignores this most important point…

    Submitted 5 August, 2019; originally announced August 2019.

  39. arXiv:1611.06342  [pdf, other]

    cs.LG cs.NE

    Quantized neural network design under weight capacity constraint

    Authors: Sungho Shin, Kyuyeon Hwang, Wonyong Sung

    Abstract: The complexity of deep neural network algorithms for hardware implementation can be lowered either by scaling the number of units or reducing the word-length of weights. Both approaches, however, can accompany the performance degradation although many types of research are conducted to relieve this problem. Thus, it is an important question which one, between the network size scaling and the weigh…

    Submitted 19 November, 2016; originally announced November 2016.

    Comments: This paper is accepted at NIPS 2016 workshop on Efficient Methods for Deep Neural Networks (EMDNN). arXiv admin note: text overlap with arXiv:1511.06488

  40. arXiv:1610.00552  [pdf, other]

    cs.CL cs.LG cs.SD

    FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

    Authors: Minjae Lee, Kyuyeon Hwang, Jinhwan Park, Sungwook Choi, Sungho Shin, Wonyong Sung

    Abstract: In this paper, a neural network based real-time speech recognition (SR) system is developed using an FPGA for very low-power operation. The implemented system employs two recurrent neural networks (RNNs); one is a speech-to-character RNN for acoustic modeling (AM) and the other is for character-level language modeling (LM). The system also employs a statistical word-level LM to improve the recogni…

    Submitted 30 September, 2016; originally announced October 2016.

    Comments: Accepted to SiPS 2016

  41. arXiv:1609.03777  [pdf, ps, other]

    cs.LG cs.CL cs.NE

    Character-Level Language Modeling with Hierarchical Recurrent Neural Networks

    Authors: Kyuyeon Hwang, Wonyong Sung

    Abstract: Recurrent neural network (RNN) based character-level language models (CLMs) are extremely useful for modeling out-of-vocabulary words by nature. However, their performance is generally much worse than the word-level language models (WLMs), since CLMs need to consider longer history of tokens to properly predict the next one. We address this problem by proposing hierarchical RNN architectures, whic…

    Submitted 2 February, 2017; v1 submitted 13 September, 2016; originally announced September 2016.

    Comments: Submitted to NIPS 2016 on May 20, 2016 (v1), accepted to ICASSP 2017 (v2)

  42. arXiv:1608.04077  [pdf, other]

    cs.LG

    Generative Knowledge Transfer for Neural Language Models

    Authors: Sungho Shin, Kyuyeon Hwang, Wonyong Sung

    Abstract: In this paper, we propose a generative knowledge transfer technique that trains an RNN based language model (student network) using text and output probabilities generated from a previously trained RNN (teacher network). The text generation can be conducted by either the teacher or the student network. We can also improve the performance by taking the ensemble of soft labels obtained from multiple…

    Submitted 28 February, 2017; v1 submitted 14 August, 2016; originally announced August 2016.

  43. Character-Level Incremental Speech Recognition with Recurrent Neural Networks

    Authors: Kyuyeon Hwang, Wonyong Sung

    Abstract: In real-time speech recognition applications, the latency is an important issue. We have developed a character-level incremental speech recognition (ISR) system that responds quickly even during the speech, where the hypotheses are gradually improved while the speaking proceeds. The algorithm employs a speech-to-character unidirectional recurrent neural network (RNN), which is end-to-end trained w…

    Submitted 28 January, 2016; v1 submitted 25 January, 2016; originally announced January 2016.

    Comments: To appear in ICASSP 2016

  44. arXiv:1512.08903  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Online Keyword Spotting with a Character-Level Recurrent Neural Network

    Authors: Kyuyeon Hwang, Minjae Lee, Wonyong Sung

    Abstract: In this paper, we propose a context-aware keyword spotting model employing a character-level recurrent neural network (RNN) for spoken term detection in continuous speech. The RNN is end-to-end trained with connectionist temporal classification (CTC) to generate the probabilities of character and word-boundary labels. There is no need for the phonetic transcription, senone modeling, or system dict…

    Submitted 30 December, 2015; originally announced December 2015.

  45. arXiv:1512.08571  [pdf]

    cs.NE cs.LG stat.ML

    Structured Pruning of Deep Convolutional Neural Networks

    Authors: Sajid Anwar, Kyuyeon Hwang, Wonyong Sung

    Abstract: Real time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation efforts but also do not fit well on parallel computation. We introduce structured sparsity at var…

    Submitted 28 December, 2015; originally announced December 2015.

    Comments: 11 pages, 8 figures, 1 table

  46. Fixed-Point Performance Analysis of Recurrent Neural Networks

    Authors: Sungho Shin, Kyuyeon Hwang, Wonyong Sung

    Abstract: Recurrent neural networks have shown excellent performance in many applications, however they require increased complexity in hardware or software based implementations. The hardware complexity can be much lowered by minimizing the word-length of weights and signals. This work analyzes the fixed-point performance of recurrent neural networks using a retrain based quantization method. The quantizat…

    Submitted 27 September, 2016; v1 submitted 4 December, 2015; originally announced December 2015.

  47. arXiv:1511.06841  [pdf, ps, other]

    cs.LG cs.NE

    Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification

    Authors: Kyuyeon Hwang, Wonyong Sung

    Abstract: Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders…

    Submitted 2 February, 2017; v1 submitted 21 November, 2015; originally announced November 2015.

    Comments: Final version: Kyuyeon Hwang and Wonyong Sung, "Sequence to Sequence Training of CTC-RNNs with Partial Windowing," Proceedings of The 33rd International Conference on Machine Learning, pp. 2178-2187, 2016. URL: http://www.jmlr.org/proceedings/papers/v48/hwanga16.html

  48. arXiv:1511.06488  [pdf, other]

    cs.LG cs.NE

    Resiliency of Deep Neural Networks under Quantization

    Authors: Wonyong Sung, Sungho Shin, Kyuyeon Hwang

    Abstract: The complexity of deep neural network algorithms for hardware implementation can be much lowered by optimizing the word-length of weights and signals. Direct quantization of floating-point weights, however, does not show good performance when the number of bits assigned is small. Retraining of quantized networks has been developed to relieve this problem. In this work, the effects of retraining ar…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  49. Single stream parallelization of generalized LSTM-like RNNs on a GPU

    Authors: Kyuyeon Hwang, Wonyong Sung

    Abstract: Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training time, which demands parallel implementations of the training procedure. Parallelization of the training algorithms for RNNs are very challenging because internal recurrent paths form dependencies between two different time frames. In this paper, we first propose…

    Submitted 10 March, 2015; originally announced March 2015.

    Comments: Accepted by the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015