Showing 1–20 of 20 results for author: Kwak, D

Searching in archive cs.
  1. arXiv:2506.16231  [pdf, ps, other]

    eess.AS cs.SD

    EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training

    Authors: Doyeop Kwak, Youngjoon Jang, Seongyu Kim, Joon Son Chung

    Abstract: Speech signals in real-world environments are frequently affected by various distortions such as additive noise, reverberation, and bandwidth limitation, which may appear individually or in combination. Traditional speech enhancement methods typically rely on either masking, which focuses on suppressing non-speech components while preserving observable structure, or mapping, which seeks to recover…

    Submitted 19 June, 2025; originally announced June 2025.

  2. arXiv:2504.03380  [pdf, other]

    cs.CL cs.AI

    Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning

    Authors: Sanghwan Bae, Jiwoo Hong, Min Young Lee, Hanbyul Kim, JeongYeon Nam, Donghyun Kwak

    Abstract: Reasoning-Oriented Reinforcement Learning (RORL) enhances the reasoning ability of Large Language Models (LLMs). However, due to the sparsity of rewards in RORL, effective training is highly dependent on the selection of problems of appropriate difficulty. Although curriculum learning attempts to address this by adjusting difficulty, it often relies on static schedules, and even recent online filt…

    Submitted 4 April, 2025; originally announced April 2025.

  3. arXiv:2501.10300  [pdf]

    cs.AI

    An Ontology for Social Determinants of Education (SDoEd) based on Human-AI Collaborative Approach

    Authors: Navya Martin Kollapally, James Geller, Patricia Morreale, Daehan Kwak

    Abstract: The use of computational ontologies is well-established in the field of Medical Informatics. The topic of Social Determinants of Health (SDoH) has also received extensive attention. Work at the intersection of ontologies and SDoH has been published. However, a standardized framework for Social Determinants of Education (SDoEd) is lacking. In this paper, we are closing the gap by introducing an SDo…

    Submitted 17 January, 2025; originally announced January 2025.

    Comments: Accepted in CONSORTIUM FOR COMPUTING SCIENCES IN COLLEGES

  4. arXiv:2405.10272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  5. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  6. arXiv:2403.10105  [pdf, other]

    cs.RO cs.AI cs.LG

    Belief Aided Navigation using Bayesian Reinforcement Learning for Avoiding Humans in Blind Spots

    Authors: Jinyeob Kim, Daewon Kwak, Hyunwoo Rim, Donghan Kim

    Abstract: Recent research on mobile robot navigation has focused on socially aware navigation in crowded environments. However, existing methods do not adequately account for human robot interactions and demand accurate location information from omnidirectional sensors, rendering them unsuitable for practical applications. In response to this need, this study introduces a novel algorithm, BNBRL+, predicated…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  7. arXiv:2305.13735  [pdf, other]

    cs.CL cs.AI cs.LG

    Aligning Large Language Models through Synthetic Feedback

    Authors: Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo

    Abstract: Aligning large language models (LLMs) to human values has become increasingly important as it enables sophisticated steering of LLMs. However, it requires significant human demonstrations and feedback or distillation from proprietary LLMs such as ChatGPT. In this work, we propose a novel alignment learning framework with synthetic feedback not dependent on extensive human annotations and proprieta…

    Submitted 20 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 main conference

  8. arXiv:2212.10504  [pdf, other]

    cs.CL

    Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?

    Authors: Sang-Woo Lee, Sungdong Kim, Donghyeon Ko, Donghoon Ham, Youngki Hong, Shin Ah Oh, Hyunhoon Jung, Wangkyo Jung, Kyunghyun Cho, Donghyun Kwak, Hyungsuk Noh, Woomyoung Park

    Abstract: Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-worl…

    Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  9. arXiv:2210.08750  [pdf, other]

    cs.CL cs.AI

    Keep Me Updated! Memory Management in Long-term Conversations

    Authors: Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woomyoung Park, Nako Sung

    Abstract: Remembering important information from the past and continuing to talk about it in the present are crucial in long-term conversations. However, previous literature does not deal with cases where the memorized information is outdated, which may cause confusion in later conversations. To address this issue, we present a novel task and a corresponding dataset of memory management in long-term convers…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP2022 Findings

  10. arXiv:2205.12502  [pdf, other]

    cs.CV cs.CL cs.LG

    The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

    Authors: Gi-Cheon Kang, Sungdong Kim, Jin-Hwa Kim, Donghyun Kwak, Byoung-Tak Zhang

    Abstract: Visual dialog (VisDial) is a task of answering a sequence of questions grounded in an image, using the dialog history as context. Prior work has trained the dialog agents solely on VisDial data via supervised learning or leveraged pre-training on related vision-and-language datasets. This paper presents a semi-supervised learning approach for visually-grounded dialog, called Generative Self-Traini…

    Submitted 2 March, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR 2023

  11. arXiv:2205.00176  [pdf, other]

    cs.CL

    Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models

    Authors: Sanghwan Bae, Donghyun Kwak, Sungdong Kim, Donghoon Ham, Soyoung Kang, Sang-Woo Lee, Woomyoung Park

    Abstract: Recent open-domain dialogue models have brought numerous breakthroughs. However, building a chat system is not scalable since it often requires a considerable volume of human-human dialogue data, especially when enforcing features such as persona, style, or safety. In this work, we study the challenge of imposing roles on open-domain dialogue systems, with the goal of making the systems maintain c…

    Submitted 30 April, 2022; originally announced May 2022.

    Comments: Accepted to NAACL2022 as a long paper

  12. arXiv:2111.10612  [pdf]

    cs.CV physics.app-ph

    A photosensor employing data-driven binning for ultrafast image recognition

    Authors: Lukas Mennel, Aday J. Molina-Mendoza, Matthias Paur, Dmitry K. Polyushkin, Dohyun Kwak, Miriam Giparakis, Maximilian Beiser, Aaron Maxwell Andrews, Thomas Mueller

    Abstract: Pixel binning is a technique, widely used in optical image acquisition and spectroscopy, in which adjacent detector elements of an image sensor are combined into larger pixels. This reduces the amount of data to be processed as well as the impact of noise, but comes at the cost of a loss of information. Here, we push the concept of binning to its limit by combining a large fraction of the sensor e…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 10 pages, 4 figures

  13. arXiv:2109.04650  [pdf, other]

    cs.CL

    What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

    Authors: Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park, et al. (12 additional authors not shown)

    Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a K…

    Submitted 28 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP2021 as a long paper. Fixed some typos

  14. arXiv:2104.07253  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding

    Authors: Seunghyun Seo, Donghyun Kwak, Bowon Lee

    Abstract: Most End-to-End (E2E) SLU networks leverage the pre-trained ASR networks but still lack the capability to understand the semantics of utterances, crucial for the SLU task. To solve this, recently proposed studies use pre-trained NLU networks. However, it is not trivial to fully utilize both pre-trained networks; many solutions were proposed, such as Knowledge Distillation, cross-modal shared embed…

    Submitted 16 February, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted for ICASSP 2022

  15. arXiv:2005.08213  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

    Authors: Won Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim

    Abstract: Speech is one of the most effective means of communication and is full of information that helps the transmission of utterer's thoughts. However, mainly due to the cumbersome processing of acoustic features, phoneme or word posterior probability has frequently been discarded in understanding the natural language. Thus, some recent spoken language understanding (SLU) modules have utilized end-to-en…

    Submitted 8 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020 Camera-ready

  16. arXiv:1908.02569  [pdf, other]

    cs.SI cs.IR cs.LG stat.ML

    Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation

    Authors: Kyung-Min Kim, Donghyun Kwak, Hanock Kwak, Young-Jin Park, Sangkwon Sim, Jae-Han Cho, Minkyu Kim, Jihun Kwon, Nako Sung, Jung-Woo Ha

    Abstract: Graph Neural Networks (GNNs) have been emerging as a promising method for relational representation including recommender systems. However, various challenging issues of social graphs hinder the practical usage of GNNs for social recommendation, such as their complex noisy connections and high heterogeneity. The oversmoothing of GNNs is an obstacle of GNN-based social recommendation as well. Here…

    Submitted 24 July, 2019; originally announced August 2019.

    Comments: 6 pages, accepted for RecSys 2019 LBR Track

  17. arXiv:1712.05902  [pdf, other]

    cs.LG cs.DC

    NSML: A Machine Learning Platform That Enables You to Focus on Your Models

    Authors: Nako Sung, Minkyu Kim, Hyunwoo Jo, Youngil Yang, Jingwoong Kim, Leonard Lausen, Youngkwan Kim, Gayoung Lee, Donghyun Kwak, Jung-Woo Ha, Sunghun Kim

    Abstract: Machine learning libraries such as TensorFlow and PyTorch simplify model implementation. However, researchers are still required to perform a non-trivial amount of manual tasks such as GPU allocation, training status tracking, and comparison of models with different hyperparameter settings. We propose a system to handle these tasks and help researchers focus on models. We present the requirements…

    Submitted 15 December, 2017; originally announced December 2017.

    Comments: 8 pages, 4 figures

  18. arXiv:1703.03933  [pdf, other]

    cs.AI

    Micro-Objective Learning : Accelerating Deep Reinforcement Learning through the Discovery of Continuous Subgoals

    Authors: Sungtae Lee, Sang-Woo Lee, Jinyoung Choi, Dong-Hyun Kwak, Byoung-Tak Zhang

    Abstract: Recently, reinforcement learning has been successfully applied to the logical game of Go, various Atari games, and even a 3D game, Labyrinth, though it continues to have problems in sparse reward settings. It is difficult to explore, but also difficult to exploit, a small number of successes when learning policy. To solve this issue, the subgoal and option framework have been proposed. However, di…

    Submitted 11 March, 2017; originally announced March 2017.

  19. arXiv:1701.05334  [pdf]

    cs.AI cs.CL

    Fuzzy Ontology-Based Sentiment Analysis of Transportation and City Feature Reviews for Safe Traveling

    Authors: Farman Ali, D. Kwak, Pervez Khan, S. M. Riazul Islam, K. H. Kim, K. S. Kwak

    Abstract: Traffic congestion is rapidly increasing in urban areas, particularly in mega cities. To date, there exist a few sensor network based systems to address this problem. However, these techniques are not suitable enough in terms of monitoring an entire transportation system and delivering emergency services when needed. These techniques require real-time data and intelligent ways to quickly determine…

    Submitted 19 January, 2017; originally announced January 2017.

    Comments: 24 pages, 7 figures, Transportation Research Part C

  20. arXiv:1606.01455  [pdf, other]

    cs.CV

    Multimodal Residual Learning for Visual QA

    Authors: Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    Abstract: Deep neural networks continue to advance the state-of-the-art of image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of the deep residual learning. Unlike the deep residual learning, MRN effectively…

    Submitted 31 August, 2016; v1 submitted 4 June, 2016; originally announced June 2016.

    Comments: 13 pages, 7 figures, accepted for NIPS 2016