
Showing 1–50 of 142 results for author: Seo, S

Searching in archive cs.
  1. arXiv:2504.17390  [pdf, other]

    cs.CL

    PicPersona-TOD : A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona

    Authors: Jihyun Lee, Yejin Jeon, Seungyeon Seo, Gary Geunbae Lee

    Abstract: Task-Oriented Dialogue (TOD) systems are designed to fulfill user requests through natural language interactions, yet existing systems often produce generic, monotonic responses that lack individuality and fail to adapt to users' personal attributes. To address this, we introduce PicPersona-TOD, a novel dataset that incorporates user images as part of the persona, enabling personalized responses t…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted in NAACL 2025 main

  2. arXiv:2504.14805  [pdf, other]

    cs.LG cs.AI cs.RO

    Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment

    Authors: Jinwoo Choi, Seung-Woo Seo

    Abstract: Reinforcement learning (RL) has made significant progress in various domains, but scaling it to long-horizon tasks with complex decision-making remains challenging. Skill learning attempts to address this by abstracting actions into higher-level behaviors. However, current approaches often fail to recognize semantically similar behaviors as the same skill and use fixed skill lengths, limiting flex…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: ICLR 2025; 23 pages, 12 figures

  3. arXiv:2504.14124  [pdf, other]

    cs.DM math.CO

    Progress on Self Identifying Codes

    Authors: Devin Jean, Suk Seo

    Abstract: The concept of an identifying code for a graph was introduced by Karpovsky, Chakrabarty, and Levitin in 1998 as the problem of covering the vertices of a graph such that we can uniquely identify any vertex in the graph by examining the vertices that cover it. An application of an identifying code would be to detect a faulty processor in a multiprocessor system. In 2020, a variation of identify cod…

    Submitted 18 April, 2025; originally announced April 2025.

  4. arXiv:2504.08051  [pdf, other]

    cs.LG cs.AI

    Compositional Flows for 3D Molecule and Synthesis Pathway Co-design

    Authors: Tony Shen, Seonghwan Seo, Ross Irwin, Kieran Didi, Simon Olsson, Woo Youn Kim, Martin Ester

    Abstract: Many generative applications, such as synthesis-based 3D molecular design, involve constructing compositional objects with continuous features. Here, we introduce Compositional Generative Flows (CGFlow), a novel framework that extends flow matching to generate objects in compositional steps while modeling continuous states. Our key insight is that modeling compositional state transitions can be fo…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Spotlighted at ICLR 2025 GEM and AI4Mat workshops, 29 pages, 7 figures

  5. arXiv:2503.17125  [pdf, other]

    cs.RO cs.AI

    LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning

    Authors: Chan Kim, Seung-Woo Seo, Seong-Woo Kim

    Abstract: Deep Reinforcement Learning (DRL) has demonstrated strong performance in robotic control but remains susceptible to out-of-distribution (OOD) states, often resulting in unreliable actions and task failure. While previous methods have focused on minimizing or preventing OOD occurrences, they largely neglect recovery once an agent encounters such states. Although the latest research has attempted to…

    Submitted 28 March, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

    Comments: 14 pages, 16 figures

  6. arXiv:2503.12947  [pdf, other]

    cs.CV

    DivCon-NeRF: Generating Augmented Rays with Diversity and Consistency for Few-shot View Synthesis

    Authors: Ingyun Lee, Jae Won Jang, Seunghyeon Seo, Nojun Kwak

    Abstract: Neural Radiance Field (NeRF) has shown remarkable performance in novel view synthesis but requires many multiview images, making it impractical for few-shot scenarios. Ray augmentation was proposed to prevent overfitting for sparse training data by generating additional rays. However, existing methods, which generate augmented rays only near the original rays, produce severe floaters and appearanc…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: 11 pages, 6 figures

  7. arXiv:2503.10256  [pdf, other]

    cs.CV

    ROODI: Reconstructing Occluded Objects with Denoising Inpainters

    Authors: Yeonjin Chang, Erqun Dong, Seunghyeon Seo, Nojun Kwak, Kwang Moo Yi

    Abstract: While the quality of novel-view images has improved dramatically with 3D Gaussian Splatting, extracting specific objects from scenes remains challenging. Isolating individual 3D Gaussian primitives for each object and handling occlusions in scenes remain far from being solved. We propose a novel object extraction method based on two key principles: (1) being object-centric by pruning irrelevant pr…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Project page: https://yeonjin-chang.github.io/ROODI/

  8. arXiv:2503.00717  [pdf, other]

    cs.MA cs.AI

    LLMDR: LLM-Driven Deadlock Detection and Resolution in Multi-Agent Pathfinding

    Authors: Seungbae Seo, Junghwan Kim, Minjeong Shin, Bongwon Suh

    Abstract: Multi-Agent Pathfinding (MAPF) is a core challenge in multi-agent systems. Existing learning-based MAPF methods often struggle with scalability, particularly when addressing complex scenarios that are prone to deadlocks. To address these challenges, we introduce LLMDR (LLM-Driven Deadlock Detection and Resolution), an approach designed to resolve deadlocks and improve the performance of learnt MAP…

    Submitted 1 March, 2025; originally announced March 2025.

  9. arXiv:2502.17643  [pdf, other]

    cs.AI cs.HC cs.LG cs.MA

    Socratic: Enhancing Human Teamwork via AI-enabled Coaching

    Authors: Sangwon Seo, Bing Han, Rayan E. Harari, Roger D. Dias, Marco A. Zenati, Eduardo Salas, Vaibhav Unhelkar

    Abstract: Coaches are vital for effective collaboration, but cost and resource constraints often limit their availability during real-world tasks. This limitation poses serious challenges in life-critical domains that rely on effective teamwork, such as healthcare and disaster response. To address this gap, we propose and realize an innovative application of AI: task-time team coaching. Specifically, we int…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Extended version of an identically-titled paper accepted at AAMAS 2025

  10. arXiv:2502.17618  [pdf, other]

    cs.LG cs.AI cs.MA

    Hierarchical Imitation Learning of Team Behavior from Heterogeneous Demonstrations

    Authors: Sangwon Seo, Vaibhav Unhelkar

    Abstract: Successful collaboration requires team members to stay aligned, especially in complex sequential tasks. Team members must dynamically coordinate which subtasks to perform and in what order. However, real-world constraints like partial observability and limited communication bandwidth often lead to suboptimal collaboration. Even among expert teams, the same task can be executed in multiple ways. To…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Extended version of an identically-titled paper accepted at AAMAS 2025

  11. arXiv:2502.16427  [pdf, other]

    cs.CV

    Fine-Grained Video Captioning through Scene Graph Consolidation

    Authors: Sanghyeok Chu, Seonguk Seo, Bohyung Han

    Abstract: Recent advances in visual language models (VLMs) have significantly improved image captioning, but extending these gains to video understanding remains challenging due to the scarcity of fine-grained video captioning datasets. To bridge this gap, we propose a novel zero-shot video captioning approach that combines frame-level scene graphs from a video to obtain intermediate representations for cap…

    Submitted 22 February, 2025; originally announced February 2025.

  12. arXiv:2502.08789  [pdf, ps, other]

    cs.IT eess.SY

    Delay Analysis of 5G HARQ in the Presence of Decoding and Feedback Latencies

    Authors: Vishnu N Moothedath, Sangwon Seo, Neda Petreska, Bernhard Kloiber, James Gross

    Abstract: The growing demand for stringent quality of service (QoS) guarantees in 5G networks requires accurate characterisation of delay performance, often measured using Delay Violation Probability (DVP) for a given target delay. Widely used retransmission schemes like Automatic Repeat reQuest (ARQ) and Hybrid ARQ (HARQ) improve QoS through effective feedback, incremental redundancy (IR), and parallel ret…

    Submitted 12 February, 2025; originally announced February 2025.

  13. arXiv:2502.08671  [pdf, other]

    eess.IV cs.CV

    Color Universal Design Neural Network for the Color Vision Deficiencies

    Authors: Sunyong Seo, Jinho Park

    Abstract: Information regarding images should be visually understood by anyone, including those with color deficiency. However, such information is not recognizable if the color that seems to be distorted to the color deficiencies meets an adjacent object. The aim of this paper is to propose a color universal design network, called CUD-Net, that generates images that are visually understandable by individua…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 12 pages, 10 figures

  14. arXiv:2502.03966  [pdf, other]

    cs.CV cs.AI cs.LG

    MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation

    Authors: YoonJe Kang, Yonghoon Jung, Wonseop Shin, Bumsoo Kim, Sanghyun Seo

    Abstract: In this paper, we present synthetic data generation framework for flood hazard detection system. For high fidelity and quality, we characterize several real-world properties into virtual world and simulate the flood situation by controlling them. For the sake of efficiency, recent generative models in image-to-3D and urban city synthesis are leveraged to easily composite flood environments so that…

    Submitted 13 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: 6 pages, 6 figures. Accepted as Oral Presentation to AAAI 2025 Workshop on Good-Data

  15. arXiv:2501.18921  [pdf, other]

    eess.IV cs.CV

    Full-scale Representation Guided Network for Retinal Vessel Segmentation

    Authors: Sunyong Seo, Huisu Yoon, Semin Kim, Jongha Lee

    Abstract: The U-Net architecture and its variants have remained state-of-the-art (SOTA) for retinal vessel segmentation over the past decade. In this study, we introduce a Full Scale Guided Network (FSG-Net), where the feature representation network with modernized convolution blocks extracts full-scale information and the guided convolution block refines that information. Attention-guided filter is introdu…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: 10 pages, 7 figures

  16. arXiv:2501.18086  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems

    Authors: Se-Wook Yoo, Seung-Woo Seo

    Abstract: Safe reinforcement learning has traditionally relied on predefined constraint functions to ensure safety in complex real-world tasks, such as autonomous driving. However, defining these functions accurately for varied tasks is a persistent challenge. Recent research highlights the potential of leveraging pre-acquired task-agnostic knowledge to enhance both safety and sample efficiency in related t…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: 16 pages, 14 figures, 6 tables, submission to T-RO in 2024

  17. arXiv:2501.05981  [pdf, other]

    cs.CL

    Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea

    Authors: Eunjung Cho, Won Ik Cho, Soomin Seo

    Abstract: Hallucination in large language models (LLMs) remains a significant challenge for their safe deployment, particularly due to its potential to spread misinformation. Most existing solutions address this challenge by focusing on aligning the models with credible sources or by improving how models communicate their confidence (or lack thereof) in their outputs. While these measures may be effective i…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: Accepted at COLING 2025

  18. arXiv:2412.15311  [pdf, other]

    cs.LG

    Re-evaluating Group Robustness via Adaptive Class-Specific Scaling

    Authors: Seonguk Seo, Bohyung Han

    Abstract: Group distributionally robust optimization, which aims to improve robust accuracies -- worst-group and unbiased accuracies -- is a prominent algorithm used to mitigate spurious correlations and address dataset bias. Although existing approaches have reported improvements in robust accuracies, these gains often come at the cost of average accuracy due to inherent trade-offs. To control this trade-o…

    Submitted 19 December, 2024; originally announced December 2024.

  19. arXiv:2412.13708  [pdf, other]

    cs.CV

    JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts

    Authors: Taein Son, Soo Won Seo, Jisong Kim, Seok Hwan Lee, Jun Won Choi

    Abstract: Video Action Detection (VAD) entails localizing and categorizing action instances within videos, which inherently consist of diverse information sources such as audio, visual cues, and surrounding scene contexts. Leveraging this multi-modal information effectively for VAD poses a significant challenge, as the model must identify action-relevant cues with precision. In this study, we introduce a no…

    Submitted 3 February, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted to AAAI Conference on Artificial Intelligence 2025, 10 pages, 6 figures

  20. arXiv:2412.04727  [pdf, other]

    eess.IV cs.CV

    Learning to Translate Noise for Robust Image Denoising

    Authors: Inju Ha, Donghun Ryou, Seonguk Seo, Bohyung Han

    Abstract: Deep learning-based image denoising techniques often struggle with poor generalization performance to out-of-distribution real-world noise. To tackle this challenge, we propose a novel noise translation framework that performs denoising on an image with translated noise rather than directly denoising an original noisy image. Specifically, our approach translates complex, unknown real-world noise i…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: The project page is available at https://hij1112.github.io/learning-to-translate-noise/

  21. arXiv:2411.06385  [pdf, other]

    cs.AI

    Class Granularity: How richly does your knowledge graph represent the real world?

    Authors: Sumin Seo, Heeseon Cheon, Hyunho Kim

    Abstract: To effectively manage and utilize knowledge graphs, it is crucial to have metrics that can assess the quality of knowledge graphs from various perspectives. While there have been studies on knowledge graph quality metrics, there has been a lack of research on metrics that measure how richly ontologies, which form the backbone of knowledge graphs, are defined or the impact of richly defined ontolog…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: 10 pages

  22. arXiv:2410.21836  [pdf, other]

    cs.CL

    Multi-aspect Depression Severity Assessment via Inductive Dialogue System

    Authors: Chaebin Lee, Seungyeon Seo, Heejin Do, Gary Geunbae Lee

    Abstract: With the advancement of chatbots and the growing demand for automatic depression detection, identifying depression in patient conversations has gained more attention. However, prior methods often assess depression in a binary way or only a single score without diverse feedback and lack focus on enhancing dialogue responses. In this paper, we present a novel task of multi-aspect depression severity…

    Submitted 29 October, 2024; originally announced October 2024.

  23. arXiv:2410.04542  [pdf, other]

    q-bio.BM cs.LG

    Generative Flows on Synthetic Pathway for Drug Design

    Authors: Seonghwan Seo, Minsu Kim, Tony Shen, Martin Ester, Jinkyoo Park, Sungsoo Ahn, Woo Youn Kim

    Abstract: Generative models in drug discovery have recently gained attention as efficient alternatives to brute-force virtual screening. However, most existing models do not account for synthesizability, limiting their practical use in real-world scenarios. In this paper, we propose RxnFlow, which sequentially assembles molecules using predefined molecular building blocks and chemical reaction templates to…

    Submitted 6 March, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025, 32 pages, 17 figures, code: https://github.com/SeonghwanSeo/RxnFlow

  24. arXiv:2409.19382  [pdf, other]

    cs.CL

    Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models

    Authors: Seongmin Lee, Jaewook Shin, Youngjin Ahn, Seokin Seo, Ohjoon Kwon, Kee-Eung Kim

    Abstract: Recent advances in large language models (LLMs) have significantly impacted the domain of multi-hop question answering (MHQA), where systems are required to aggregate information and infer answers from disparate pieces of text. However, the autoregressive nature of LLMs inherently poses a challenge as errors may accumulate if mistakes are made in the intermediate reasoning steps. This paper introd…

    Submitted 1 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

    Comments: Work in Progress

  25. arXiv:2409.10027  [pdf, other]

    cs.RO cs.AI

    E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

    Authors: Chan Kim, Keonwoo Kim, Mintaek Oh, Hanbi Baek, Jiyang Lee, Donghwi Jung, Soojin Woo, Younkyung Woo, John Tucker, Roya Firoozi, Seung-Woo Seo, Mac Schwager, Seong-Woo Kim

    Abstract: Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stocha…

    Submitted 2 February, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 19 pages, 28 figures. Project page: https://e2map.github.io. Accepted to ICRA 2025

  26. arXiv:2408.06044  [pdf, other]

    cs.CL

    DiagESC: Dialogue Synthesis for Integrating Depression Diagnosis into Emotional Support Conversation

    Authors: Seungyeon Seo, Gary Geunbae Lee

    Abstract: Dialogue systems for mental health care aim to provide appropriate support to individuals experiencing mental distress. While extensive research has been conducted to deliver adequate emotional support, existing studies cannot identify individuals who require professional medical intervention and cannot offer suitable guidance. We introduce the Diagnostic Emotional Support Conversation task for an…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by SIGDIAL 2024

  27. arXiv:2408.03612  [pdf, other]

    cs.CV cs.LG

    JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling

    Authors: Seok Hwan Lee, Taein Son, Soo Won Seo, Jisong Kim, Jun Won Choi

    Abstract: Video action detection (VAD) is a formidable vision task that involves the localization and classification of actions within the spatial and temporal dimensions of a video clip. Among the myriad VAD architectures, two-stage VAD methods utilize a pre-trained person detector to extract the region of interest features, subsequently employing these features for action detection. However, the performan…

    Submitted 17 September, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: 31 pages, 10 figures, update references

  28. arXiv:2407.17710  [pdf, other]

    cs.LG

    Revisiting Machine Unlearning with Dimensional Alignment

    Authors: Seonguk Seo, Dongwan Kim, Bohyung Han

    Abstract: Machine unlearning, an emerging research topic focusing on compliance with data privacy regulations, enables trained models to remove the information learned from specific data. While many existing methods indirectly address this issue by intentionally injecting incorrect supervisions, they can drastically and unpredictably alter the decision boundaries and feature spaces, leading to training inst…

    Submitted 20 December, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  29. arXiv:2407.11057  [pdf, other]

    cs.LG cs.AI q-bio.BM

    SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction

    Authors: Seungyeon Choi, Sangmin Seo, Sanghyun Park

    Abstract: Accurate prediction of protein-ligand binding affinity is crucial for rapid and efficient drug development. Recently, the importance of predicting binding affinity has led to increased attention on research that models the three-dimensional structure of protein-ligand complexes using graph neural networks to predict binding affinity. However, traditional methods often fail to accurately model the…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECAI 2024

  30. arXiv:2406.13214  [pdf, other]

    cs.LG

    Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck

    Authors: Sangwoo Seo, Sungwon Kim, Jihyeong Jung, Yoonho Lee, Chanyoung Park

    Abstract: Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal grap…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  31. arXiv:2406.09170  [pdf, other]

    cs.CL

    Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

    Authors: Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi

    Abstract: Large language models (LLMs) have showcased remarkable reasoning capabilities, yet they remain susceptible to errors, particularly in temporal reasoning tasks involving complex temporal logic. Existing research has explored LLM performance on temporal reasoning using diverse datasets and benchmarks. However, these studies often rely on real-world data that LLMs may have encountered during pre-trai…

    Submitted 13 June, 2024; originally announced June 2024.

  32. arXiv:2405.16751  [pdf, other]

    cs.AI cs.CL cs.CV cs.MA

    REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity

    Authors: SeungWon Seo, SeongRae Noh, Junhyeok Lee, SooBin Lim, Won Hee Lee, HyeongYeop Kang

    Abstract: We address the challenge of multi-agent cooperation, where agents achieve a common goal by cooperating with decentralized agents under complex partial observations. Existing cooperative agent systems often struggle with efficiently processing continuously accumulating information, managing globally suboptimal planning due to lack of consideration of collaborators, and addressing false planning cau…

    Submitted 18 December, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: v2 is the AAAI'25 camera-ready version, including the appendix, which has been enhanced based on the reviewers' comments

  33. arXiv:2405.13345  [pdf, other]

    cs.RO cs.LG

    Autonomous Algorithm for Training Autonomous Vehicles with Minimal Human Intervention

    Authors: Sang-Hyun Lee, Daehyeok Kwon, Seung-Woo Seo

    Abstract: Recent reinforcement learning (RL) algorithms have demonstrated impressive results in simulated driving environments. However, autonomous vehicles trained in simulation often struggle to work well in the real world due to the fidelity gap between simulated and real-world environments. While directly training real-world autonomous vehicles with RL algorithms is a promising approach to bypass the fi…

    Submitted 15 January, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, 2 tables, conference

  34. arXiv:2404.16989  [pdf, other]

    cs.LG cs.AI cs.RO

    IDIL: Imitation Learning of Intent-Driven Expert Behavior

    Authors: Sangwon Seo, Vaibhav Unhelkar

    Abstract: When faced with accomplishing a task, human experts exhibit intentional behavior. Their unique intents shape their plans and decisions, resulting in experts demonstrating diverse behaviors to accomplish the same task. Due to the uncertainties encountered in the real world and their bounded rationality, experts sometimes adjust their intents, which in turn influences their behaviors during task exe…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Extended version of an identically-titled paper accepted at AAMAS 2024

  35. Traversability-aware Adaptive Optimization for Path Planning and Control in Mountainous Terrain

    Authors: Se-Wook Yoo, E In Son, Seung-Woo Seo

    Abstract: Autonomous navigation in extreme mountainous terrains poses challenges due to the presence of mobility-stressing elements and undulating surfaces, making it particularly difficult compared to conventional off-road driving scenarios. In such environments, estimating traversability solely based on exteroceptive sensors often leads to the inability to reach the goal due to a high prevalence of non-tr…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 8 pages, 7 figures, accepted 2024 RA-L

    Journal ref: IEEE Robotics and Automation Letters 2024

  36. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  37. arXiv:2404.01914  [pdf, other]

    cs.CL cs.AI

    SCANNER: Knowledge-Enhanced Approach for Robust Multi-modal Named Entity Recognition of Unseen Entities

    Authors: Hyunjong Ok, Taeho Kil, Sukmin Seo, Jaeho Lee

    Abstract: Recent advances in named entity recognition (NER) have pushed the boundary of the task to incorporate visual signals, leading to many variants, including multi-modal NER (MNER) or grounded MNER (GMNER). A key challenge to these tasks is that the model should be able to generalize to the entities unseen during the training, and should be able to handle the training samples with noisy annotations. T…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, NAACL 2024

  38. arXiv:2404.01745  [pdf, other]

    cs.CV cs.AI

    Unleash the Potential of CLIP for Video Highlight Detection

    Authors: Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak

    Abstract: Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-train…

    Submitted 2 April, 2024; originally announced April 2024.

  39. arXiv:2403.15048  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information

    Authors: Bumsoo Kim, Wonseop Shin, Kyuchul Lee, Yonghoon Jung, Sanghyun Seo

    Abstract: Leveraging large-scale Text-to-Image (TTI) models have become a common technique for generating exemplar or training dataset in the fields of image synthesis, video editing, 3D reconstruction. However, semantic structural visual hallucinations involving perceptually severe defects remain a concern, especially in the domain of non-photorealistic rendering (NPR) such as cartoons and pixelization-sty…

    Submitted 22 January, 2025; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted at WACV 2025, Project page: https://gh-bumsookim.github.io/Cartoon-Hallucinations-Detection/

  40. arXiv:2403.10906  [pdf, other]

    cs.CV

    ARC-NeRF: Area Ray Casting for Broader Unseen View Coverage in Few-shot Object Rendering

    Authors: Seunghyeon Seo, Yeonjin Chang, Jayeon Yoo, Seungwoo Lee, Hojun Lee, Nojun Kwak

    Abstract: Recent advancements in the Neural Radiance Field (NeRF) have enhanced its capabilities for novel view synthesis, yet its reliance on dense multi-view training images poses a practical challenge, often leading to artifacts and a lack of fine object details. Addressing this, we propose ARC-NeRF, an effective regularization-based approach with a novel Area Ray Casting strategy. While the previous ray…

    Submitted 7 April, 2025; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: CVPR 2025 Workshop: 4th Computer Vision for Metaverse Workshop

  41. arXiv:2403.01594  [pdf, other]

    cs.HC

    Never Tell the Trick: Covert Interactive Mixed Reality System for Immersive Storytelling

    Authors: Chanwoo Lee, Kyubeom Shim, Sanggyo Seo, Gwonu Ryu, Yongsoon Choi

    Abstract: This study explores the integration of Ultra-Wideband (UWB) technology into Mixed Reality (MR) Systems for immersive storytelling. Addressing the limitations of existing technologies like Microsoft Kinect and HTC Vive, the research focuses on overcoming challenges in robustness to occlusion, tracking volume, and cost efficiency in props tracking. Utilizing UWB technology, the interactive MR system…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: To be presented in IEEE VR 2024

  42. arXiv:2403.01233  [pdf, other]

    cs.RO

    Results and Lessons Learned from Autonomous Driving Transportation Services in Airfield, Crowded Indoor, and Urban Environments

    Authors: Doosan Baek, Sanghyun Kim, Seung-Woo Seo, Sang-Hyun Lee

    Abstract: Autonomous vehicles have been actively investigated over the past few decades. Several recent works show the potential of autonomous vehicles in urban environments with impressive experimental results. However, these works note that autonomous vehicles are still occasionally inferior to expert drivers in complex scenarios. Furthermore, they do not focus on the possibilities of autonomous driving t…

    Submitted 20 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, 4 tables

  43. StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention

    Authors: Seungwon Seo, Suho Lee, Sangheum Hwang

    Abstract: Utilizing large-scale pretrained models is a well-known strategy to enhance performance on various target tasks. It is typically achieved through fine-tuning pretrained models on target tasks. However, naïve fine-tuning may not fully leverage knowledge embedded in pretrained models. In this study, we introduce a novel fine-tuning method, called stochastic cross-attention (StochCA), specific to Tra…

    Submitted 28 September, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: The updated version was published in Neural Networks (https://www.sciencedirect.com/science/article/abs/pii/S0893608024005872). The first two authors contributed equally

  44. arXiv:2402.15363  [pdf, other]

    cs.RO

    Follow the Footprints: Self-supervised Traversability Estimation for Off-road Vehicle Navigation based on Geometric and Visual Cues

    Authors: Yurim Jeon, E In Son, Seung-Woo Seo

    Abstract: In this study, we address the off-road traversability estimation problem, that predicts areas where a robot can navigate in off-road environments. An off-road environment is an unstructured environment comprising a combination of traversable and non-traversable spaces, which presents a challenge for estimating traversability. This study highlights three primary factors that affect a robot's traver…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2024

  45. arXiv:2402.05706  [pdf, other]

    cs.CL cs.SD eess.AS

    Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

    Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo

    Abstract: Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken respons…

    Submitted 27 November, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024, Project Page: https://unifiedsdm.github.io/

  46. arXiv:2402.05448  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application

    Authors: Bumsoo Kim, Sanghyun Byun, Yonghoon Jung, Wonseop Shin, Sareer UI Amin, Sanghyun Seo

    Abstract: In this paper, we first present the character texture generation system Minecraft-ify, specified to Minecraft video game toward in-game application. Ours can generate face-focused image for texture mapping tailored to 3D virtual character having cube manifold. While existing projects or works only generate texture, proposed system can inverse the user-provided real image, or generate aver…

    Submitted 3 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 2 pages, 2 figures. Accepted as Spotlight to NeurIPS 2023 Workshop on Machine Learning for Creativity and Design

  47. arXiv:2402.02733  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer

    Authors: Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo

    Abstract: Face re-aging is a prominent field in computer vision and graphics, with significant applications in photorealistic domains such as movies, advertising, and live streaming. Recently, the need to apply face re-aging to non-photorealistic images, like comics, illustrations, and animations, has emerged as an extension in various entertainment sectors. However, the lack of a network that can seamlessl…

    Submitted 28 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at CVPR 2024 AI4CC Workshop, Project Page: https://gh-bumsookim.github.io/ToonAging/

  48. arXiv:2401.12624  [pdf, other]

    cs.AI cs.IT cs.LG cs.NI

    Knowledge Distillation from Language-Oriented to Emergent Communication for Multi-Agent Remote Control

    Authors: Yongjun Kim, Sejin Seo, Jihong Park, Mehdi Bennis, Seong-Lyun Kim, Junil Choi

    Abstract: In this work, we compare emergent communication (EC) built upon multi-agent deep reinforcement learning (MADRL) and language-oriented semantic communication (LSC) empowered by a pre-trained large language model (LLM) using human language. In a multi-agent remote navigation task, with multimodal input data comprising location and channel maps, it is shown that EC incurs high training cost and strug…

    Submitted 3 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  49. arXiv:2401.04928  [pdf, other]

    cs.LG

    Relaxed Contrastive Learning for Federated Learning

    Authors: Seonguk Seo, Jinkyu Kim, Geeho Kim, Bohyung Han

    Abstract: We propose a novel contrastive learning framework to effectively address the challenges of data heterogeneity in federated learning. We first analyze the inconsistency of gradient updates across clients during local training and establish its dependence on the distribution of feature representations, leading to the derivation of the supervised contrastive learning (SCL) objective to mitigate local…

    Submitted 31 May, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  50. arXiv:2401.03240  [pdf, other]

    cs.LG math.OC

    Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization

    Authors: Min-Kook Suh, Seung-Woo Seo

    Abstract: We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent. However, although steepest descent methods offer an intuitive approach to finding minima, many deep learning applications require adaptive gradient methods to a…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: Preprint