
Showing 1–50 of 112 results for author: Jung, D

Searching in archive cs.
  1. arXiv:2506.10343 [pdf, other]

    cs.CL cs.AI

    Code Execution as Grounded Supervision for LLM Reasoning

    Authors: Dongwon Jung, Wenxuan Zhou, Muhao Chen

    Abstract: Training large language models (LLMs) with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. However, obtaining reliable and accurate reasoning supervision remains a significant challenge. We propose a scalable method for generating a high-quality CoT supervision dataset by leveraging the determinism of program execution. Unlike existing reasoning dat…

    Submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2506.05011 [pdf, other]

    cs.CV

    UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting

    Authors: Jaehoon Choi, Dongki Jung, Christopher Maxey, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon

    Abstract: Despite significant advancements in dynamic neural rendering, existing methods fail to address the unique challenges posed by UAV-captured scenarios, particularly those involving monocular camera setups, top-down perspective, and multiple small, moving humans, which are not adequately represented in existing datasets. In this work, we introduce UAV4D, a framework for enabling photorealistic render…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2505.19503 [pdf, other]

    cs.CV

    Locality-Aware Zero-Shot Human-Object Interaction Detection

    Authors: Sanghyun Kim, Deunsol Jung, Minsu Cho

    Abstract: Recent methods for zero-shot Human-Object Interaction (HOI) detection typically leverage the generalization ability of a large Vision-Language Model (VLM), i.e., CLIP, on unseen categories, showing impressive results on various zero-shot settings. However, existing methods struggle to adapt CLIP representations for human-object pairs, as CLIP tends to overlook fine-grained information necessary for…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR2025; Code is available at: https://github.com/OreoChocolate/LAIN

  4. arXiv:2505.17503 [pdf, ps, other]

    cs.CL

    CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents

    Authors: Minsoo Khang, Sangjun Park, Teakgyu Hong, Dawoon Jung

    Abstract: Large Language Models (LLMs) have made substantial progress in recent years, yet evaluating their capabilities in practical Retrieval-Augmented Generation (RAG) scenarios remains challenging. In practical applications, LLMs must demonstrate complex reasoning, refuse to answer appropriately, provide precise citations, and effectively understand document layout. These capabilities are crucial for ad…

    Submitted 23 May, 2025; originally announced May 2025.

  5. arXiv:2505.11152 [pdf, ps, other]

    cs.CV

    Learning Dense Hand Contact Estimation from Imbalanced Data

    Authors: Daniel Sungho Jung, Kyoung Mu Lee

    Abstract: Hands are essential to human interaction, and understanding contact between hands and the world can promote comprehensive understanding of their function. Recently, there have been a growing number of hand interaction datasets that cover interaction with object, other hand, scene, and body. Despite the significance of the task and increasing high-quality data, how to effectively learn dense hand con…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Project page: http://haco-release.github.io

  6. arXiv:2505.03359 [pdf, other]

    cs.AI

    Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection

    Authors: June-Woo Kim, Haram Yoon, Wonkyo Oh, Dawoon Jung, Sung-Hoon Yoon, Dae-Jin Kim, Dong-Ho Lee, Sang-Yeol Lee, Chan-Mo Yang

    Abstract: Speech-based AI models are emerging as powerful tools for detecting depression and the presence of post-traumatic stress disorder (PTSD), offering a non-invasive and cost-effective way to assess mental health. However, these models often struggle with gender bias, which can lead to unfair and inaccurate predictions. In this study, we address this issue by introducing a domain adversarial…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted to EMBC 2025

  7. arXiv:2504.02158 [pdf, other]

    cs.CV

    UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting

    Authors: Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon

    Abstract: We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs). Specifically, our approach focuses on synthesizing foreground components, such as various human instances in motion within complex scene backgrounds, from UAV perspectives. This is achieved by integrating…

    Submitted 2 April, 2025; originally announced April 2025.

  8. arXiv:2504.00843 [pdf, other]

    cs.AI cs.HC

    Investigating Large Language Models in Diagnosing Students' Cognitive Skills in Math Problem-solving

    Authors: Hyoungwook Jin, Yoonsu Kim, Dongyun Jung, Seungju Kim, Kiyoon Choi, Jinho Son, Juho Kim

    Abstract: Mathematics learning entails mastery of both content knowledge and cognitive processing of knowing, applying, and reasoning with it. Automated math assessment has primarily focused on grading students' exhibition of content knowledge by finding textual evidence, such as specific numbers, formulas, and statements. Recent advancements in problem-solving, image recognition, and reasoning capabilities…

    Submitted 1 April, 2025; originally announced April 2025.

  9. arXiv:2504.00717 [pdf, ps, other]

    cs.NE cs.AI

    Advancements in Multimodal Differential Evolution: A Comprehensive Review and Future Perspectives

    Authors: Dikshit Chauhan, Shivani, Donghwi Jung, Anupam Yadav

    Abstract: Multi-modal optimization involves identifying multiple global and local optima of a function, offering valuable insights into diverse optimal solutions within the search space. Evolutionary algorithms (EAs) excel at finding multiple solutions in a single run, providing a distinct advantage over classical optimization techniques that often require multiple restarts without guarantee of obtaining di…

    Submitted 1 April, 2025; originally announced April 2025.

  10. arXiv:2503.19540 [pdf, other]

    cs.CL cs.AI

    FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

    Authors: Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced interactions between users and models. These advancements concurrently underscore the need for rigorous safety evaluations due to the manifestation of social biases, which can lead to harmful societal impacts. Despite these concerns, existing benchmarks may overlook the intrinsic weaknesses of LLMs, which can generate…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to NAACL 2025 findings

  11. arXiv:2502.20685 [pdf, other]

    cs.CV

    EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

    Authors: Dongki Jung, Jaehoon Choi, Yonghan Lee, Somi Jeong, Taejae Lee, Dinesh Manocha, Suyong Yeon

    Abstract: We introduce the first learning-based dense matching algorithm, termed Equirectangular Projection-Oriented Dense Kernelized Feature Matching (EDM), specifically designed for omnidirectional images. Equirectangular projection (ERP) images, with their large fields of view, are particularly suited for dense matching techniques that aim to establish comprehensive correspondences across images. However…

    Submitted 27 February, 2025; originally announced February 2025.

  12. arXiv:2502.18934 [pdf, other]

    cs.CL cs.LG

    Kanana: Compute-efficient Bilingual Language Models

    Authors: Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, et al. (4 additional authors not shown)

    Abstract: We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high quality dat…

    Submitted 28 February, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 40 pages, 15 figures

  13. arXiv:2502.15826 [pdf, other]

    cs.CL cs.AI

    CoME: An Unlearning-based Approach to Conflict-free Model Editing

    Authors: Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

    Abstract: Large language models (LLMs) often retain outdated or incorrect information from pre-training, which undermines their reliability. While model editing methods have been developed to address such errors without full re-training, they frequently suffer from knowledge conflicts, where outdated information interferes with new knowledge. In this work, we propose Conflict-free Model Editing (CoME), a no…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted to NAACL 2025 main conference

  14. arXiv:2502.12545 [pdf, other]

    cs.CV

    IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360° Cameras

    Authors: Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha

    Abstract: We present a novel 3D reconstruction pipeline for 360° cameras for 3D mapping and rendering of indoor environments. Traditional Structure-from-Motion (SfM) methods may not work well in large-scale indoor scenes due to the prevalence of textureless and repetitive regions. To overcome these challenges, our approach (IM360) leverages the wide field of view of omnidirectional images and integra…

    Submitted 19 February, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  15. arXiv:2502.11330 [pdf, other]

    cs.CL cs.AI

    System Message Generation for User Preferences using Open-Source Models

    Authors: Minbyul Jeong, Jungho Cho, Minsoo Khang, Dawoon Jung, Teakgyu Hong

    Abstract: System messages play a crucial role in interactions with large language models (LLMs), often serving as prompts to initiate conversations. Through system messages, users can assign specific roles, perform intended tasks, incorporate background information, and specify various output formats and communication styles. Despite such versatility, publicly available datasets often lack system messages a…

    Submitted 22 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  16. arXiv:2501.14328 [pdf, other]

    cs.CR cs.AR

    Securing DRAM at Scale: ARFM-Driven Row Hammer Defense with Unveiling the Threat of Short tRC Patterns

    Authors: Nogeun Joo, Donghyuk Kim, Hyunjun Cho, Junseok Noh, Dongha Jung, Joo-Young Kim

    Abstract: To address the issue of powerful row hammer (RH) attacks, our study involved an extensive analysis of the prevalent attack patterns in the field. We discovered a strong correlation between the timing and density of the active-to-active command period, tRC, and the likelihood of RH attacks. In this paper, we introduce MARC, an innovative ARFM-driven RH mitigation IP that significantly reinforce…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: 12 pages, 19 figures

  17. arXiv:2501.13277 [pdf]

    cs.CV

    MEDFORM: A Foundation Model for Contrastive Learning of CT Imaging and Clinical Numeric Data in Multi-Cancer Analysis

    Authors: Daeun Jung, Jaehyeok Jang, Sooyoung Jang, Yu Rang Park

    Abstract: Computed tomography (CT) and clinical numeric data are essential modalities for cancer evaluation, but building large-scale multimodal training datasets for developing medical foundation models remains challenging due to the structural complexity of multi-slice CT data and high cost of expert annotation. In this study, we propose MEDFORM, a multimodal pre-training strategy that guides CT image rep…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: 8 pages, 1 figure

  18. arXiv:2501.10913 [pdf, other]

    cs.CV cs.CL

    Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP

    Authors: Junsung Park, Jungbeom Lee, Jongyoon Song, Sangwon Yu, Dahuin Jung, Sungroh Yoon

    Abstract: While CLIP has significantly advanced multimodal understanding by bridging vision and language, the inability to grasp negation - such as failing to differentiate concepts like "parking" from "no parking" - poses substantial challenges. By analyzing the data used in the public CLIP model's pre-training, we posit this limitation stems from a lack of negation-inclusive data. To address this, we intr…

    Submitted 31 March, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

  19. GOTPR: General Outdoor Text-based Place Recognition Using Scene Graph Retrieval with OpenStreetMap

    Authors: Donghwi Jung, Keonwoo Kim, Seong-Woo Kim

    Abstract: We propose GOTPR, a robust place recognition method designed for outdoor environments where GPS signals are unavailable. Unlike existing approaches that use point cloud maps, which are large and difficult to store, GOTPR leverages scene graphs generated from text descriptions and maps for place recognition. This method improves scalability by replacing point clouds with compact data structures, al…

    Submitted 22 May, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Journal ref: IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 6488-6495, June 2025

  20. arXiv:2501.03700 [pdf, other]

    cs.CV cs.AI

    AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features

    Authors: Ruochen Zhang, Hyeung-Sik Choi, Dongwook Jung, Phan Huy Nam Anh, Sang-Ki Jeong, Zihao Zhu

    Abstract: Monocular 3D object detection is a challenging task in autonomous systems due to the lack of explicit depth information in single-view images. Existing methods often depend on external depth estimators or expensive sensors, which increase computational complexity and hinder real-time performance. To overcome these limitations, we propose AuxDepthNet, an efficient framework for real-time monocular…

    Submitted 7 January, 2025; originally announced January 2025.

  21. arXiv:2410.14902 [pdf, other]

    cs.IT

    Modeling and Analysis of Hybrid GEO-LEO Satellite Networks

    Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

    Abstract: As the number of low Earth orbit (LEO) satellites rapidly increases, the consideration of frequency sharing or cooperation between geosynchronous Earth orbit (GEO) and LEO satellites is gaining attention. In this paper, we consider a hybrid GEO-LEO satellite network where GEO and LEO satellites are distributed according to independent Poisson point processes (PPPs) and share the same frequency res…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 5 pages, 4 figures, 1 table, submitted to IEEE Transactions on Vehicular Technology

  22. arXiv:2410.04646 [pdf, other]

    cs.CV cs.RO

    Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

    Authors: Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon

    Abstract: We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on t…

    Submitted 6 October, 2024; originally announced October 2024.

  23. arXiv:2410.03973 [pdf, other]

    cs.LG stat.ML

    Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions

    Authors: Jianxin Zhang, Josh Viktorov, Doosan Jung, Emily Pitler

    Abstract: Neural Stochastic Differential Equations (Neural SDEs) have emerged as powerful mesh-free generative models for continuous stochastic processes, with critical applications in fields such as finance, physics, and biology. Previous state-of-the-art methods have relied on adversarial training, such as GANs, or on minimizing distance measures between processes using signature kernels. However, GANs su…

    Submitted 26 March, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  24. arXiv:2409.19840 [pdf, other]

    cs.CV

    Textual Training for the Hassle-Free Removal of Unwanted Visual Data: Case Studies on OOD and Hateful Image Detection

    Authors: Saehyung Lee, Jisoo Mok, Sangha Park, Yongho Shin, Dahuin Jung, Sungroh Yoon

    Abstract: In our study, we explore methods for detecting unwanted content lurking in visual datasets. We provide a theoretical analysis demonstrating that a model capable of successfully partitioning visual data can be obtained using only textual data. Based on the analysis, we propose Hassle-Free Textual Training (HFTT), a streamlined method capable of acquiring detectors for unwanted visual content, using…

    Submitted 23 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024

  25. arXiv:2409.15326 [pdf]

    cs.HC cs.AI

    Evaluating the Impact of a Specialized LLM on Physician Experience in Clinical Decision Support: A Comparison of Ask Avo and ChatGPT-4

    Authors: Daniel Jung, Alex Butler, Joongheum Park, Yair Saperstein

    Abstract: The use of large language models (LLMs) to augment clinical decision support systems is a topic of rapidly growing interest, but current shortcomings such as hallucinations and lack of clear source citations make them unreliable for use in the clinical environment. This study evaluates Ask Avo, an LLM-derived software by AvoMD that incorporates a proprietary Language Model Augmented Retrieval (L…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 8 pages, 1 figure

  26. Point Cloud Structural Similarity-based Underwater Sonar Loop Detection

    Authors: Donghwi Jung, Andres Pulido, Jane Shin, Seong-Woo Kim

    Abstract: In this letter, we propose a point cloud structural similarity-based loop detection method for underwater Simultaneous Localization and Mapping using sonar sensors. Existing sonar-based loop detection approaches often rely on 2D projection and keypoint extraction, which can lead to data loss and poor performance in feature-scarce environments. Additionally, methods based on neural networks or Bag-…

    Submitted 18 March, 2025; v1 submitted 21 September, 2024; originally announced September 2024.

    Journal ref: IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3859-3866, April 2025

  27. arXiv:2409.12468 [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation

    Authors: Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen

    Abstract: Retrieval-augmented generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieved from external sources. However, it often struggles to cope with inconsistent and irrelevant information that can distract the LM from its tasks, especially when multiple evidence pieces are required. While compressing the retrieved evidence with a compressi…

    Submitted 16 December, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

  28. arXiv:2409.10027 [pdf, other]

    cs.RO cs.AI

    E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

    Authors: Chan Kim, Keonwoo Kim, Mintaek Oh, Hanbi Baek, Jiyang Lee, Donghwi Jung, Soojin Woo, Younkyung Woo, John Tucker, Roya Firoozi, Seung-Woo Seo, Mac Schwager, Seong-Woo Kim

    Abstract: Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stocha…

    Submitted 2 February, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 19 pages, 28 figures. Project page: https://e2map.github.io. Accepted to ICRA 2025

  29. arXiv:2408.08090 [pdf, other]

    cs.IT

    UV-Plane Beam Mapping for Non-Terrestrial Networks in 3GPP System-Level Simulations

    Authors: Dong-Hyun Jung, Sucheol Kim, Miyeon Lee, Joon-Gyu Ryu, Junil Choi

    Abstract: Due to the high altitudes and large beam sizes of satellites, the curvature of the Earth's surface can impact system-level performance. To consider this, 3GPP introduces the UV-plane beam mapping for system-level simulations of non-terrestrial networks (NTNs). This paper aims to provide a comprehensive understanding of how beams and user equipments (UEs) are placed on the UV-plane and subsequently…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 5 pages, 9 figures, 1 table

  30. arXiv:2408.02872 [pdf, other]

    cs.IT cs.NI

    Rate-Splitting for Joint Unicast and Multicast Transmission in LEO Satellite Networks with Non-Uniform Traffic Demand

    Authors: Jaehyup Seong, Juha Park, Dong-Hyun Jung, Jeonghun Park, Wonjae Shin

    Abstract: Low Earth orbit (LEO) satellite communications (SATCOM) with ubiquitous global connectivity is deemed a pivotal catalyst in advancing wireless communication systems for 5G and beyond. LEO SATCOM excels in delivering versatile information services across expansive areas, facilitating both unicast and multicast transmissions via high-speed broadband capability. Nonetheless, given the broadband cover…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 39 pages, 9 figures

  31. arXiv:2407.19849 [pdf, other]

    cs.CV

    Normality Addition via Normality Detection in Industrial Image Anomaly Detection Models

    Authors: Jihun Yi, Dahuin Jung, Sungroh Yoon

    Abstract: The task of image anomaly detection (IAD) aims to identify deviations from normality in image data. These anomalies are patterns that deviate significantly from what the IAD model has learned from the data during training. However, in real-world scenarios, the criteria for what constitutes normality often change, necessitating the reclassification of previously anomalous instances as normal. To ad…

    Submitted 29 July, 2024; originally announced July 2024.

  32. arXiv:2407.03103 [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 6 October, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Published at EMNLP 2024 Findings

  33. arXiv:2407.01073 [pdf, other]

    cs.RO

    No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

    Authors: Soojin Woo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static obje…

    Submitted 1 July, 2024; originally announced July 2024.

  34. arXiv:2406.19848 [pdf, other]

    cs.RO

    3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints

    Authors: Yoonkyu Yoo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic stru…

    Submitted 28 June, 2024; originally announced June 2024.

  35. arXiv:2406.17869 [pdf, other]

    cs.CV

    Burst Image Super-Resolution with Base Frame Selection

    Authors: Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho

    Abstract: Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: CVPR2024W NTIRE accepted

  36. arXiv:2406.17256 [pdf, other]

    cs.CV

    Disentangled Motion Modeling for Video Frame Interpolation

    Authors: Jaihyun Lew, Jooyoung Choi, Chaehun Shin, Dahuin Jung, Sungroh Yoon

    Abstract: Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works have employed generative models for improved perceptual quality. However, they require complex training and large computational costs for pixel space modeling. In this paper, we introd…

    Submitted 18 December, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: AAAI 2025

  37. arXiv:2404.04819 [pdf, other]

    cs.CV

    Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

    Authors: Hyeongjin Nam, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

    Abstract: Human-object contact serves as a strong cue to understand how humans physically interact with objects. Nevertheless, the use of human-object contact information for the joint reconstruction of 3D human and object from a single image has not been widely explored. In this work, we present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Published at CVPR 2024, 19 pages including the supplementary material

  38. arXiv:2404.00450 [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Planning and Editing What You Retrieve for Enhanced Tool Learning

    Authors: Tenghao Huang, Dongwon Jung, Muhao Chen

    Abstract: Recent advancements in integrating external tools with Large Language Models (LLMs) have opened new frontiers, with applications in mathematical reasoning, code generators, and smart assistants. However, existing methods, relying on simple one-time retrieval strategies, fall short in effectively and accurately shortlisting relevant tools. This paper introduces a novel PLUTO (Planning, Learning, an…

    Submitted 4 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: This paper is accepted at NAACL-Findings 2024

  39. arXiv:2403.10911 [pdf, other]

    cs.CV

    Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation

    Authors: Yeongtak Oh, Jonghyun Lee, Jooyoung Choi, Dahuin Jung, Uiwon Hwang, Sungroh Yoon

    Abstract: Test-time adaptation (TTA) addresses the unforeseen distribution shifts occurring during test time. In TTA, performance, memory consumption, and time consumption are crucial considerations. A recent diffusion-based TTA approach for restoring corrupted images involves image-level updates. However, using pixel space diffusion significantly increases resource requirements compared to conventional mod…

    Submitted 11 July, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: ECCV 2024 Camera Ready

  40. arXiv:2403.09055 [pdf, ps, other]

    cs.CV

    SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models

    Authors: Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee

    Abstract: We introduce SemanticDraw, a new paradigm of interactive content creation where high-quality images are generated in near real-time from given multiple hand-drawn regions, each encoding prescribed semantic meaning. In order to maximize the productivity of content creators and to fully realize their artistic imagination, it requires both quick interactive interfaces and fine-grained regional contro…

    Submitted 1 June, 2025; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: CVPR 2025 camera ready

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  41. arXiv:2403.07366 [pdf, other]

    cs.CV cs.LG

    Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors

    Authors: Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, Sungroh Yoon

    Abstract: Test-time adaptation (TTA) fine-tunes pre-trained deep neural networks for unseen test data. The primary challenge of TTA is limited access to the entire test dataset during online updates, causing error accumulation. To mitigate it, TTA methods have utilized the model output's entropy as a confidence metric that aims to determine which samples have a lower likelihood of causing error. Through exp…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 Spotlight; 26 pages, 9 figures, 20 tables

  42. arXiv:2402.04535 [pdf, other]

    cs.RO

    MuNES: Multifloor Navigation Including Elevators and Stairs

    Authors: Donghwi Jung, Chan Kim, Jae-Kyung Cho, Seong-Woo Kim

    Abstract: We propose a scheme called MuNES for single mapping and trajectory planning including elevators and stairs. Optimized multifloor trajectories are important for optimal interfloor movements of robots. However, given two or more options of moving between floors, it is difficult to select the best trajectory because there are no suitable indoor multifloor maps in the existing methods. To solve this p…

    Submitted 6 February, 2024; originally announced February 2024.

  43. arXiv:2401.14616  [pdf, other

    cs.CL cs.AI

    Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

    Authors: Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim

    Abstract: We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative. An alternative speech provides practical alternatives to hate speech in real-world scenarios by offering speech-level corrections to speakers while considering the surrounding context and promoting speakers to reform. Further, an alternative speech can c…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted for The First Workshop on Data-Centric AI (DCAI) at ICDM 2023

  44. arXiv:2401.04143  [pdf, other

    cs.CV

    RHOBIN Challenge: Reconstruction of Human Object Interaction

    Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

    Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is, however, a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose and object pose but also the interaction between them. Reconstruction of 3D humans and objects have been two separate resear…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 tables, 7 figures. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

  45. arXiv:2312.15924  [pdf, other

    cs.IT eess.SP

    Modeling and Analysis of GEO Satellite Networks

    Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

    Abstract: The extensive coverage offered by satellites makes them effective in enhancing service continuity for users on dynamic airborne and maritime platforms, such as airplanes and ships. In particular, geosynchronous Earth orbit (GEO) satellites ensure stable connectivity for terrestrial users due to their stationary characteristics when observed from Earth. This paper introduces a novel approach to mod…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 12 pages, 9 figures, submitted to IEEE Transactions on Wireless Communications

  46. arXiv:2312.04266  [pdf, other

    cs.CV

    Activity Grammars for Temporal Action Segmentation

    Authors: Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

    Abstract: Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective act…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted to NeurIPS 2023

  47. arXiv:2311.15890  [pdf, other

    cs.LG cs.CV

    Stability-Informed Initialization of Neural Ordinary Differential Equations

    Authors: Theodor Westny, Arman Mohammadi, Daniel Jung, Erik Frisk

    Abstract: This paper addresses the training of Neural Ordinary Differential Equations (neural ODEs), and in particular explores the interplay between numerical integration techniques, stability regions, step size, and initialization techniques. It is shown how the choice of integration technique implicitly regularizes the learned model, and how the solver's corresponding stability region affects training an…

    Submitted 6 August, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: In Proceedings of the 41st International Conference on Machine Learning

  48. arXiv:2310.16492  [pdf, other

    cs.CV cs.AI cs.LG

    On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection

    Authors: Sangha Park, Jisoo Mok, Dahuin Jung, Saehyung Lee, Sungroh Yoon

    Abstract: Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, making it difficult to determine the OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  49. arXiv:2310.10088  [pdf, other

    eess.IV cs.CV cs.LG

    PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising

    Authors: Hyemi Jang, Junsung Park, Dahuin Jung, Jaihyun Lew, Ho Bae, Sungroh Yoon

    Abstract: Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised deno…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  50. arXiv:2309.01943  [pdf, other

    cs.CV

    Extract-and-Adaptation Network for 3D Interacting Hand Mesh Recovery

    Authors: JoonKyu Park, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

    Abstract: Understanding how two hands interact with each other is a key component of accurate 3D interacting hand mesh recovery. However, recent Transformer-based methods struggle to learn the interaction between two hands as they directly utilize two hand features as input tokens, which results in the distant token problem. The distant token problem means that input tokens are in heterogeneous spaces, lea…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted at ICCVW 2023