
Showing 1–20 of 20 results for author: Kim, W J

  1. arXiv:2506.04353  [pdf, ps, other]

    cs.CV cs.AI cs.CE cs.CL cs.LG

    ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding

    Authors: Ankit Pal, Jung-Oh Lee, Xiaoman Zhang, Malaikannan Sankarasubbu, Seunghyeon Roh, Won Jung Kim, Meesun Lee, Pranav Rajpurkar

    Abstract: We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising approximately 696,000 questions paired with 160,000 chest X-rays studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core r…

    Submitted 4 June, 2025; originally announced June 2025.

  2. arXiv:2505.22978  [pdf, ps, other]

    cs.CV

    Pose-free 3D Gaussian splatting via shape-ray estimation

    Authors: Youngju Na, Taeyeon Kim, Jumin Lee, Kyu Beom Han, Woo Jae Kim, Sung-eui Yoon

    Abstract: While generalizable 3D Gaussian splatting enables efficient, high-quality rendering of unseen scenes, it heavily depends on precise camera poses for accurate geometry. In real-world scenarios, obtaining accurate poses is challenging, leading to noisy pose estimates and geometric misalignments. To address this, we introduce SHARE, a pose-free, feed-forward Gaussian splatting framework that overcome…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: ICIP 2025

  3. arXiv:2504.08703  [pdf, other]

    cs.SE

    SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents

    Authors: Muhammad Shihab Rashid, Christian Bock, Yuan Zhuang, Alexander Buchholz, Tim Esler, Simon Valentin, Luca Franceschi, Martin Wistuba, Prabhu Teja Sivaprasad, Woo Jung Kim, Anoop Deoras, Giovanni Zappella, Laurent Callot

    Abstract: Coding agents powered by large language models have shown impressive capabilities in software engineering tasks, but evaluating their performance across diverse programming languages and real-world scenarios remains challenging. We introduce SWE-PolyBench, a new multi-language benchmark for repository-level, execution-based evaluation of coding agents. SWE-PolyBench contains 2110 instances from 21…

    Submitted 23 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: 20 pages, 6 figures, corrected author name spelling

  4. arXiv:2503.10081  [pdf, other]

    cs.CV cs.CR

    AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption

    Authors: Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-eui Yoon

    Abstract: The outstanding capability of diffusion models in generating high-quality images poses significant threats when misused by adversaries. In particular, we assume malicious adversaries exploiting diffusion models for inpainting tasks, such as replacing a specific region with a celebrity. While existing methods for protecting images from manipulation in diffusion-based generative models have primaril…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Accepted to ICLR 2025

  5. arXiv:2411.15265  [pdf, other]

    cs.CV cs.LG

    Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI

    Authors: Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye

    Abstract: Gradient-based methods are a prototypical family of explainability techniques, especially for image-based models. Nonetheless, they have several shortcomings in that they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, leading to explanations that are not actually faithful to the model and do not align…

    Submitted 22 November, 2024; originally announced November 2024.

    Comments: 19 pages, 5 figures

  6. arXiv:2411.11471  [pdf, other]

    cs.CV

    Generalizable Person Re-identification via Balancing Alignment and Uniformity

    Authors: Yoonki Cho, Jaeyoon Kim, Woo Jae Kim, Junsik Jung, Sung-eui Yoon

    Abstract: Domain generalizable person re-identification (DG re-ID) aims to learn discriminative representations that are robust to distributional shifts. While data augmentation is a straightforward solution to improve generalization, certain augmentations exhibit a polarized effect in this task, enhancing in-distribution performance while deteriorating out-of-distribution performance. In this paper, we inv…

    Submitted 18 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  7. arXiv:2407.09303  [pdf, other]

    cs.CV

    ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

    Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee

    Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework calle…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://sungmin-woo.github.io/prodepth/

  8. arXiv:2403.05086  [pdf, other]

    cs.CV

    UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Sets

    Authors: Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, Sung-eui Yoon

    Abstract: Generalizable neural implicit surface reconstruction aims to obtain an accurate underlying geometry given a limited number of multi-view images from unseen scenes. However, existing methods select only informative and relevant views using predefined scores for training and testing phases. This constraint renders the model impractical in real-world scenarios, where the availability of favorable com…

    Submitted 17 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024. Project page: https://youngju-na.github.io/uforecon.github.io/

  9. Deep Video Inpainting Guided by Audio-Visual Self-Supervision

    Authors: Kyuyeon Kim, Junsik Jung, Woo Jae Kim, Sung-Eui Yoon

    Abstract: Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of video inpainting. To implement the prior knowledge, we first train the audio-visual network, which learns the correspondence between auditory and visual information. Then, the audio-vis…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2022

  10. arXiv:2309.05438  [pdf, other]

    cs.CV cs.IR

    Towards Content-based Pixel Retrieval in Revisited Oxford and Paris

    Authors: Guoyuan An, Woo Jae Kim, Saelyne Yang, Rong Li, Yuchi Huo, Sung-Eui Yoon

    Abstract: This paper introduces the first two pixel retrieval benchmarks. Pixel retrieval is segmented instance retrieval. Like semantic segmentation extends classification to the pixel level, pixel retrieval is an extension of image retrieval and offers information about which pixels are related to the query object. In addition to retrieving images for the given query, it helps users quickly identify the q…

    Submitted 11 September, 2023; originally announced September 2023.

  11. arXiv:2305.11490  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation

    Authors: Suhyeon Lee, Won Jun Kim, Jinho Chang, Jong Chul Ye

    Abstract: Following the impressive development of LLMs, vision-language alignment in LLMs is actively being researched to enable multimodal reasoning and visual IO. This direction of research is particularly relevant to medical imaging because medical image analysis and generation consist of reasoning based on a combination of visual features and prior knowledge. Many recent works have focused on training a…

    Submitted 17 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 21 pages, 8 figures; ICLR 2024 (poster)

  12. arXiv:2304.04967  [pdf]

    cs.GR cs.CV

    Pixel-wise Guidance for Utilizing Auxiliary Features in Monte Carlo Denoising

    Authors: Kyu Beom Han, Olivia G. Odenthal, Woo Jae Kim, Sung-Eui Yoon

    Abstract: Auxiliary features such as geometric buffers (G-buffers) and path descriptors (P-buffers) have been shown to significantly improve Monte Carlo (MC) denoising. However, recent approaches implicitly learn to exploit auxiliary features for denoising, which could lead to insufficient utilization of each type of auxiliary features. To overcome such an issue, we propose a denoising framework that relies…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: 19 pages

  13. arXiv:2303.13846  [pdf, other]

    cs.CV

    Feature Separation and Recalibration for Adversarial Robustness

    Authors: Woo Jae Kim, Yoonki Cho, Junsik Jung, Sung-Eui Yoon

    Abstract: Deep neural networks are susceptible to adversarial attacks due to the accumulation of perturbations in the feature level, and numerous works have boosted model robustness by deactivating the non-robust feature activations that cause model mispredictions. However, we claim that these malicious activations still contain discriminative cues and that with recalibration, they can capture additional us…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 (Highlight)

  14. arXiv:2209.03139  [pdf, other]

    cs.CV

    Pixel-Level Equalized Matching for Video Object Segmentation

    Authors: Suhwan Cho, Woo Jin Kim, MyeongAh Cho, Seunghoon Lee, Minhyeok Lee, Chaewon Park, Sangyoun Lee

    Abstract: Feature similarity matching, which transfers the information of the reference frame to the query frame, is a key component in semi-supervised video object segmentation. If surjective matching is adopted, background distractors can easily occur and degrade the performance. Bijective matching mechanisms try to prevent this by restricting the amount of information being transferred to the query frame…

    Submitted 1 February, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

  15. arXiv:2208.05650  [pdf, other]

    cs.CV

    Diverse Generative Perturbations on Attention Space for Transferable Adversarial Attacks

    Authors: Woo Jae Kim, Seunghoon Hong, Sung-Eui Yoon

    Abstract: Adversarial attacks with improved transferability - the ability of an adversarial example crafted on a known model to also fool unknown models - have recently received much attention due to their practicality. Nevertheless, existing transferable attacks craft perturbations in a deterministic manner and often fail to fully explore the loss surface, thus falling into a poor local optimum and sufferi…

    Submitted 2 December, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: ICIP 2022 (Oral)

  16. arXiv:2203.14675  [pdf, other]

    cs.CV

    Part-based Pseudo Label Refinement for Unsupervised Person Re-identification

    Authors: Yoonki Cho, Woo Jae Kim, Seunghoon Hong, Sung-Eui Yoon

    Abstract: Unsupervised person re-identification (re-ID) aims at learning discriminative representations for person retrieval from unlabeled data. Recent techniques accomplish this task by using pseudo-labels, but these labels are inherently noisy and deteriorate the accuracy. To overcome this problem, several pseudo-label refinement methods have been proposed, but they neglect the fine-grained local context…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  17. arXiv:2203.09456  [pdf, other]

    physics.chem-ph cs.AI cs.LG

    MolNet: A Chemically Intuitive Graph Neural Network for Prediction of Molecular Properties

    Authors: Yeji Kim, Yoonho Jeong, Jihoo Kim, Eok Kyun Lee, Won June Kim, Insung S. Choi

    Abstract: The graph neural network (GNN) has been a powerful deep-learning tool in chemistry domain, due to its close connection with molecular graphs. Most GNN models collect and update atom and molecule features from the fed atom (and, in some cases, bond) features, which are basically based on the two-dimensional (2D) graph representation of 3D molecules. Correspondingly, the adjacency matrix, containing…

    Submitted 1 February, 2022; originally announced March 2022.

    Comments: 25 pages including 6-page Supporting Information

  18. arXiv:2110.07128  [pdf, other]

    cs.HC

    WebAssembly enables low latency interoperable augmented and virtual reality software

    Authors: Woo Jae Kim, Bohdan B. Khomtchouk

    Abstract: There is a clear difference in runtime performance between native applications that use augmented/virtual reality (AR/VR) device-specific hardware and comparable web-based implementations. Here we show that WebAssembly (Wasm) offers a promising developer solution that can bring near-native low latency performance to web-based applications, enabling hardware-agnostic interoperability at scale throu…

    Submitted 2 December, 2024; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 11 pages, 2 figures

  19. arXiv:2010.07524  [pdf, other]

    cs.CV

    Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features

    Authors: MyeongAh Cho, Taeoh Kim, Woo Jin Kim, Suhwan Cho, Sangyoun Lee

    Abstract: In contemporary society, surveillance anomaly detection, i.e., spotting anomalous events such as crimes or accidents in surveillance videos, is a critical task. As anomalies occur rarely, most training data consists of unlabeled videos without anomalous events, which makes the task challenging. Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos; they then detect an…

    Submitted 3 August, 2022; v1 submitted 15 October, 2020; originally announced October 2020.

  20. arXiv:2001.02090  [pdf, other]

    cs.CV

    AD-VO: Scale-Resilient Visual Odometry Using Attentive Disparity Map

    Authors: Joosung Lee, Sangwon Hwang, Kyungjae Lee, Woo Jin Kim, Junhyeop Lee, Tae-young Chung, Sangyoun Lee

    Abstract: Visual odometry is an essential key for a localization module in SLAM systems. However, previous methods require tuning the system to adapt environment changes. In this paper, we propose a learning-based approach for frame-to-frame monocular visual odometry estimation. The proposed network is only learned by disparity maps for not only covering the environment changes but also solving the scale pr…

    Submitted 7 January, 2020; originally announced January 2020.

    Comments: 5 pages, 5 figures, 2018.02 papers