
Showing 1–50 of 192 results for author: Kwak, S

  1. arXiv:2506.02882  [pdf, ps, other]

    cs.CV

    GaRA-SAM: Robustifying Segment Anything Model with Gated-Rank Adaptation

    Authors: Sohyun Lee, Yeho Kwon, Lukas Hoyer, Suha Kwak

    Abstract: Improving robustness of the Segment Anything Model (SAM) to input degradations is critical for its deployment in high-stakes applications such as autonomous driving and robotics. Our approach to this challenge prioritizes three key aspects: first, parameter efficiency to maintain the inherent generalization capability of SAM; second, fine-grained and input-aware robustification to precisely addres…

    Submitted 3 June, 2025; originally announced June 2025.

  2. arXiv:2505.13232  [pdf, other]

    cs.AI cs.CV

    StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment

    Authors: Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin

    Abstract: Learning robust representations from data often requires scale, which has led to the success of recent zero-shot models such as CLIP. However, the obtained robustness can easily be deteriorated when these models are fine-tuned on other downstream tasks (e.g., of smaller scales). Previous works often interpret this phenomenon in the context of domain shift, developing fine-tuning methods that aim t…

    Submitted 20 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: IJCAI 2025; Code is available at https://github.com/alinlab/StarFT

  3. arXiv:2504.15118  [pdf, other]

    cs.CV cs.SD

    Improving Sound Source Localization with Joint Slot Attention on Image and Audio

    Authors: Inho Kim, Youngkil Song, Jicheol Park, Won Hwa Kim, Suha Kwak

    Abstract: Sound source localization (SSL) is the task of locating the source of sound within an image. Due to the lack of localization labels, the de facto standard in SSL has been to represent an image and audio as a single embedding vector each, and use them to learn SSL via contrastive learning. To this end, previous work samples one of local image features as the image embedding and aggregates all local…

    Submitted 11 May, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025

  4. arXiv:2504.04981  [pdf, ps, other]

    cs.CV cs.AI

    TestDG: Test-time Domain Generalization for Continual Test-time Adaptation

    Authors: Sohyun Lee, Nayeong Kim, Juwon Kang, Seong Joon Oh, Suha Kwak

    Abstract: This paper studies continual test-time adaptation (CTTA), the task of adapting a model to constantly changing unseen domains in testing while preserving previously learned knowledge. Existing CTTA methods mostly focus on adaptation to the current test domain only, overlooking generalization to arbitrary test domains a model may face in the future. To tackle this limitation, we present a novel onli…

    Submitted 3 June, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  5. arXiv:2504.02397  [pdf, other]

    cs.CV

    Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval

    Authors: Boseung Jeong, Jicheol Park, Sungyeon Kim, Suha Kwak

    Abstract: Video-text retrieval, the task of retrieving videos based on a textual query or vice versa, is of paramount importance for video understanding and multimodal information retrieval. Recent methods in this area rely primarily on visual and textual features and often ignore audio, although it helps enhance overall comprehension of video content. Moreover, traditional models that incorporate audio bli…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025

  6. arXiv:2503.19868  [pdf, other]

    cs.IR cs.AI cs.CV cs.LG

    GENIUS: A Generative Framework for Universal Multimodal Search

    Authors: Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak

    Abstract: Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework suppo…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  7. arXiv:2502.06209  [pdf, other]

    cs.LG cs.CV

    Enhancing Cost Efficiency in Active Learning with Candidate Set Query

    Authors: Yeho Gwon, Sehyun Hwang, Hoyoung Kim, Jungseul Ok, Suha Kwak

    Abstract: This paper introduces a cost-efficient active learning (AL) framework for classification, featuring a novel query design called candidate set query. Unlike traditional AL queries requiring the oracle to examine all possible classes, our method narrows down the set of candidate classes likely to include the ground-truth class, significantly reducing the search space and labeling cost. Moreover, we…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 20 pages, 17 figures, 4 tables

  8. arXiv:2501.07730  [pdf, other]

    cs.CV

    Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

    Authors: Dongwon Kim, Ju He, Qihang Yu, Chenglin Yang, Xiaohui Shen, Suha Kwak, Liang-Chieh Chen

    Abstract: Image tokenizers form the foundation of modern text-to-image generative models but are notoriously difficult to train. Furthermore, most existing text-to-image models rely on large-scale, high-quality private datasets, making them challenging to replicate. In this work, we introduce Text-Aware Transformer-based 1-Dimensional Tokenizer (TA-TiTok), an efficient and powerful image tokenizer that can…

    Submitted 13 January, 2025; originally announced January 2025.

    Comments: Project page at https://tacju.github.io/projects/maskgen.html

  9. arXiv:2501.03714  [pdf, other]

    cs.CV

    MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

    Authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim

    Abstract: 3D Gaussian Splatting (3DGS) has made significant strides in scene representation and neural rendering, with intense efforts focused on adapting it for dynamic scenes. Despite delivering remarkable rendering quality and speed, existing methods struggle with storage demands and representing complex real-world motions. To tackle these issues, we propose MoDecGS, a memory-efficient Gaussian splatting…

    Submitted 24 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: CVPR2025 (camera ready ver.). The last two authors are co-corresponding authors. Please visit our project page at https://kaist-viclab.github.io/MoDecGS-site/

  10. arXiv:2501.00318  [pdf, other]

    cs.CV cs.LG

    Improving Text-based Person Search via Part-level Cross-modal Correspondence

    Authors: Jicheol Park, Boseung Jeong, Dongwon Kim, Suha Kwak

    Abstract: Text-based person search is the task of finding person images that are the most relevant to the natural language text description given as query. The main challenge of this task is a large gap between the target images and text queries, which makes it difficult to establish correspondence and distinguish subtle differences across people. To address this challenge, we introduce an efficient encoder…

    Submitted 31 December, 2024; originally announced January 2025.

  11. arXiv:2412.12042  [pdf, other]

    cs.HC cs.AI

    The Impact of AI Assistance on Radiology Reporting: A Pilot Study Using Simulated AI Draft Reports

    Authors: Julián N. Acosta, Siddhant Dogra, Subathra Adithan, Kay Wu, Michael Moritz, Stephen Kwak, Pranav Rajpurkar

    Abstract: Radiologists face increasing workload pressures amid growing imaging volumes, creating risks of burnout and delayed reporting times. While artificial intelligence (AI) based automated radiology report generation shows promise for reporting workflow optimization, evidence of its real-world impact on clinical accuracy and efficiency remains limited. This study evaluated the effect of draft reports o…

    Submitted 16 December, 2024; originally announced December 2024.

  12. arXiv:2412.04353  [pdf, other]

    cs.CV cs.LG

    ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

    Authors: Dayoung Gong, Suha Kwak, Minsu Cho

    Abstract: Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite apparent relevance and potential complementarity, these two problems have been investigated as separate and distinct tasks. In this work, we tackle these two problems, action segmentation and action anticipation, jointly using a unified diffusion model…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: Accepted to NeurIPS 2024

  13. arXiv:2411.16801  [pdf, other]

    cs.CV

    Controllable Human Image Generation with Personalized Multi-Garments

    Authors: Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin

    Abstract: We present BootComp, a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments. Here, the main bottleneck is data acquisition for training: collecting a large-scale dataset of high-quality reference garment images per human subject is quite challenging, i.e., ideally, one needs to manually gather every single garment photogra…

    Submitted 1 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: CVPR 2025. Project page: https://omnious.github.io/BootComp

  14. arXiv:2411.09064  [pdf, other]

    stat.ML cs.CR cs.LG

    Minimax Optimal Two-Sample Testing under Local Differential Privacy

    Authors: Jongmin Mun, Seungwoo Kwak, Ilmun Kim

    Abstract: We explore the trade-off between privacy and statistical utility in private two-sample testing under local differential privacy (LDP) for both multinomial and continuous data. We begin by addressing the multinomial case, where we introduce private permutation tests using practical privacy mechanisms such as Laplace, discrete Laplace, and Google's RAPPOR. We then extend our multinomial approach to…

    Submitted 22 November, 2024; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: 66 pages, 6 figures, 1 table; added a graphical illustration of central and local differential privacy in Section 1, referenced the Python package, fixed typos, and changed the citation style

    MSC Class: 62G10

  15. arXiv:2411.01801  [pdf, other]

    cs.CV cs.LG

    Bootstrapping Top-down Information for Self-modulating Slot Attention

    Authors: Dongwon Kim, Seoyeon Kim, Suha Kwak

    Abstract: Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning. Traditional OCL methods primarily employ bottom-up approaches that aggregate homogeneous visual features to represent objects. However, in complex visual environments, these methods often fall short due to the hete…

    Submitted 7 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024

  16. arXiv:2410.20877  [pdf, other]

    math.GT math.GR

    Surfaces proper homotopy equivalent to graphs and their Dehn-Nielsen-Baer maps

    Authors: Ryan Dickmann, Hannah Hoganson, Sanghoon Kwak

    Abstract: Motivated by the recent work of Algom-Kfir and Bestvina introducing the mapping class group of an infinite graph via proper homotopy equivalences, we give a necessary and sufficient condition for a surface to be properly homotopy equivalent to a graph. We consider second-countable orientable surfaces that are possibly infinite-type and have noncompact boundary. For surfaces proper homotopy equival…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: 31 pages, 10 figures. Comments welcome!

  17. arXiv:2410.06940  [pdf, other]

    cs.CV cs.LG

    Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

    Authors: Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie

    Abstract: Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learni…

    Submitted 28 February, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 (Oral). Project page: https://sihyun.me/REPA

  18. arXiv:2409.15591  [pdf, other]

    math.GR math.DS math.GT

    Nonunique Ergodicity on the Boundary of Outer space

    Authors: Mladen Bestvina, Elizabeth Field, Sanghoon Kwak

    Abstract: To an $\mathbb{R}$-tree in the boundary of Outer space, we associate two simplices: the simplex of projective length measures, and the simplex of projective dual currents. For both kinds of simplices, we estimate the dimension of maximal simplices for arational $\mathbb{R}$-trees in the boundary of Outer space.

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 42 pages, 9 figures. Comments are welcome!

  19. arXiv:2409.13475  [pdf, other]

    cs.CV

    PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery

    Authors: Jicheol Park, Dongwon Kim, Boseung Jeong, Suha Kwak

    Abstract: Text-based person search, employing free-form text queries to identify individuals within a vast image collection, presents a unique challenge in aligning visual and textual representations, particularly at the human part level. Existing methods often struggle with part feature extraction and alignment due to the lack of direct part-level supervision and reliance on heuristic features. We propose…

    Submitted 20 September, 2024; originally announced September 2024.

  20. arXiv:2409.03303  [pdf, other]

    cs.LG cs.CV

    Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization

    Authors: Nayeong Kim, Juwon Kang, Sungsoo Ahn, Jungseul Ok, Suha Kwak

    Abstract: We study the problem of training an unbiased and accurate model given a dataset with multiple biases. This problem is challenging since the multiple biases cause multiple undesirable shortcuts during training, and even worse, mitigating one may exacerbate the other. We propose a novel training method to tackle this challenge. Our method first groups training data so that different groups induce di…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: International Conference on Machine Learning 2024

  21. arXiv:2408.05749  [pdf, other]

    cs.CV cs.LG

    Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

    Authors: Sungyeon Kim, Boseung Jeong, Donghyun Kim, Suha Kwak

    Abstract: Large-scale image-text pre-trained models enable zero-shot classification and provide consistent accuracy across various data distributions. Nonetheless, optimizing these models in downstream tasks typically requires fine-tuning, which reduces generalization to out-of-distribution (OOD) data and demands extensive computational resources. We introduce Robust Adapter (R-Adapter), a novel method for…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  22. arXiv:2408.02957  [pdf, other]

    cs.CV

    Online Temporal Action Localization with Memory-Augmented Transformer

    Authors: Youngkil Song, Dongkeun Kim, Minsu Cho, Suha Kwak

    Abstract: Online temporal action localization (On-TAL) is the task of identifying multiple action instances given a streaming video. Since existing methods take as input only a video segment of fixed size per iteration, they are limited in considering long-term context and require tuning the segment size carefully. To overcome these limitations, we propose memory-augmented transformer (MATR). MATR utilizes…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024, Project page: https://cvlab.postech.ac.kr/research/MATR/

  23. arXiv:2407.19698  [pdf, other]

    cs.CV

    Classification Matters: Improving Video Action Detection with Class-Specific Attention

    Authors: Jinsung Lee, Taeoh Kim, Inwoong Lee, Minho Shim, Dongyoon Wee, Minsu Cho, Suha Kwak

    Abstract: Video action detection (VAD) aims to detect actors and classify their actions in a video. We figure that VAD suffers more from classification rather than localization of actors. Hence, we analyze how prevailing methods form features for classification and find that they prioritize actor regions, yet often overlooking the essential contextual information necessary for accurate classification. Accor…

    Submitted 11 September, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: 31 pages, accepted to ECCV 2024 (oral)

  24. arXiv:2407.13437  [pdf, other]

    cs.CV

    FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions

    Authors: Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak

    Abstract: Robust semantic segmentation under adverse conditions is crucial in real-world applications. To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a novel feature restoration framework for source-free domain adaptation (SFDA) of semantic segmentation to adverse conditions. FREST alternates two steps: (1) lear…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  25. arXiv:2406.06496  [pdf, other]

    cs.LG cs.CL cs.CV

    Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

    Authors: Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar

    Abstract: Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrai…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Added acknowledgments

  26. arXiv:2405.20729  [pdf, other]

    cs.CV

    Extreme Point Supervised Instance Segmentation

    Authors: Hyeonjun Lee, Sehyun Hwang, Suha Kwak

    Abstract: This paper introduces a novel approach to learning instance segmentation using extreme points, i.e., the topmost, leftmost, bottommost, and rightmost points, of each object. These points are readily available in the modern bounding box annotation process while offering strong clues for precise segmentation, and thus allows to improve performance at the same annotation cost with box-supervised meth…

    Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  27. arXiv:2405.05967  [pdf, other]

    cs.CV cs.GR cs.LG

    Distilling Diffusion Models into Conditional GANs

    Authors: Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

    Abstract: We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose…

    Submitted 17 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Project page: https://mingukkang.github.io/Diffusion2GAN/ (ECCV2024)

  28. arXiv:2403.10820  [pdf, other]

    cs.CV

    Active Label Correction for Semantic Segmentation with Foundation Models

    Authors: Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok

    Abstract: Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive. Although useful priors such as foundation models or crowdsourced datasets are available, they are error-prone. We hence propose an effective framework of active label correction (ALC) based on a design of correction query to rectify pseudo labels of pixels,…

    Submitted 4 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  29. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  30. arXiv:2403.05139  [pdf, other]

    cs.CV

    Improving Diffusion Models for Authentic Virtual Try-on in the Wild

    Authors: Yisol Choi, Sangkyung Kwak, Kyungmin Lee, Hyungwon Choi, Jinwoo Shin

    Abstract: This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve…

    Submitted 29 July, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: ECCV 2024

  31. arXiv:2402.12004  [pdf, other]

    cs.CV

    Direct Consistency Optimization for Robust Customization of Text-to-Image Diffusion Models

    Authors: Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin

    Abstract: Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, can generate visuals with a high degree of consistency. However, such fine-tuned models are not robust; they often fail to compose with concepts of pretrained model or other fine-tuned models. To address this, we propose a novel fine-tuning objective, dubbed Direct Consistency Optimization, which controls the deviation…

    Submitted 12 December, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024. Project page: https://dco-t2i.github.io/

  32. arXiv:2401.14654  [pdf, other]

    cs.CL cs.LG

    A Korean Legal Judgment Prediction Dataset for Insurance Disputes

    Authors: Alice Saebom Kwak, Cheonkam Jeong, Ji Weon Lim, Byeongcheol Min

    Abstract: This paper introduces a Korean legal judgment prediction (LJP) dataset for insurance disputes. Successful LJP models on insurance disputes can benefit insurance companies and their customers. It can save both sides' time and money by allowing them to predict how the result would come out if they proceed to the dispute mediation process. As is often the case with low-resource languages, there is a…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 5 pages, 1 figure

  33. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

  34. arXiv:2312.11421  [pdf, other]

    eess.SP

    Frequency analysis and filter design for directed graphs with polar decomposition

    Authors: Semin Kwak, Laura Shimabukuro, Antonio Ortega

    Abstract: In this study, we challenge the traditional approach of frequency analysis on directed graphs, which typically relies on a single measure of signal variation such as total variation. We argue that the inherent directionality in directed graphs necessitates a multifaceted analytical approach that incorporates multiple signal variations definitions. Our methodology leverages the polar decomposition…

    Submitted 15 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Conference paper accepted for ICASSP 2024

  35. arXiv:2312.04266  [pdf, other]

    cs.CV

    Activity Grammars for Temporal Action Segmentation

    Authors: Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

    Abstract: Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective act…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted to NeurIPS 2023

  36. arXiv:2312.02878  [pdf, other]

    cs.CV

    Towards More Practical Group Activity Detection: A New Benchmark and Model

    Authors: Dongkeun Kim, Youngkil Song, Minsu Cho, Suha Kwak

    Abstract: Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video. While GAD has been studied recently, there is still much room for improvement in both dataset and methodology due to their limited capability to address practical GAD scenarios. To resolve these issues, we first present a new dataset, dubbed Café. U…

    Submitted 25 July, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024, Project page: https://cvlab.postech.ac.kr/research/CAFE

  37. arXiv:2312.02361  [pdf, other]

    math.GT

    Coarsely bounded generating sets for mapping class groups of infinite-type surfaces

    Authors: Thomas Hill, Sanghoon Kwak, Rebecca Rechkin

    Abstract: Mann and Rafi's seminal work initiated the study of the coarse geometry of big mapping class groups. Specifically, they construct coarsely bounded (CB) generating sets for mapping class groups of a large class of infinite-type surfaces. In this expository note, we illustrate examples of surfaces whose mapping class groups admit such generating sets, as well as those that do not, with the goal of e…

    Submitted 22 May, 2025; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 22 pages, 16 figures. v2: incorporated referee's comments

  38. arXiv:2310.17811  [pdf, other]

    cs.AI cs.CL

    Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting

    Authors: Benjamin Yan, Ruochen Liu, David E. Kuo, Subathra Adithan, Eduardo Pontes Reis, Stephen Kwak, Vasantha Kumar Venugopal, Chloe P. O'Connell, Agustina Saenz, Pranav Rajpurkar, Michael Moor

    Abstract: Automatically generated reports from medical images promise to improve the workflow of radiologists. Existing methods consider an image-to-report modeling task by directly generating a fully-fledged report from an image. However, this conflates the content of the report (e.g., findings and their attributes) with its style (e.g., format and choice of words), which can lead to clinically inaccurate…

    Submitted 31 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  39. arXiv:2309.15266  [pdf, ps, other]

    math.OC

    A New Spectral Conjugate Subgradient Method with Application in Computed Tomography Image Reconstruction

    Authors: Milagros Loreto, Thomas Humphries, Chella Raghavan, Kenneth Wu, Sam Kwak

    Abstract: A new spectral conjugate subgradient method is presented to solve nonsmooth unconstrained optimization problems. The method combines the spectral conjugate gradient method for smooth problems with the spectral subgradient method for nonsmooth problems. We study the effect of two different choices of line search, as well as three formulas for determining the conjugate directions. In addition to num…

    Submitted 5 June, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 23 pages, 7 figures

    MSC Class: 90C30; 90C56; 94A08

  40. arXiv:2309.09319  [pdf, other]

    cs.CV cs.AI cs.LG

    Active Learning for Semantic Segmentation with Multi-class Label Query

    Authors: Sehyun Hwang, Sohyun Lee, Hoyoung Kim, Minhyeon Oh, Jungseul Ok, Suha Kwak

    Abstract: This paper proposes a new active learning method for semantic segmentation. The core of our method lies in a new annotation query design. It samples informative local image regions (e.g., superpixels), and for each of such regions, asks an oracle for a multi-hot vector indicating all classes existing in the region. This multi-class labeling strategy is substantially more efficient than existing on…

    Submitted 6 November, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 accepted

    MSC Class: 68T07; ACM Class: I.2.10

  41. arXiv:2309.08944  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning

    Authors: Sungyeon Kim, Donghyun Kim, Suha Kwak

    Abstract: A common practice in metric learning is to train and test an embedding model for each dataset. This dataset-specific approach fails to simulate real-world scenarios that involve multiple heterogeneous distributions of data. In this regard, we explore a new metric learning paradigm, called Unified Metric Learning (UML), which learns a unified distance metric capable of capturing relations across mu…

    Submitted 18 January, 2025; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted to WACV 2025

  42. arXiv:2309.07885  [pdf, other]

    math.GR math.GT

    Generating Sets and Algebraic Properties of Pure Mapping Class Groups of Infinite Graphs

    Authors: George Domat, Hannah Hoganson, Sanghoon Kwak

    Abstract: We completely classify the locally finite, infinite graphs with pure mapping class groups admitting a coarsely bounded generating set. We also study algebraic properties of the pure mapping class group: We establish a semidirect product decomposition, compute first integral cohomology, and classify when they satisfy residual finiteness and the Tits alternative. These results provide a framework an…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 36 pages, 10 figures

    MSC Class: 57S05; 37E25; 57M07; 20E08; 20F65; 54H05

  43. arXiv:2308.15512  [pdf, other]

    cs.CV

    Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

    Authors: Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak

    Abstract: Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to lack of labeled data for training. We address this issue by a weakly supervised learning approach using text descriptions of training images as the only source…

    Submitted 24 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023, Project page: https://southflame.github.io/sag/

  44. arXiv:2308.00994  [pdf, other]

    cs.CV cs.LG

    SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems

    Authors: Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh

    Abstract: Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues. A straightforward solution is to carefully curate training data, but given the enormous scale of modern neural networks, this is prohibitively labor-intensive and thus impractical. Inspired by recent developments in generative models, this paper explores the potent…

    Submitted 25 April, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: The paper is under consideration at Pattern Recognition Letters

  45. arXiv:2307.15199  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

    Authors: Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak

    Abstract: In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse…

    Submitted 15 August, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023, Project Page: https://promptstyler.github.io/

  46. arXiv:2307.03405  [pdf, ps, other]

    math.AG math.AC

    Syzygies of secant varieties of smooth projective curves and gonality sequences

    Authors: Junho Choe, Sijong Kwak, Jinhyung Park

    Abstract: The purpose of this paper is to prove that one can read off the gonality sequence of a smooth projective curve from syzygies of secant varieties of the curve embedded by a line bundle of sufficiently large degree. More precisely, together with Ein-Niu-Park's theorem, our main result shows that the gonality sequence of a smooth projective curve completely determines the shape of the minimal free re…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 22 pages, any comments are welcome

    MSC Class: 14N07; 14N05; 13D02

  47. arXiv:2306.12978  [pdf, other]

    cs.IT eess.SP

    Rate-Splitting Multiple Access for 6G Networks: Ten Promising Scenarios and Applications

    Authors: Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

    Abstract: In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple acce…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 17 pages, 6 figures, submitted to IEEE Network Magazine

  48. arXiv:2306.08498  [pdf, other]

    cs.CV

    Extending CLIP's Image-Text Alignment to Referring Image Segmentation

    Authors: Seoyeon Kim, Minguk Kang, Dongwon Kim, Jaesik Park, Suha Kwak

    Abstract: Referring Image Segmentation (RIS) is a cross-modal task that aims to segment an instance described by a natural language expression. Recent methods leverage large-scale pretrained unimodal models as backbones along with fusion techniques for joint reasoning across modalities. However, the inherent cross-modal nature of RIS raises questions about the effectiveness of unimodal backbones. We propose…

    Submitted 7 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: NAACL 2024

  49. Accelerated Bayesian inference of plasma profiles with self-consistent MHD equilibria at W7-X via neural networks

    Authors: Andrea Merlo, Andrea Pavone, Daniel Böckenhoff, Ekkehard Pasch, Golo Fuchert, Kai Jakob Brunner, Kian Rahbarnia, Jonathan Schilling, Udo Höfel, Sehyun Kwak, Jakob Svensson, Thomas Sunn Pedersen, the W7-X team

    Abstract: High-$\langle β\rangle$ operations require a fast and robust inference of plasma parameters with a self-consistent MHD equilibrium. Precalculated MHD equilibria are usually employed at W7-X due to the high computational cost. To address this, we couple a physics-regularized NN model that approximates the ideal-MHD equilibrium with the Bayesian modeling framework Minerva. We show the fast and robus…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 18 pages, 6 figures

  50. arXiv:2303.16817  [pdf, other]

    cs.CV

    Adaptive Superpixel for Active Learning in Semantic Segmentation

    Authors: Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok

    Abstract: Learning semantic segmentation requires pixel-wise annotations, which can be time-consuming and expensive. To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per superpixel instead. To be specific, it consists of adaptive superpixel and sieving mechanisms, fully dedicated to AL. At each round of AL, we adaptively merge neigh…

    Submitted 20 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.