Showing 1–50 of 69 results for author: Oh, S J

Searching in archive cs.
  1. arXiv:2509.23781

    cs.CV cs.AI

    GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning

    Authors: Nayeong Kim, Seong Joon Oh, Suha Kwak

    Abstract: Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) excels in various vision tasks thanks to the rich knowledge and generalization ability of VLMs. However, recent studies revealed that such fine-tuned VLMs are vulnerable to spurious correlations stemming from the subgroup imbalance in the fine-tuning datasets. To resolve this issue, we propose Group Context Optimization (Group…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: This paper was first submitted to NeurIPS 2024 in May 2024

  2. arXiv:2507.20836

    cs.LG cs.AI

    First Hallucination Tokens Are Different from Conditional Ones

    Authors: Jakob Snel, Seong Joon Oh

    Abstract: Hallucination, the generation of untruthful content, is one of the major concerns regarding foundational models. Detecting hallucinations at the token level is vital for real-time filtering and targeted correction, yet the variation of hallucination signals within token sequences is not fully understood. Leveraging the RAGTruth corpus with token-level annotations and reproduced logits, we analyse…

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: 4.5 pages, 3 figures, Dataset, Knowledge Paper, Hallucination, Trustworthiness

  3. arXiv:2507.07102

    cs.LG

    Does Data Scaling Lead to Visual Compositional Generalization?

    Authors: Arnas Uselis, Andrea Dittadi, Seong Joon Oh

    Abstract: Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will improve out-of-distribution performance, including compositional generalization. We test this premise through controlled experiments that systematically vary data sc…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: ICML 2025

  4. arXiv:2507.03683

    cs.CV

    On the rankability of visual embeddings

    Authors: Ankit Sonthalia, Arnas Uselis, Seong Joon Oh

    Abstract: We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term _rank axes_. We define a model as _rankable_ for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embedding…

    Submitted 4 July, 2025; originally announced July 2025.

  5. arXiv:2506.15674

    cs.CL cs.AI cs.CR

    Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers

    Authors: Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh

    Abstract: We study privacy leakage in the reasoning traces of large reasoning models used as personal agents. Unlike final outputs, reasoning traces are often assumed to be internal and safe. We challenge this assumption by showing that reasoning traces frequently contain sensitive user data, which can be extracted via prompt injections or accidentally leak into outputs. Through probing and agentic evaluati…

    Submitted 18 June, 2025; originally announced June 2025.

  6. arXiv:2506.11097

    cs.CL cs.AI cs.IR

    C-SEO Bench: Does Conversational SEO Work?

    Authors: Haritz Puerto, Martin Gubri, Tommaso Green, Seong Joon Oh, Sangdoo Yun

    Abstract: Large Language Models (LLMs) are transforming search engines into Conversational Search Engines (CSE). Consequently, Search Engine Optimization (SEO) is being shifted into Conversational Search Engine Optimization (C-SEO). We are beginning to see dedicated C-SEO methods for modifying web documents to increase their visibility in CSE responses. However, they are often tested only for a limited brea…

    Submitted 23 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  7. arXiv:2505.20295

    cs.CL cs.AI cs.LG stat.ML

    SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?

    Authors: Michael Kirchhof, Luca Füger, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Seong Joon Oh, Sinead Williamson

    Abstract: The common approach to communicate a large language model's (LLM) uncertainty is to add a percentage number or a hedging word to its response. But is this all we can do? Instead of generating a single answer and then hedging it, an LLM that is fully transparent to the user needs to be able to reflect on its internal belief distribution and output a summary of all options it deems possible, and how…

    Submitted 30 September, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  8. arXiv:2505.17955

    cs.CV

    Diffusion Classifiers Understand Compositionality, but Conditions Apply

    Authors: Yujin Jeong, Arnas Uselis, Seong Joon Oh, Anna Rohrbach

    Abstract: Understanding visual scenes is fundamental to human intelligence. While discriminative models have significantly advanced computer vision, they often struggle with compositional understanding. In contrast, recent generative text-to-image diffusion models excel at synthesizing complex scenes, suggesting inherent compositional capabilities. Building on this, zero-shot diffusion classifiers have been…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  9. arXiv:2504.07092

    cs.CV cs.AI cs.LG

    Are We Done with Object-Centric Learning?

    Authors: Alexander Rubinstein, Ameya Prabhu, Matthias Bethge, Seong Joon Oh

    Abstract: Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization, sample-efficient composition, and modeling of structured environments. Most research has focused on developing unsupervised mechanisms that separate objects into…

    Submitted 10 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  10. arXiv:2504.05461

    cs.LG

    Intermediate Layer Classifiers for OOD generalization

    Authors: Arnas Uselis, Seong Joon Oh

    Abstract: Deep classifiers are known to be sensitive to data distribution shifts, primarily due to their reliance on spurious correlations in training data. It has been suggested that these classifiers can still find useful features in the network's last layer that hold up under such shifts. In this work, we question the use of last-layer representations for out-of-distribution (OOD) generalisation and expl…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: ICLR 2025

  11. arXiv:2504.04981

    cs.CV cs.AI

    TestDG: Test-time Domain Generalization for Continual Test-time Adaptation

    Authors: Sohyun Lee, Nayeong Kim, Juwon Kang, Seong Joon Oh, Suha Kwak

    Abstract: This paper studies continual test-time adaptation (CTTA), the task of adapting a model to constantly changing unseen domains in testing while preserving previously learned knowledge. Existing CTTA methods mostly focus on adaptation to the current test domain only, overlooking generalization to arbitrary test domains a model may face in the future. To tackle this limitation, we present a novel onli…

    Submitted 3 June, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  12. arXiv:2502.03566

    cs.CV cs.LG

    CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally

    Authors: Darina Koishigarina, Arnas Uselis, Seong Joon Oh

    Abstract: CLIP (Contrastive Language-Image Pretraining) has become a popular choice for various downstream tasks. However, recent studies have questioned its ability to represent compositional concepts effectively. These works suggest that CLIP often acts like a bag-of-words (BoW) model, interpreting images and text as sets of individual concepts without grasping the structural relationships. In particular,…

    Submitted 8 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  13. arXiv:2411.00154

    cs.CL cs.AI cs.LG

    Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

    Authors: Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh

    Abstract: Membership inference attacks (MIA) attempt to verify the membership of a given data sample in the training set for a model. MIA has become relevant in recent years, following the rapid development of large language models (LLM). Many are concerned about the usage of copyrighted materials for training them and call for methods for detecting such usage. However, recent research has largely concluded…

    Submitted 3 February, 2025; v1 submitted 31 October, 2024; originally announced November 2024.

    Comments: Findings of NAACL 2025. Our code is available at https://github.com/parameterlab/mia-scaling

  14. arXiv:2410.11536

    cs.CV

    Overcoming Domain Limitations in Open-vocabulary Segmentation

    Authors: Dongjun Hwang, Seong Joon Oh, Junsuk Choe

    Abstract: Open-vocabulary segmentation (OVS) has gained attention for its ability to recognize a broader range of classes. However, OVS models show significant performance drops when applied to unseen domains beyond the previous training dataset. Fine-tuning these models on new datasets can improve performance, but often leads to the catastrophic forgetting of previously learned knowledge. To address this i…

    Submitted 15 October, 2024; originally announced October 2024.

  15. arXiv:2409.16978

    cs.HC cs.AI cs.LG

    Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI

    Authors: Elisa Nguyen, Johannes Bertram, Evgenii Kortukov, Jean Y. Song, Seong Joon Oh

    Abstract: While Explainable AI (XAI) aims to make AI understandable and useful to humans, it has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. We propose an alternative to this bottom-up approach inspired by design thinking: the XAI research community should adopt a top-down, user-focused perspective to ensure user relevance. We i…

    Submitted 25 September, 2024; originally announced September 2024.

  16. arXiv:2409.16797

    cs.LG cs.AI cs.CV

    Scalable Ensemble Diversification for OOD Generalization and Detection

    Authors: Alexander Rubinstein, Luca Scimeca, Damien Teney, Seong Joon Oh

    Abstract: Training a diverse ensemble of models has several practical applications such as providing candidates for model selection with better out-of-distribution (OOD) generalization, and enabling the detection of OOD samples via Bayesian principles. An existing approach to diverse ensemble training encourages the models to disagree on provided OOD samples. However, the approach is computationally expensi…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Under review

  17. arXiv:2404.16032

    cs.LG

    Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents

    Authors: Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh

    Abstract: Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in context. This leads to cases of conflict between the model's parametric knowledge and the contextual information, where the model may not always update its knowl…

    Submitted 8 October, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  18. arXiv:2403.07968

    cs.LG cs.AI

    Do Deep Neural Network Solutions Form a Star Domain?

    Authors: Ankit Sonthalia, Alexander Rubinstein, Ehsan Abbasnejad, Seong Joon Oh

    Abstract: It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path can connect two independent solutions with low loss, given the weights of one of the models are appropriately permuted. However, current methods to test this theory often require ver…

    Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  19. arXiv:2403.05973

    cs.CL cs.AI cs.LG

    Calibrating Large Language Models Using Their Generations Only

    Authors: Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

    Abstract: As large language models (LLMs) are increasingly deployed in user-facing applications, building trust and maintaining safety by accurately quantifying a model's confidence in its prediction becomes even more important. However, finding effective ways to calibrate LLMs - especially when the only interface to the models is their generated text - remains a challenge. We propose APRICOT (auxiliary pre…

    Submitted 9 March, 2024; originally announced March 2024.

  20. arXiv:2402.19460

    cs.LG stat.ML

    Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks

    Authors: Bálint Mucsányi, Michael Kirchhof, Seong Joon Oh

    Abstract: Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one source of uncertainty. This paper presents the first benchmark of uncertainty disentangl…

    Submitted 27 November, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 68 pages

  21. arXiv:2402.16569

    cs.CV cs.LG

    Pretrained Visual Uncertainties

    Authors: Michael Kirchhof, Mark Collier, Seong Joon Oh, Enkelejda Kasneci

    Abstract: Accurate uncertainty estimation is vital to trustworthy machine learning, yet uncertainties typically have to be learned for each task anew. This work introduces the first pretrained uncertainty modules for vision models. Similar to standard pretraining this enables the zero-shot transfer of uncertainties learned on a large pretraining dataset to specialized downstream datasets. We enable our larg…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  22. arXiv:2402.12991

    cs.LG cs.AI cs.CL cs.CR

    TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

    Authors: Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

    Abstract: Large Language Model (LLM) services and models often come with legal rules on who can use them and how they must use them. Assessing the compliance of the released LLMs is crucial, as these rules protect the interests of the LLM contributor and prevent misuse. In this context, we describe the novel fingerprinting problem of Black-box Identity Verification (BBIV). The goal is to determine whether a…

    Submitted 6 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (findings)

  23. arXiv:2312.01638

    eess.IV cs.CV

    J-Net: Improved U-Net for Terahertz Image Super-Resolution

    Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

    Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot…

    Submitted 4 December, 2023; originally announced December 2023.

  24. arXiv:2311.16176

    cs.LG cs.AI cs.CV

    Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles

    Authors: Luca Scimeca, Alexander Rubinstein, Damien Teney, Seong Joon Oh, Yoshua Bengio

    Abstract: Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut learning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) to mitigate this form of bias. We show that at pa…

    Submitted 2 April, 2025; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: Accepted as a workshop paper at ICLR 2025. arXiv admin note: substantial text overlap with arXiv:2310.02230

  25. arXiv:2310.20477

    cs.HC cs.LG

    Exploring Practitioner Perspectives On Training Data Attribution Explanations

    Authors: Elisa Nguyen, Evgenii Kortukov, Jean Y. Song, Seong Joon Oh

    Abstract: Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for…

    Submitted 22 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS XAI in Action workshop 2023

  26. arXiv:2310.08215

    cs.LG cs.AI

    Trustworthy Machine Learning

    Authors: Bálint Mucsányi, Michael Kirchhof, Elisa Nguyen, Alexander Rubinstein, Seong Joon Oh

    Abstract: As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machi…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 373 pages, textbook at the University of Tübingen

    ACM Class: I.2.0

  27. arXiv:2307.03810

    cs.LG cs.AI stat.ML

    URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

    Authors: Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci

    Abstract: Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such m…

    Submitted 19 October, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted at the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS D&B 2023)

  28. arXiv:2307.01881

    cs.CR cs.CL

    ProPILE: Probing Privacy Leakage in Large Language Models

    Authors: Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh

    Abstract: The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, o…

    Submitted 4 July, 2023; originally announced July 2023.

  29. arXiv:2305.19765

    cs.LG

    A Bayesian Approach To Analysing Training Data Attribution In Deep Learning

    Authors: Elisa Nguyen, Minjoon Seo, Seong Joon Oh

    Abstract: Training data attribution (TDA) techniques find influential training data for the model's prediction on the test data of interest. They approximate the impact of down- or up-weighting a particular training sample. While conceptually useful, they are hardly applicable to deep models in practice, particularly because of their sensitivity to different model initialisation. In this paper, we introduce…

    Submitted 31 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  30. Playing repeated games with Large Language Models

    Authors: Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, Eric Schulz

    Abstract: LLMs are increasingly used in applications where they interact with humans and other agents. We propose to use behavioural game theory to study LLMs' cooperation and coordination behaviour. We let different LLMs play finitely repeated $2\times2$ games with each other, with human-like strategies, and actual human players. Our results show that LLMs perform particularly well at self-interested games…

    Submitted 7 May, 2025; v1 submitted 26 May, 2023; originally announced May 2023.

  31. arXiv:2303.17595

    cs.CV cs.LG

    Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts

    Authors: Dongyoon Han, Junsuk Choe, Seonghyeok Chun, John Joon Young Chung, Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh

    Abstract: Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time-series of mouse traces and clicks left after image selection. Our insight is that such annotat…

    Submitted 26 July, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Code & data at https://github.com/naver-ai/NeglectedFreeLunch. To be presented at ICCV'23

  32. arXiv:2302.02865

    cs.LG cs.AI stat.ML

    Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs

    Authors: Michael Kirchhof, Enkelejda Kasneci, Seong Joon Oh

    Abstract: Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated th…

    Submitted 17 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  33. arXiv:2211.02291

    cs.CV cs.AI cs.LG

    SelecMix: Debiased Learning by Contradicting-pair Sampling

    Authors: Inwoo Hwang, Sangjun Lee, Yunhyeok Kwak, Seong Joon Oh, Damien Teney, Jin-Hwa Kim, Byoung-Tak Zhang

    Abstract: Neural networks trained with ERM (empirical risk minimization) sometimes learn unintended decision rules, in particular when their training data is biased, i.e., when training labels are strongly correlated with undesirable features. To prevent a network from learning such features, recent methods augment training data such that examples displaying spurious correlations (i.e., bias-aligned example…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  34. arXiv:2210.08457

    cs.CV cs.AI cs.LG

    Scratching Visual Transformer's Back with Uniform Attention

    Authors: Nam Hyeon-Woo, Kim Yu-Ji, Byeongho Heo, Dongyoon Han, Seong Joon Oh, Tae-Hyun Oh

    Abstract: The favorable performance of Vision Transformers (ViTs) is often attributed to the multi-head self-attention (MSA). The MSA enables global interactions at each layer of a ViT model, which is a contrasting feature against Convolutional Neural Networks (CNNs) that gradually increase the range of interaction across multiple layers. We study the role of the density of the attention. Our preliminary an…

    Submitted 25 December, 2024; v1 submitted 16 October, 2022; originally announced October 2022.

  35. arXiv:2209.00613

    cs.LG cs.CV

    ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

    Authors: Damien Teney, Yong Lin, Seong Joon Oh, Ehsan Abbasnejad

    Abstract: Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation and some surprisingly never even observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD genera…

    Submitted 19 May, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

  36. arXiv:2207.10324

    eess.IV cs.CV cs.LG

    Enhancing Generative Networks for Chest Anomaly Localization through Automatic Registration-Based Unpaired-to-Pseudo-Paired Training Data Translation

    Authors: Kyungsu Kim, Seong Je Oh, Chae Yeon Lim, Ju Hwan Lee, Tae Uk Kim, Myung Jin Chung

    Abstract: Image translation based on a generative adversarial network (GAN-IT) is a promising method for the precise localization of abnormal regions in chest X-ray images (AL-CXR) even without the pixel-level annotation. However, heterogeneous unpaired datasets undermine existing methods to extract key features and distinguish normal from abnormal cases, resulting in inaccurate and unstable AL-CXR. To addr…

    Submitted 15 June, 2024; v1 submitted 21 July, 2022; originally announced July 2022.

  37. arXiv:2206.13504

    eess.IV cs.CV cs.LG

    AI-based computer-aided diagnostic system of chest digital tomography synthesis: Demonstrating comparative advantage with X-ray-based AI systems

    Authors: Kyung-Su Kim, Ju Hwan Lee, Seong Je Oh, Myung Jin Chung

    Abstract: Compared with chest X-ray (CXR) imaging, which is a single image projected from the front of the patient, chest digital tomosynthesis (CDTS) imaging can be more advantageous for lung lesion detection because it acquires multiple images projected from multiple angles of the patient. Various clinical comparative analysis and verification studies have been reported to demonstrate this, but there were…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: Kyung-Su Kim, Ju Hwan Lee, and Seong Je Oh contributed equally to this work as co-first authors. Kyung-Su Kim and Myung Jin Chung contributed equally to this work as co-corresponding authors.

  38. arXiv:2206.13385

    eess.IV cs.CV cs.LG

    3D unsupervised anomaly detection and localization through virtual multi-view projection and reconstruction: Clinical validation on low-dose chest computed tomography

    Authors: Kyung-Su Kim, Seong Je Oh, Ju Hwan Lee, Myung Jin Chung

    Abstract: Computer-aided diagnosis for low-dose computed tomography (CT) based on deep learning has recently attracted attention as a first-line automatic testing tool because of its high accuracy and low radiation exposure. However, existing methods rely on supervised learning, imposing an additional burden to doctors for collecting disease data or annotating spatial labels for network training, consequent…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: Kyung-Su Kim and Seong Je Oh contributed equally to this work as co-first authors. Kyung-Su Kim and Myung Jin Chung contributed equally to this work as co-corresponding authors.

  39. arXiv:2205.14959

    cs.LG

    Dataset Condensation via Efficient Synthetic-Data Parameterization

    Authors: Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, Hyun Oh Song

    Abstract: The great success of machine learning with massive amounts of data comes at a price of huge computation costs and storage for training and tuning. Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset. However, the existing approaches have fundamental limitations in optimization due to the limited representability of…

    Submitted 2 June, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: ICML 2022; Codes at https://github.com/snu-mllab/Efficient-Dataset-Condensation.git

  40. arXiv:2204.03359

    cs.CV

    ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO

    Authors: Sanghyuk Chun, Wonjae Kim, Song Park, Minsuk Chang, Seong Joon Oh

    Abstract: Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models. However, existing ITM benchmarks have a significant limitation. They have many missing correspondences, originating from the data construction process itself. For example, a caption is only matched with one image although the caption can be matched with other similar images and vice versa. To…

    Submitted 3 January, 2024; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Published in ECCV 2022; 32 pages (2.3MB); Code and dataset: https://github.com/naver-ai/eccv-caption; v5 fixes errors in Table 4: the COCO 1K R@1 numbers were incorrect. All other tables and figures are correct. v5 also adds RSUM scores in Tab 4 and 5: RSUM has a high correlation with COCO 1K recalls; v4 fixes errors in v3 -- see the v4 comment for details

  41. arXiv:2203.03860

    cs.CV

    Weakly Supervised Semantic Segmentation using Out-of-Distribution Data

    Authors: Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon

    Abstract: Weakly supervised semantic segmentation (WSSS) methods are often built on pixel-level localization maps obtained from a classifier. However, training on class labels only, classifiers suffer from the spurious correlation between foreground and background cues (e.g. train and rail), fundamentally bounding the performance of WSSS. There have been previous endeavors to address this issue with additio…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  42. arXiv:2112.11916

    cs.CL cs.AI cs.LG

    ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification

    Authors: Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han

    Abstract: Data augmentation has been an important ingredient for boosting performances of learned models. Prior data augmentation methods for few-shot text classification have led to great performance boosts. However, they have not been designed to capture the intricate compositional structure of natural language. As a result, they fail to generate samples with plausible and diverse sentence structures. Mot…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to AAAI 2022

  43. arXiv:2110.03095  [pdf, other]

    cs.LG cs.CV stat.ML

    Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective

    Authors: Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Michael Poli, Sangdoo Yun

    Abstract: Deep neural networks (DNNs) often rely on easy-to-learn discriminatory features, or cues, that are not necessarily essential to the problem at hand. For example, ducks in an image may be recognized based on their typical background scenery, such as lakes or streams. This phenomenon, also known as shortcut learning, is emerging as a key limitation of the current generation of machine learning model…

    Submitted 10 February, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: To be published in the International Conference on Learning Representations (ICLR 2022) (accepted). First two authors contributed equally.

  44. arXiv:2106.07861  [pdf, other]

    cs.CV

    Keep CALM and Improve Visual Feature Attribution

    Authors: Jae Myung Kim, Junsuk Choe, Zeynep Akata, Seong Joon Oh

    Abstract: The class activation mapping, or CAM, has been the cornerstone of feature attribution methods for multiple vision tasks. Its simplicity and effectiveness have led to wide applications in the explanation of visual predictions and weakly-supervised localization tasks. However, CAM has its own shortcomings. The computation of attribution maps relies on ad-hoc calibration steps that are not part of th…

    Submitted 12 August, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: ICCV 2021 camera-ready. First two authors contributed equally

  45. arXiv:2106.04165  [pdf, other]

    cs.LG cs.NE eess.SY math.DS

    Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions

    Authors: Michael Poli, Stefano Massaroli, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg

    Abstract: Effective control and prediction of dynamical systems often require appropriate handling of continuous-time and discrete, event-triggered processes. Stochastic hybrid systems (SHSs), common across engineering domains, provide a formalism for dynamical systems subject to discrete, possibly stochastic, state jumps and multi-modal continuous-time flows. Despite the versatility and importance of SHSs…

    Submitted 8 June, 2021; originally announced June 2021.

  46. arXiv:2103.16302  [pdf, other]

    cs.CV

    Rethinking Spatial Dimensions of Vision Transformers

    Authors: Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh

    Abstract: Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks as being an alternative architecture against the existing convolutional neural networks (CNN). Since the transformer-based architecture has been innovative for computer vision modeling, the design convention towards an effective architecture has been less studied yet. From the s…

    Submitted 17 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: ICCV 2021 camera-ready version

  47. arXiv:2101.05068  [pdf, other]

    cs.CV

    Probabilistic Embeddings for Cross-Modal Retrieval

    Authors: Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus

    Abstract: Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally make sense. In this paper, w…

    Submitted 14 June, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: Accepted to CVPR 2021; Code is available at https://github.com/naver-ai/pcme

  48. arXiv:2101.05022  [pdf, other]

    cs.CV

    Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

    Authors: Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun

    Abstract: ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise. Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark. They have thus proposed to turn ImageNet evaluation into a multi-label task, with exhaustive multi-label annotations per image. However,…

    Submitted 22 July, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: CVPR 2021 camera-ready version

  49. arXiv:2012.03457  [pdf, other]

    cs.CV

    VideoMix: Rethinking Data Augmentation for Video Classification

    Authors: Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim

    Abstract: State-of-the-art video action classifiers often suffer from overfitting. They tend to be biased towards specific objects and scene cues, rather than the foreground action content, leading to sub-optimal generalization performances. Recent data augmentation strategies have been reported to address the overfitting problems in static image classifiers. Despite the effectiveness on the static image cl…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: 15 pages

  50. arXiv:2007.04178  [pdf, other]

    cs.CV

    Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets

    Authors: Junsuk Choe, Seong Joon Oh, Sanghyuk Chun, Seungho Lee, Zeynep Akata, Hyunjung Shim

    Abstract: Weakly-supervised object localization (WSOL) has gained popularity over the last years for its promise to train localization models with only image-level labels. Since the seminal WSOL work of class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization superv…

    Submitted 7 December, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: TPAMI submission. First two authors contributed equally. This is a journal extension of our CVPR 2020 paper arXiv:2001.07437. Code: https://github.com/clovaai/wsolevaluation