A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances

Authors: Brian B. Moser, Arundhati S. Shanbhag, Stanislav Frolov, Federico Raue, Joachim Folz, Andreas Dengel

Abstract: Coreset selection targets the challenge of finding a small, representative subset of a large dataset that preserves essential patterns for effective machine learning. Although several surveys have examined data reduction strategies before, most focus narrowly on either classical geometry-based methods or active learning techniques. In contrast, this survey presents a more comprehensive view by unifying three major lines of coreset research, namely, training-free, training-oriented, and label-free approaches, into a single taxonomy. We present subfields often overlooked by existing work, including submodular formulations, bilevel optimization, and recent progress in pseudo-labeling for unlabeled datasets. Additionally, we examine how pruning strategies influence generalization and neural scaling laws, offering new insights that are absent from prior reviews. Finally, we compare these methods under varying computational, robustness, and performance demands and highlight open challenges, such as robustness, outlier filtering, and adapting coreset selection to foundation models, for future research.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2411.12073

Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning

Authors: Arundhati S. Shanbhag, Brian B. Moser, Tobias C. Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel

Abstract: Diffusion models, celebrated for their generative capabilities, have recently demonstrated surprising effectiveness in image classification tasks by using Bayes' theorem. Yet, current diffusion classifiers must evaluate every label candidate for each input, creating high computational costs that impede their use in large-scale applications. To address this limitation, we propose a Hierarchical Diffusion Classifier (HDC) that exploits hierarchical label structures or well-defined parent-child relationships in the dataset. By pruning irrelevant high-level categories and refining predictions only within relevant subcategories (leaf nodes and sub-trees), HDC reduces the total number of class evaluations. As a result, HDC can speed up inference by as much as 60% while preserving and sometimes even improving classification accuracy. In summary, our work provides a tunable control mechanism between speed and precision, making diffusion-based classification more feasible for large-scale applications.

Submitted 7 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

arXiv:2401.00736

doi: 10.1109/TNNLS.2024.3476671

Diffusion Models, Image Super-Resolution And Everything: A Survey

Authors: Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, Andreas Dengel

Abstract: Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high computational demands, comparability, lack of explainability, color shifts, and more. Unfortunately, entry into this field is overwhelming because of the abundance of publications. To address this, we provide a unified recount of the theoretical foundations underlying DMs applied to image SR and offer a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. This survey articulates a cohesive understanding of DM principles and explores current research avenues, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. By offering a detailed examination of the evolution and current trends in image SR through the lens of DMs, this survey sheds light on the existing challenges and charts potential future directions, aiming to inspire further innovation in this rapidly advancing area.

Submitted 23 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Showing 1–3 of 3 results for author: Shanbhag, A S