Showing 1–33 of 33 results for author: Raue, F

  1. arXiv:2505.17799  [pdf, ps, other]

    cs.LG cs.CV

    A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances

    Authors: Brian B. Moser, Arundhati S. Shanbhag, Stanislav Frolov, Federico Raue, Joachim Folz, Andreas Dengel

    Abstract: Coreset selection targets the challenge of finding a small, representative subset of a large dataset that preserves essential patterns for effective machine learning. Although several surveys have examined data reduction strategies before, most focus narrowly on either classical geometry-based methods or active learning techniques. In contrast, this survey presents a more comprehensive view by uni…

    Submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2503.09399  [pdf, other]

    cs.CV cs.AI cs.LG

    ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation

    Authors: Tobias Christian Nauen, Brian Moser, Federico Raue, Stanislav Frolov, Andreas Dengel

    Abstract: Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification. However, they often require large amounts of data and can exhibit biases that limit their robustness and generalizability. This paper introduces ForAug, a novel data augmentation scheme that addresses these challenges and explicitly includes inductive biases, which…

    Submitted 12 March, 2025; originally announced March 2025.

    MSC Class: 68T45 ACM Class: I.2.10; I.2.6; I.4.6

  3. arXiv:2502.03656  [pdf, other]

    cs.CV cs.AI cs.LG

    A Study in Dataset Distillation for Image Super-Resolution

    Authors: Tobias Dietz, Brian B. Moser, Tobias Nauen, Federico Raue, Stanislav Frolov, Andreas Dengel

    Abstract: Dataset distillation is the concept of condensing large datasets into smaller but highly representative synthetic samples. While previous research has primarily focused on image classification, its application to image Super-Resolution (SR) remains underexplored. This exploratory work studies multiple dataset distillation techniques applied to SR, including pixel- and latent-space approaches under…

    Submitted 5 February, 2025; originally announced February 2025.

  4. arXiv:2411.12115  [pdf, other]

    cs.CV cs.AI cs.LG

    Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning

    Authors: Brian B. Moser, Federico Raue, Tobias C. Nauen, Stanislav Frolov, Andreas Dengel

    Abstract: Dataset distillation has gained significant interest in recent years, yet existing approaches typically distill from the entire dataset, potentially including non-beneficial samples. We introduce a novel "Prune First, Distill After" framework that systematically prunes datasets via loss-based sampling prior to distillation. By leveraging pruning before classical distillation techniques and generat…

    Submitted 18 November, 2024; originally announced November 2024.

  5. arXiv:2411.12073  [pdf, other]

    cs.CV cs.AI cs.LG

    Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning

    Authors: Arundhati S. Shanbhag, Brian B. Moser, Tobias C. Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel

    Abstract: Diffusion models, celebrated for their generative capabilities, have recently demonstrated surprising effectiveness in image classification tasks by using Bayes' theorem. Yet, current diffusion classifiers must evaluate every label candidate for each input, creating high computational costs that impede their use in large-scale applications. To address this limitation, we propose a Hierarchical Dif…

    Submitted 7 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  6. arXiv:2411.12072  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution

    Authors: Brian B. Moser, Stanislav Frolov, Tobias C. Nauen, Federico Raue, Andreas Dengel

    Abstract: Large-scale, pre-trained Text-to-Image (T2I) diffusion models have gained significant popularity in image generation tasks and have shown unexpected potential in image Super-Resolution (SR). However, most existing T2I diffusion models are trained with a resolution limit of 512x512, making scaling beyond this resolution an unresolved but necessary challenge for image SR. In this work, we introduce…

    Submitted 18 November, 2024; originally announced November 2024.

  7. arXiv:2411.10231  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift

    Authors: Sanath Budakegowdanadoddi Nagaraju, Brian Bernhard Moser, Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel

    Abstract: Transformer-based Super-Resolution (SR) models have recently advanced image reconstruction quality, yet challenges remain due to computational complexity and an over-reliance on large patch sizes, which constrain fine-grained detail enhancement. In this work, we propose TaylorIR to address these limitations by utilizing a patch size of 1x1, enabling pixel-level processing in any transformer-based…

    Submitted 15 November, 2024; originally announced November 2024.

  8. arXiv:2408.04442  [pdf, other]

    cs.LG cs.AI

    FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data

    Authors: Ahmed Anwar, Brian Moser, Dayananda Herurkar, Federico Raue, Vinit Hegiste, Tatjana Legler, Andreas Dengel

    Abstract: The emergence of federated learning (FL) presents a promising approach to leverage decentralized data while preserving privacy. Furthermore, the combination of FL and anomaly detection is particularly compelling because it allows for detecting rare and critical anomalies (usually also rare in locally gathered data) in sensitive data from multiple sources, such as cybersecurity and healthcare. Howe…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 8 pages, 1 figure

  9. arXiv:2404.17670  [pdf, other]

    eess.IV cs.AI cs.CV cs.ET cs.LG

    Federated Learning for Blind Image Super-Resolution

    Authors: Brian B. Moser, Ahmed Anwar, Federico Raue, Stanislav Frolov, Andreas Dengel

    Abstract: Traditional blind image SR methods need to model real-world degradations precisely. Consequently, current research struggles with this dilemma by assuming idealized degradations, which leads to limited applicability to actual user data. Moreover, the ideal scenario - training models on data from the targeted user base - presents significant privacy concerns. To address both challenges, we propose…

    Submitted 26 April, 2024; originally announced April 2024.

  10. arXiv:2403.17083  [pdf, other]

    eess.IV cs.AI cs.CV cs.GR cs.LG

    A Study in Dataset Pruning for Image Super-Resolution

    Authors: Brian B. Moser, Federico Raue, Andreas Dengel

    Abstract: In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset pruning to solve these challenges. We introduce a novel approach that reduces a dataset to a core-set of training samples, selected based on their loss values as dete…

    Submitted 8 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  11. arXiv:2403.03881  [pdf, other]

    cs.CV cs.AI cs.LG

    Latent Dataset Distillation with Diffusion Models

    Authors: Brian B. Moser, Federico Raue, Sebastian Palacio, Stanislav Frolov, Andreas Dengel

    Abstract: Machine learning traditionally relies on increasingly larger datasets. Yet, such datasets pose major storage challenges and usually contain non-influential samples, which could be ignored during training without negatively impacting the training quality. In response, the idea of distilling a dataset into a condensed set of synthetic samples, i.e., a distilled dataset, emerged. One key aspect is th…

    Submitted 11 July, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  12. arXiv:2401.00736  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Diffusion Models, Image Super-Resolution And Everything: A Survey

    Authors: Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, Andreas Dengel

    Abstract: Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high c…

    Submitted 23 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  13. arXiv:2308.09372  [pdf, other]

    cs.CV cs.AI cs.LG

    Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

    Authors: Tobias Christian Nauen, Sebastian Palacio, Federico Raue, Andreas Dengel

    Abstract: Self-attention in Transformers comes with a high computational cost because of their quadratic computational complexity, but their effectiveness in addressing problems in language and vision has sparked extensive research aimed at enhancing their efficiency. However, diverse experimental conditions, spanning multiple input domains, prevent a fair comparison based solely on reported results, posing…

    Submitted 24 February, 2025; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: v3: new models, analysis of scaling behaviors; v4: WACV 2025 camera ready version, appendix added

    MSC Class: 68T07 ACM Class: I.4.0; I.2.10; I.5.1

  14. Dynamic Attention-Guided Diffusion for Image Super-Resolution

    Authors: Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

    Abstract: Diffusion models in image Super-Resolution (SR) treat all image regions uniformly, which risks compromising the overall image quality by potentially introducing artifacts during denoising of less-complex regions. To address this, we propose "You Only Diffuse Areas" (YODA), a dynamic attention-guided diffusion process for image SR. YODA selectively focuses on spatial regions defined by attention…

    Submitted 22 November, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Brian B. Moser and Stanislav Frolov contributed equally

  15. arXiv:2307.04593  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    DWA: Differential Wavelet Amplifier for Image Super-Resolution

    Authors: Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

    Abstract: This work introduces Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA invigorates an approach recently receiving less attention, namely Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR and reduces the spatial area of its input by a factor of 4, the overall model size, and computation cost, framing i…

    Submitted 10 July, 2023; originally announced July 2023.

  16. DartsReNet: Exploring new RNN cells in ReNet architectures

    Authors: Brian Moser, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: We present new Recurrent Neural Network (RNN) cells for image classification using a Neural Architecture Search (NAS) approach called DARTS. We are interested in the ReNet architecture, which is a RNN based approach presented as an alternative for convolutional and pooling steps. ReNet can be defined using any standard RNN cells, such as LSTM and GRU. One limitation is that standard RNN cells were…

    Submitted 11 April, 2023; originally announced April 2023.

  17. arXiv:2304.01994  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Waving Goodbye to Low-Res: A Diffusion-Wavelet Approach for Image Super-Resolution

    Authors: Brian Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

    Abstract: This paper presents a novel Diffusion-Wavelet (DiWa) approach for Single-Image Super-Resolution (SISR). It leverages the strengths of Denoising Diffusion Probabilistic Models (DDPMs) and Discrete Wavelet Transformation (DWT). By enabling DDPMs to operate in the DWT domain, our DDPM models effectively hallucinate high-frequency information for super-resolved images on the wavelet spectrum, resultin…

    Submitted 5 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  18. arXiv:2209.13131  [pdf, other]

    cs.CV cs.LG eess.IV

    Hitchhiker's Guide to Super-Resolution: Introduction and Recent Advances

    Authors: Brian Moser, Federico Raue, Stanislav Frolov, Jörn Hees, Sebastian Palacio, Andreas Dengel

    Abstract: With the advent of Deep Learning (DL), Super-Resolution (SR) has also become a thriving research area. However, despite promising results, the field still faces challenges that require further research e.g., allowing flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances, and examine state-of-the-art models such as…

    Submitted 14 February, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  19. Less is More: Proxy Datasets in NAS approaches

    Authors: Brian Moser, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Neural Architecture Search (NAS) defines the design of Neural Networks as a search problem. Unfortunately, NAS is computationally intensive because of various possibilities depending on the number of elements in the design and the possible connections between them. In this work, we extensively analyze the role of the dataset size based on several sampling approaches for reducing the dataset size (…

    Submitted 14 March, 2022; originally announced March 2022.

    Journal ref: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

  20. arXiv:2108.09696  [pdf, other]

    cs.CV

    Spatial Transformer Networks for Curriculum Learning

    Authors: Fatemeh Azimi, Jean-Francois Jacques Nicolas Nies, Sebastian Palacio, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Curriculum learning is a bio-inspired training technique that is widely adopted to machine learning for improved optimization and better training of neural networks regarding the convergence rate or obtained accuracy. The main concept in curriculum learning is to start the training with simpler tasks and gradually increase the level of difficulty. Therefore, a natural question is how to determine…

    Submitted 22 August, 2021; originally announced August 2021.

  21. arXiv:2106.14295  [pdf, other]

    cs.LG

    A Reinforcement Learning Approach for Sequential Spatial Transformer Networks

    Authors: Fatemeh Azimi, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: Spatial Transformer Networks (STN) can generate geometric transformations which modify input images to improve the classifier's performance. In this work, we combine the idea of STN with Reinforcement Learning (RL). To this end, we break the affine transformation down into a sequence of simple and discrete transformations. We formulate the task as a Markovian Decision Process (MDP) and use RL to s…

    Submitted 27 June, 2021; originally announced June 2021.

  22. arXiv:2106.13043  [pdf, ps, other]

    cs.SD cs.CV eess.AS

    AudioCLIP: Extending CLIP to Image, Text and Audio

    Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: In the past, the rapidly evolving field of sound classification greatly benefited from the application of methods from other domains. Today, we observe the trend to fuse domain-specific tasks and approaches together, which provides the community with new outstanding models. In this work, we present an extension of the CLIP model that handles audio in addition to text and images. Our proposed mod…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: submitted to GCPR 2021

  23. arXiv:2105.10189  [pdf, other]

    cs.CV

    Combining Transformer Generators with Convolutional Discriminators

    Authors: Ricard Durall, Stanislav Frolov, Jörn Hees, Federico Raue, Franz-Josef Pfreundt, Andreas Dengel, Janis Keuper

    Abstract: Transformer models have recently attracted much interest from computer vision researchers and have since been successfully employed for several problems traditionally addressed with convolutional neural networks. At the same time, image synthesis using generative adversarial networks (GANs) has drastically improved over the last few years. The recently proposed TransGAN is the first GAN using only…

    Submitted 10 July, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

  24. arXiv:2104.11587  [pdf, other]

    cs.SD eess.AS

    ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

    Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Environmental Sound Classification (ESC) is a rapidly evolving field that recently demonstrated the advantages of application of visual domain techniques to the audio-related tasks. Previous studies indicate that the domain-specific modification of cross-domain approaches show a promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: submitted IJCNN 2021

  25. arXiv:2103.13722  [pdf, other]

    cs.CV

    AttrLostGAN: Attribute Controlled Image Synthesis from Reconfigurable Layout and Style

    Authors: Stanislav Frolov, Avneesh Sharma, Jörn Hees, Tushar Karayil, Federico Raue, Andreas Dengel

    Abstract: Conditional image synthesis from layout has recently attracted much interest. Previous approaches condition the generator on object locations as well as class labels but lack fine-grained control over the diverse appearance aspects of individual objects. Gaining control over the image generation process is fundamental to build practical applications with a user-friendly interface. In this paper, w…

    Submitted 26 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: Accepted to GCPR 2021. Link to code: https://github.com/stanifrolov/AttrLostGAN

  26. Adversarial Text-to-Image Synthesis: A Review

    Authors: Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: With the advent of generative adversarial networks, synthesizing images from textual descriptions has recently become an active research area. It is a flexible and intuitive way for conditional image generation with significant progress in the last years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research effo…

    Submitted 6 October, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: Published at Neural Networks Journal, available at https://www.sciencedirect.com/science/article/pii/S0893608021002823

    Journal ref: Neural Networks, 2021

  27. arXiv:2010.05069  [pdf, other]

    cs.CV

    Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching

    Authors: Fatemeh Azimi, Stanislav Frolov, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: One-shot Video Object Segmentation (VOS) is the task of pixel-wise tracking an object of interest within a video sequence, where the segmentation mask of the first frame is given at inference time. In recent years, Recurrent Neural Networks (RNNs) have been widely used for VOS tasks, but they often suffer from limitations such as drift and error propagation. In this work, we study an RNN-based arc…

    Submitted 7 November, 2020; v1 submitted 10 October, 2020; originally announced October 2020.

  28. arXiv:2004.12170  [pdf, other]

    cs.CV

    Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

    Authors: Fatemeh Azimi, Benjamin Bischke, Sebastian Palacio, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: Video Object Segmentation (VOS) is an active research area of the visual domain. One of its fundamental sub-tasks is semi-supervised / one-shot learning: given only the segmentation mask for the first frame, the task is to provide pixel-accurate masks for the object over the rest of the sequence. Despite much progress in the last years, we noticed that many of the existing approaches lose objects…

    Submitted 25 April, 2020; originally announced April 2020.

  29. arXiv:2004.07301  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    ESResNet: Environmental Sound Classification Based on Visual Domain Models

    Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Environmental Sound Classification (ESC) is an active research area in the audio domain and has seen a lot of progress in the past years. However, many of the existing approaches achieve high accuracy by relying on domain-specific features and architectures, making it harder to benefit from advances in other fields (e.g., the image domain). Additionally, some of the past successes have been attrib…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: 8 pages, 4 figures; submitted to ICPR 2020

  30. arXiv:2003.11844  [pdf, other]

    cs.CV

    P $\approx$ NP, at least in Visual Question Answering

    Authors: Shailza Jolly, Sebastian Palacio, Joachim Folz, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: In recent years, progress in the Visual Question Answering (VQA) field has largely been driven by public challenges and large datasets. One of the most widely-used of these is the VQA 2.0 dataset, consisting of polar ("yes/no") and non-polar questions. Looking at the question distribution over all answers, we find that the answers "yes" and "no" account for 38 % of the questions, while the remaini…

    Submitted 27 March, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

  31. arXiv:1901.02322  [pdf, other]

    cs.LG cs.AI stat.ML

    Fusion Strategies for Learning User Embeddings with Neural Networks

    Authors: Philipp Blandfort, Tushar Karayil, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Growing amounts of online user data motivate the need for automated processing techniques. In case of user ratings, one interesting option is to use neural networks for learning to predict ratings given an item and a user. While training for prediction, such an approach at the same time learns to map each user to a vector, a so-called user embedding. Such embeddings can for example be valuable for…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: submitted to IJCNN 2019

  32. arXiv:1803.08337  [pdf, other]

    cs.CV cs.LG

    What do Deep Networks Like to See?

    Authors: Sebastian Palacio, Joachim Folz, Jörn Hees, Federico Raue, Damian Borth, Andreas Dengel

    Abstract: We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found…

    Submitted 22 March, 2018; originally announced March 2018.

  33. arXiv:1511.04401  [pdf, other]

    cs.CV cs.CL cs.LG cs.NE

    Symbol Grounding Association in Multimodal Sequences with Missing Elements

    Authors: Federico Raue, Andreas Dengel, Thomas M. Breuel, Marcus Liwicki

    Abstract: In this paper, we extend a symbolic association framework for being able to handle missing elements in multimodal sequences. The general scope of the work is the symbolic associations of object-word mappings as it happens in language development in infants. In other words, two different representations of the same abstract concepts can associate in both directions. This scenario has been long inte…

    Submitted 7 December, 2017; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: Under review on Journal of Artificial Intelligence Research (JAIR) -- Special Track on Deep Learning, Knowledge Representation, and Reasoning