-
Aesthetics Without Semantics
Authors:
C. Alejandro Parraga,
Olivier Penacchio,
Marcos Muňoz Gonzalez,
Bogdan Raducanu,
Xavier Otazu
Abstract:
While it is easy for human observers to judge an image as beautiful or ugly, aesthetic decisions result from a combination of entangled perceptual and cognitive (semantic) factors, making the understanding of aesthetic judgements particularly challenging from a scientific point of view. Furthermore, our research shows a prevailing bias in current databases, which include mostly beautiful images, f…
▽ More
While it is easy for human observers to judge an image as beautiful or ugly, aesthetic decisions result from a combination of entangled perceptual and cognitive (semantic) factors, making the understanding of aesthetic judgements particularly challenging from a scientific point of view. Furthermore, our research shows a prevailing bias in current databases, which include mostly beautiful images, further complicating the study and prediction of aesthetic responses. We address these limitations by creating a database of images with minimal semantic content and devising, and next exploiting, a method to generate images on the ugly side of aesthetic valuations. The resulting Minimum Semantic Content (MSC) database consists of a large and balanced collection of 10,426 images, each evaluated by 100 observers. We next use established image metrics to demonstrate how augmenting an image set biased towards beautiful images with ugly images can modify, or even invert, an observed relationship between image features and aesthetics valuation. Taken together, our study reveals that works in empirical aesthetics attempting to link image content and aesthetic judgements may magnify, underestimate, or simply miss interesting effects due to a limitation of the range of aesthetic values they consider.
△ Less
Submitted 12 June, 2025; v1 submitted 8 May, 2025;
originally announced May 2025.
-
Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models
Authors:
Héctor Laria,
Alexandra Gomez-Villa,
Jiang Qin,
Muhammad Atif Butt,
Bogdan Raducanu,
Javier Vazquez-Corral,
Joost van de Weijer,
Kai Wang
Abstract:
Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free…
▽ More
Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. By systematically analyzing the cross-attention mechanisms within IP-Adapter, we uncover an implicit binding between textual color descriptors and reference image features. Leveraging this insight, our method rewires these bindings to enforce precise color attribution while preserving the generative capabilities of pretrained models. Our approach maintains generation quality and diversity, outperforming prior methods in accuracy and applicability across diverse object categories. Through extensive evaluations, we demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
△ Less
Submitted 12 March, 2025;
originally announced March 2025.
-
Multi-label out-of-distribution detection via evidential learning
Authors:
Eduardo Aguilar,
Bogdan Raducanu,
Petia Radeva
Abstract:
A crucial requirement for machine learning algorithms is not only to perform well, but also to show robustness and adaptability when encountering novel scenarios. One way to achieve these characteristics is to endow the deep learning models with the ability to detect out-of-distribution (OOD) data, i.e. data that belong to distributions different from the one used during their training. It is even…
▽ More
A crucial requirement for machine learning algorithms is not only to perform well, but also to show robustness and adaptability when encountering novel scenarios. One way to achieve these characteristics is to endow the deep learning models with the ability to detect out-of-distribution (OOD) data, i.e. data that belong to distributions different from the one used during their training. It is even a more complicated situation, when these data usually are multi-label. In this paper, we propose an approach based on evidential deep learning in order to meet these challenges applied to visual recognition problems. More concretely, we designed a CNN architecture that uses a Beta Evidential Neural Network to compute both the likelihood and the predictive uncertainty of the samples. Based on these results, we propose afterwards two new uncertainty-based scores for OOD data detection: (i) OOD - score Max, based on the maximum evidence; and (ii) OOD score - Sum, which considers the evidence from all outputs. Extensive experiments have been carried out to validate the proposed approach using three widely-used datasets: PASCAL-VOC, MS-COCO and NUS-WIDE, demonstrating its outperformance over several State-of-the-Art methods.
△ Less
Submitted 25 February, 2025;
originally announced February 2025.
-
Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial Attack
Authors:
Xide Xu,
Muhammad Atif Butt,
Sandesh Kamath,
Bogdan Raducanu
Abstract:
The growing demand for customized visual content has led to the rise of personalized text-to-image (T2I) diffusion models. Despite their remarkable potential, they pose significant privacy risk when misused for malicious purposes. In this paper, we propose a novel and efficient adversarial attack method, Concept Protection by Selective Attention Manipulation (CoPSAM) which targets only the cross-a…
▽ More
The growing demand for customized visual content has led to the rise of personalized text-to-image (T2I) diffusion models. Despite their remarkable potential, they pose significant privacy risk when misused for malicious purposes. In this paper, we propose a novel and efficient adversarial attack method, Concept Protection by Selective Attention Manipulation (CoPSAM) which targets only the cross-attention layers of a T2I diffusion model. For this purpose, we carefully construct an imperceptible noise to be added to clean samples to get their adversarial counterparts. This is obtained during the fine-tuning process by maximizing the discrepancy between the corresponding cross-attention maps of the user-specific token and the class-specific token, respectively. Experimental validation on a subset of CelebA-HQ face images dataset demonstrates that our approach outperforms existing methods. Besides this, our method presents two important advantages derived from the qualitative evaluation: (i) we obtain better protection results for lower noise levels than our competitors; and (ii) we protect the content from unauthorized use thereby protecting the individual's identity from potential misuse.
△ Less
Submitted 25 November, 2024;
originally announced November 2024.
-
Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
Authors:
Kai Wang,
Fei Yang,
Bogdan Raducanu,
Joost van de Weijer
Abstract:
With the advent of large pre-trained vision-language models such as CLIP, prompt learning methods aim to enhance the transferability of the CLIP model. They learn the prompt given few samples from the downstream task given the specific class names as prior knowledge, which we term as semantic-aware classification. However, in many realistic scenarios, we only have access to few samples and knowled…
▽ More
With the advent of large pre-trained vision-language models such as CLIP, prompt learning methods aim to enhance the transferability of the CLIP model. They learn the prompt given few samples from the downstream task given the specific class names as prior knowledge, which we term as semantic-aware classification. However, in many realistic scenarios, we only have access to few samples and knowledge of the class names (e.g., when considering instances of classes). This challenging scenario represents the semantic-agnostic discriminative case. Text-to-Image (T2I) personalization methods aim to adapt T2I models to unseen concepts by learning new tokens and endowing these tokens with the capability of generating the learned concepts. These methods do not require knowledge of class names as a semantic-aware prior. Therefore, in this paper, we first explore Textual Inversion and reveal that the new concept tokens possess both generation and classification capabilities by regarding each category as a single concept. However, learning classifiers from single-concept textual inversion is limited since the learned tokens are suboptimal for the discriminative tasks. To mitigate this issue, we propose Multi-Class textual inversion, which includes a discriminative regularization term for the token updating process. Using this technique, our method MC-TI achieves stronger Semantic-Agnostic Classification while preserving the generation capability of these modifier tokens given only few samples per category. In the experiments, we extensively evaluate MC-TI on 12 datasets covering various scenarios, which demonstrates that MC-TI achieves superior results in terms of both classification and generation outcomes.
△ Less
Submitted 29 October, 2024;
originally announced October 2024.
-
Assessing Open-world Forgetting in Generative Image Model Customization
Authors:
Héctor Laria,
Alex Gomez-Villa,
Kai Wang,
Bogdan Raducanu,
Joost van de Weijer
Abstract:
Recent advances in diffusion models have significantly enhanced image generation capabilities. However, customizing these models with new classes often leads to unintended consequences that compromise their reliability. We introduce the concept of open-world forgetting to characterize the vast scope of these unintended alterations. Our work presents the first systematic investigation into open-wor…
▽ More
Recent advances in diffusion models have significantly enhanced image generation capabilities. However, customizing these models with new classes often leads to unintended consequences that compromise their reliability. We introduce the concept of open-world forgetting to characterize the vast scope of these unintended alterations. Our work presents the first systematic investigation into open-world forgetting in diffusion models, focusing on semantic and appearance drift of representations. Using zero-shot classification, we demonstrate that even minor model adaptations can lead to significant semantic drift affecting areas far beyond newly introduced concepts, with accuracy drops of up to 60% on previously learned concepts. Our analysis of appearance drift reveals substantial changes in texture and color distributions of generated content. To address these issues, we propose a functional regularization strategy that effectively preserves original capabilities while accommodating new concepts. Through extensive experiments across multiple datasets and evaluation metrics, we demonstrate that our approach significantly reduces both semantic and appearance drift. Our study highlights the importance of considering open-world forgetting in future research on model customization and finetuning methods.
△ Less
Submitted 5 February, 2025; v1 submitted 17 October, 2024;
originally announced October 2024.
-
The Expanding Scope of the Stability Gap: Unveiling its Presence in Joint Incremental Learning of Homogeneous Tasks
Authors:
Sandesh Kamath,
Albin Soutif-Cormerais,
Joost van de Weijer,
Bogdan Raducanu
Abstract:
Recent research identified a temporary performance drop on previously learned tasks when transitioning to a new one. This drop is called the stability gap and has great consequences for continual learning: it complicates the direct employment of continually learning since the worse-case performance at task-boundaries is dramatic, it limits its potential as an energy-efficient training paradigm, an…
▽ More
Recent research identified a temporary performance drop on previously learned tasks when transitioning to a new one. This drop is called the stability gap and has great consequences for continual learning: it complicates the direct employment of continually learning since the worse-case performance at task-boundaries is dramatic, it limits its potential as an energy-efficient training paradigm, and finally, the stability drop could result in a reduced final performance of the algorithm. In this paper, we show that the stability gap also occurs when applying joint incremental training of homogeneous tasks. In this scenario, the learner continues training on the same data distribution and has access to all data from previous tasks. In addition, we show that in this scenario, there exists a low-loss linear path to the next minima, but that SGD optimization does not choose this path. We perform further analysis including a finer batch-wise analysis which could provide insights towards potential solution directions.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Continual Evidential Deep Learning for Out-of-Distribution Detection
Authors:
Eduardo Aguilar,
Bogdan Raducanu,
Petia Radeva,
Joost Van de Weijer
Abstract:
Uncertainty-based deep learning models have attracted a great deal of interest for their ability to provide accurate and reliable predictions. Evidential deep learning stands out achieving remarkable performance in detecting out-of-distribution (OOD) data with a single deterministic neural network. Motivated by this fact, in this paper we propose the integration of an evidential deep learning meth…
▽ More
Uncertainty-based deep learning models have attracted a great deal of interest for their ability to provide accurate and reliable predictions. Evidential deep learning stands out achieving remarkable performance in detecting out-of-distribution (OOD) data with a single deterministic neural network. Motivated by this fact, in this paper we propose the integration of an evidential deep learning method into a continual learning framework in order to perform simultaneously incremental object classification and OOD detection. Moreover, we analyze the ability of vacuity and dissonance to differentiate between in-distribution data belonging to old classes and OOD data. The proposed method, called CEDL, is evaluated on CIFAR-100 considering two settings consisting of 5 and 10 tasks, respectively. From the obtained results, we could appreciate that the proposed method, in addition to provide comparable results in object classification with respect to the baseline, largely outperforms OOD detection compared to several posthoc methods on three evaluation metrics: AUROC, AUPR and FPR95.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization
Authors:
Francesco Pelosin,
Saurav Jha,
Andrea Torsello,
Bogdan Raducanu,
Joost van de Weijer
Abstract:
In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation o…
▽ More
In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a) leads to a better overall accuracy, (b) helps enhance the rigidity by maintaining competitive performances. Furthermore, we identify the limitation imposed by the symmetric nature of regularization losses. To alleviate this, we propose an asymmetric variant and apply it to the pooled output distillation (POD) loss adapted for ViTs. Our experiments confirm that introducing asymmetry to POD boosts its plasticity while retaining stability across (a) and (b). Moreover, we acknowledge low forgetting measures for all the compared methods, indicating that ViTs might be naturally inclined continual learner
△ Less
Submitted 5 May, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Transferring Unconditional to Conditional GANs with Hyper-Modulation
Authors:
Héctor Laria,
Yaxing Wang,
Joost van de Weijer,
Bogdan Raducanu
Abstract:
GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, howev…
▽ More
GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, however, conditional GANs are preferable, because they provide more control over the generation process, despite often suffering more training difficulties. Therefore, in this paper, we focus on transferring from high-quality pretrained unconditional GANs to conditional GANs. This requires architectural adaptation of the pretrained GAN to perform the conditioning. To this end, we propose hyper-modulated generative networks that allow for shared and complementary supervision. To prevent the additional weights of the hypernetwork to overfit, with subsequent mode collapse on small target domains, we introduce a self-initialization procedure that does not require any real data to initialize the hypernetwork parameters. To further improve the sample efficiency of the transfer, we apply contrastive learning in the discriminator, which effectively works on very limited batch sizes. In extensive experiments, we validate the efficiency of the hypernetworks, self-initialization and contrastive loss for knowledge transfer on standard benchmarks.
△ Less
Submitted 22 April, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Class-Balanced Active Learning for Image Classification
Authors:
Javad Zolfaghari Bengar,
Joost van de Weijer,
Laura Lopez Fuentes,
Bogdan Raducanu
Abstract:
Active learning aims to reduce the labeling effort that is required to train algorithms by learning an acquisition function selecting the most relevant data for which a label should be requested from a large unlabeled data pool. Active learning is generally studied on balanced datasets where an equal amount of images per class is available. However, real-world datasets suffer from severe imbalance…
▽ More
Active learning aims to reduce the labeling effort that is required to train algorithms by learning an acquisition function selecting the most relevant data for which a label should be requested from a large unlabeled data pool. Active learning is generally studied on balanced datasets where an equal amount of images per class is available. However, real-world datasets suffer from severe imbalanced classes, the so called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we proposed a general optimization framework that explicitly takes class-balancing into account. Results on three datasets showed that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we showed that also on balanced datasets our method generally results in a performance gain.
△ Less
Submitted 9 October, 2021;
originally announced October 2021.
-
Reducing Label Effort: Self-Supervised meets Active Learning
Authors:
Javad Zolfaghari Bengar,
Joost van de Weijer,
Bartlomiej Twardowski,
Bogdan Raducanu
Abstract:
Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on few labeled samples. Recent developments in self-training have achieved very impressive…
▽ More
Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on few labeled samples. Recent developments in self-training have achieved very impressive results rivaling supervised learning on some datasets. The current work focuses on whether the two paradigms can benefit from each other. We studied object recognition datasets including CIFAR10, CIFAR100 and Tiny ImageNet with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high. The performance gap between active learning trained either with self-training or from scratch diminishes as we approach to the point where almost half of the dataset is labeled.
△ Less
Submitted 25 August, 2021;
originally announced August 2021.
-
When Deep Learners Change Their Mind: Learning Dynamics for Active Learning
Authors:
Javad Zolfaghari Bengar,
Bogdan Raducanu,
Joost van de Weijer
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untr…
▽ More
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
△ Less
Submitted 30 July, 2021;
originally announced July 2021.
-
Saliency for free: Saliency prediction as a side-effect of object recognition
Authors:
Carola Figueroa-Flores,
David Berga,
Joost van der Weijer,
Bogdan Raducanu
Abstract:
Saliency is the perceptual capacity of our visual system to focus our attention (i.e. gaze) on relevant objects. Neural networks for saliency estimation require ground truth saliency maps for training which are usually achieved via eyetracking experiments. In the current paper, we demonstrate that saliency maps can be generated as a side-effect of training an object recognition deep neural network…
▽ More
Saliency is the perceptual capacity of our visual system to focus our attention (i.e. gaze) on relevant objects. Neural networks for saliency estimation require ground truth saliency maps for training which are usually achieved via eyetracking experiments. In the current paper, we demonstrate that saliency maps can be generated as a side-effect of training an object recognition deep neural network that is endowed with a saliency branch. Such a network does not require any ground-truth saliency maps for training.Extensive experiments carried out on both real and synthetic saliency datasets demonstrate that our approach is able to generate accurate saliency maps, achieving competitive results on both synthetic and real datasets when compared to methods that do require ground truth data.
△ Less
Submitted 20 July, 2021;
originally announced July 2021.
-
TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets
Authors:
Yaxing Wang,
Hector Laria Mantecon,
Joost van de Weijer,
Laura Lopez-Fuentes,
Bogdan Raducanu
Abstract:
Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require the learning of millions of parameters from scratch. This drawback severely limits its application o…
▽ More
Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require the learning of millions of parameters from scratch. This drawback severely limits its application on small domains. In this paper, we propose a new transfer learning for I2I translation (TransferI2I). We decouple our learning process into the image generation step and the I2I translation step. In the first step we propose two novel techniques: source-target initialization and self-initialization of the adaptor layer. The former finetunes the pretrained generative model (e.g., StyleGAN) on source and target data. The latter allows to initialize all non-pretrained network parameters without the need of any data. These techniques provide a better initialization for the I2I translation step. In addition, we introduce an auxiliary GAN that further facilitates the training of deep I2I systems even from small datasets. In extensive experiments on three datasets, (Animal faces, Birds, and Foods), we show that we outperform existing methods and that mFID improves on several datasets with over 25 points.
△ Less
Submitted 14 May, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Learning to Rank for Active Learning: A Listwise Approach
Authors:
Minghan Li,
Xialei Liu,
Joost van de Weijer,
Bogdan Raducanu
Abstract:
Active learning emerged as an alternative to alleviate the effort to label huge amount of data for data hungry applications (such as image/video indexing and retrieval, autonomous driving, etc.). The goal of active learning is to automatically select a number of unlabeled samples for annotation (according to a budget), based on an acquisition function, which indicates how valuable a sample is for…
▽ More
Active learning emerged as an alternative to alleviate the effort to label huge amount of data for data hungry applications (such as image/video indexing and retrieval, autonomous driving, etc.). The goal of active learning is to automatically select a number of unlabeled samples for annotation (according to a budget), based on an acquisition function, which indicates how valuable a sample is for training the model. The learning loss method is a task-agnostic approach which attaches a module to learn to predict the target loss of unlabeled data, and select data with the highest loss for labeling. In this work, we follow this strategy but we define the acquisition function as a learning to rank problem and rethink the structure of the loss prediction module, using a simple but effective listwise approach. Experimental results on four datasets demonstrate that our method outperforms recent state-of-the-art active learning approaches for both image classification and regression tasks.
△ Less
Submitted 17 October, 2020; v1 submitted 31 July, 2020;
originally announced August 2020.
-
Hallucinating Saliency Maps for Fine-Grained Image Classification for Limited Data Domains
Authors:
Carola Figueroa-Flores,
Bogdan Raducanu,
David Berga,
Joost van de Weijer
Abstract:
Most of the saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline, like for instance, image classification. In the current paper, we propose an approach which does not require explicit saliency maps to improve image classification, but they are learned implicitely, during the training of an end-to-end image classific…
▽ More
Most of the saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline, like for instance, image classification. In the current paper, we propose an approach which does not require explicit saliency maps to improve image classification, but they are learned implicitely, during the training of an end-to-end image classification task. We show that our approach obtains similar results as the case when the saliency maps are provided explicitely. Combining RGB data with saliency maps represents a significant advantage for object recognition, especially for the case when training data is limited. We validate our method on several datasets for fine-grained classification tasks (Flowers, Birds and Cars). In addition, we show that our saliency estimation method, which is trained without any saliency groundtruth data, obtains competitive results on real image saliency benchmark (Toronto), and outperforms deep saliency models with synthetic images (SID4VAM).
△ Less
Submitted 3 February, 2021; v1 submitted 24 July, 2020;
originally announced July 2020.
-
Generative Feature Replay For Class-Incremental Learning
Authors:
Xialei Liu,
Chenshen Wu,
Mikel Menta,
Luis Herranz,
Bogdan Raducanu,
Andrew D. Bagdanov,
Shangling Jui,
Joost van de Weijer
Abstract:
Humans are capable of learning new tasks without forgetting previous ones, while neural networks fail due to catastrophic forgetting between new and previously-learned tasks. We consider a class-incremental setting which means that the task-ID is unknown at inference time. The imbalance between old and new classes typically results in a bias of the network towards the newest ones. This imbalance p…
▽ More
Humans are capable of learning new tasks without forgetting previous ones, while neural networks fail due to catastrophic forgetting between new and previously-learned tasks. We consider a class-incremental setting which means that the task-ID is unknown at inference time. The imbalance between old and new classes typically results in a bias of the network towards the newest ones. This imbalance problem can either be addressed by storing exemplars from previous tasks, or by using image replay methods. However, the latter can only be applied to toy datasets since image generation for complex datasets is a hard problem.
We propose a solution to the imbalance problem based on generative feature replay which does not require any exemplars. To do this, we split the network into two parts: a feature extractor and a classifier. To prevent forgetting, we combine generative feature replay in the classifier with feature distillation in the feature extractor. Through feature generation, our method reduces the complexity of generative replay and prevents the imbalance problem. Our approach is computationally efficient and scalable to large datasets. Experiments confirm that our approach achieves state-of-the-art results on CIFAR-100 and ImageNet, while requiring only a fraction of the storage needed for exemplar-based continual learning. Code available at \url{https://github.com/xialeiliu/GFR-IL}.
△ Less
Submitted 20 April, 2020;
originally announced April 2020.
-
Temporal Coherence for Active Learning in Videos
Authors:
Javad Zolfaghari Bengar,
Abel Gonzalez-Garcia,
Gabriel Villalonga,
Bogdan Raducanu,
Hamed H. Aghdam,
Mikhail Mozerov,
Antonio M. Lopez,
Joost van de Weijer
Abstract:
Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploitin…
▽ More
Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets.
△ Less
Submitted 30 August, 2019;
originally announced August 2019.
-
Optimizing speed/accuracy trade-off for person re-identification via knowledge distillation
Authors:
Idoia Ruiz,
Bogdan Raducanu,
Rakesh Mehta,
Jaume Amores
Abstract:
Finding a person across a camera network plays an important role in video surveillance. For a real-world person re-identification application, in order to guarantee an optimal time response, it is crucial to find the balance between accuracy and speed. We analyse this trade-off, comparing a classical method, that comprises hand-crafted feature description and metric learning, in particular, LOMO a…
▽ More
Finding a person across a camera network plays an important role in video surveillance. For a real-world person re-identification application, in order to guarantee an optimal time response, it is crucial to find the balance between accuracy and speed. We analyse this trade-off, comparing a classical method, that comprises hand-crafted feature description and metric learning, in particular, LOMO and XQDA, to deep learning based techniques, using image classification networks, ResNet and MobileNets. Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time. We evaluate both methods on the Market-1501 and DukeMTMC-reID large-scale datasets, showing that distillation helps reducing the computational cost at inference time while even increasing the accuracy performance.
△ Less
Submitted 5 December, 2019; v1 submitted 7 December, 2018;
originally announced December 2018.
-
Memory Replay GANs: learning to generate images from new categories without forgetting
Authors:
Chenshen Wu,
Luis Herranz,
Xialei Liu,
Yaxing Wang,
Joost van de Weijer,
Bogdan Raducanu
Abstract:
Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine tuning renders the network unable to properly generate images from previous c…
▽ More
Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine tuning renders the network unable to properly generate images from previous categories (i.e. forgetting). Addressing this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results in MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories.
△ Less
Submitted 23 September, 2019; v1 submitted 6 September, 2018;
originally announced September 2018.
-
Saliency for Fine-grained Object Recognition in Domains with Scarce Training Data
Authors:
Carola Figueroa Flores,
Abel Gonzalez-García,
Joost van de Weijer,
Bogdan Raducanu
Abstract:
This paper investigates the role of saliency to improve the classification accuracy of a Convolutional Neural Network (CNN) for the case when scarce training data is available. Our approach consists in adding a saliency branch to an existing CNN architecture which is used to modulate the standard bottom-up visual features from the original image input, acting as an attentional mechanism that guide…
▽ More
This paper investigates the role of saliency to improve the classification accuracy of a Convolutional Neural Network (CNN) for the case when scarce training data is available. Our approach consists in adding a saliency branch to an existing CNN architecture which is used to modulate the standard bottom-up visual features from the original image input, acting as an attentional mechanism that guides the feature extraction process. The main aim of the proposed approach is to enable the effective training of a fine-grained recognition model with limited training samples and to improve the performance on the task, thereby alleviating the need to annotate large dataset. % The vast majority of saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline. Our proposed pipeline allows to evaluate saliency methods for the high-level task of object recognition. We perform extensive experiments on various fine-grained datasets (Flowers, Birds, Cars, and Dogs) under different conditions and show that saliency can considerably improve the network's performance, especially for the case of scarce training data. Furthermore, our experiments show that saliency methods that obtain improved saliency maps (as measured by traditional saliency benchmarks) also translate to saliency methods that yield improved performance gains when applied in an object recognition pipeline.
△ Less
Submitted 4 May, 2019; v1 submitted 1 August, 2018;
originally announced August 2018.
-
Transferring GANs: generating images from limited data
Authors:
Yaxing Wang,
Chenshen Wu,
Luis Herranz,
Joost van de Weijer,
Abel Gonzalez-Garcia,
Bogdan Raducanu
Abstract:
Transferring the knowledge of pretrained networks to new domains by means of finetuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several a…
▽ More
Transferring the knowledge of pretrained networks to new domains by means of finetuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pretrained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when the target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pretrained model was trained without conditioning. Our results also suggest that density may be more important than diversity and a dataset with one or few densely sampled classes may be a better source model than more diverse datasets such as ImageNet or Places.
△ Less
Submitted 2 October, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Invertible Conditional GANs for image editing
Authors:
Guim Perarnau,
Joost van de Weijer,
Bogdan Raducanu,
Jose M. Álvarez
Abstract:
Generative Adversarial Networks (GANs) have recently demonstrated to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information allows to determine specific representations of the generated images. In this work, we evaluate encoders to inverse the mapping of a cGAN, i.e., mapping a real image i…
▽ More
Generative Adversarial Networks (GANs) have recently demonstrated to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information allows to determine specific representations of the generated images. In this work, we evaluate encoders to inverse the mapping of a cGAN, i.e., mapping a real image into a latent space and a conditional representation. This allows, for example, to reconstruct and modify real images of faces conditioning on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call Invertible cGAN (IcGAN), enables to re-generate real images with deterministic complex modifications.
△ Less
Submitted 19 November, 2016;
originally announced November 2016.