-
Test-time Vocabulary Adaptation for Language-driven Object Detection
Authors:
Mingxuan Liu,
Tyler L. Hayes,
Massimiliano Mancini,
Elisa Ricci,
Riccardo Volpi,
Gabriela Csurka
Abstract:
Open-vocabulary object detection models allow users to freely specify a class vocabulary in natural language at test time, guiding the detection of desired objects. However, vocabularies can be overly broad or even mis-specified, hampering the overall performance of the detector. In this work, we propose a plug-and-play Vocabulary Adapter (VocAda) to refine the user-defined vocabulary, automatical…
▽ More
Open-vocabulary object detection models allow users to freely specify a class vocabulary in natural language at test time, guiding the detection of desired objects. However, vocabularies can be overly broad or even mis-specified, hampering the overall performance of the detector. In this work, we propose a plug-and-play Vocabulary Adapter (VocAda) to refine the user-defined vocabulary, automatically tailoring it to categories that are relevant for a given image. VocAda does not require any training, it operates at inference time in three steps: i) it uses an image captionner to describe visible objects, ii) it parses nouns from those captions, and iii) it selects relevant classes from the user-defined vocabulary, discarding irrelevant ones. Experiments on COCO and Objects365 with three state-of-the-art detectors show that VocAda consistently improves performance, proving its versatility. The code is open source.
△ Less
Submitted 30 May, 2025;
originally announced June 2025.
-
What could go wrong? Discovering and describing failure modes in computer vision
Authors:
Gabriela Csurka,
Tyler L. Hayes,
Diane Larlus,
Riccardo Volpi
Abstract:
Deep learning models are effective, yet brittle. Even carefully trained, their behavior tends to be hard to predict when confronted with out-of-distribution samples. In this work, our goal is to propose a simple yet effective solution to predict and describe via natural language potential failure modes of computer vision models. Given a pretrained model and a set of samples, our aim is to find sen…
▽ More
Deep learning models are effective, yet brittle. Even carefully trained, their behavior tends to be hard to predict when confronted with out-of-distribution samples. In this work, our goal is to propose a simple yet effective solution to predict and describe via natural language potential failure modes of computer vision models. Given a pretrained model and a set of samples, our aim is to find sentences that accurately describe the visual conditions in which the model underperforms. In order to study this important topic and foster future research on it, we formalize the problem of Language-Based Error Explainability (LBEE) and propose a set of metrics to evaluate and compare different methods for this task. We propose solutions that operate in a joint vision-and-language embedding space, and can characterize through language descriptions model failures caused, e.g., by objects unseen during training or adverse visual conditions. We experiment with different tasks, such as classification under the presence of dataset bias and semantic segmentation in unseen environments, and show that the proposed methodology isolates nontrivial sentences associated with specific error causes. We hope our work will help practitioners better understand the behavior of models, increasing their overall safety and interpretability.
△ Less
Submitted 24 September, 2024; v1 submitted 8 August, 2024;
originally announced August 2024.
-
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Authors:
Mingxuan Liu,
Tyler L. Hayes,
Elisa Ricci,
Gabriela Csurka,
Riccardo Volpi
Abstract:
Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during inference. However, our initial investigation indicates that existing OvOD detectors exhibit significant variability when dealing with vocabularies across various semantic granularities, posing a concern for real-world deployme…
▽ More
Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during inference. However, our initial investigation indicates that existing OvOD detectors exhibit significant variability when dealing with vocabularies across various semantic granularities, posing a concern for real-world deployment. To this end, we introduce Semantic Hierarchy Nexus (SHiNe), a novel classifier that uses semantic knowledge from class hierarchies. It runs offline in three steps: i) it retrieves relevant super-/sub-categories from a hierarchy for each target class; ii) it integrates these categories into hierarchy-aware sentences; iii) it fuses these sentence embeddings to generate the nexus classifier vector. Our evaluation on various detection benchmarks demonstrates that SHiNe enhances robustness across diverse vocabulary granularities, achieving up to +31.9% mAP50 with ground truth hierarchies, while retaining improvements using hierarchies generated by large language models. Moreover, when applied to open-vocabulary classification on ImageNet-1k, SHiNe improves the CLIP zero-shot baseline by +2.8% accuracy. SHiNe is training-free and can be seamlessly integrated with any off-the-shelf OvOD detector, without incurring additional computational overhead during inference. The code is open source.
△ Less
Submitted 16 May, 2024;
originally announced May 2024.
-
PANDAS: Prototype-based Novel Class Discovery and Detection
Authors:
Tyler L. Hayes,
César R. de Souza,
Namil Kim,
Jiwon Kim,
Riccardo Volpi,
Diane Larlus
Abstract:
Object detectors are typically trained once and for all on a fixed set of classes. However, this closed-world assumption is unrealistic in practice, as new classes will inevitably emerge after the detector is deployed in the wild. In this work, we look at ways to extend a detector trained for a set of base classes so it can i) spot the presence of novel classes, and ii) automatically enrich its re…
▽ More
Object detectors are typically trained once and for all on a fixed set of classes. However, this closed-world assumption is unrealistic in practice, as new classes will inevitably emerge after the detector is deployed in the wild. In this work, we look at ways to extend a detector trained for a set of base classes so it can i) spot the presence of novel classes, and ii) automatically enrich its repertoire to be able to detect those newly discovered classes together with the base ones. We propose PANDAS, a method for novel class discovery and detection. It discovers clusters representing novel classes from unlabeled data, and represents old and new classes with prototypes. During inference, a distance-based classifier uses these prototypes to assign a label to each detected object instance. The simplicity of our method makes it widely applicable. We experimentally demonstrate the effectiveness of PANDAS on the VOC 2012 and COCO-to-LVIS benchmarks. It performs favorably against the state of the art for this task while being computationally more affordable.
△ Less
Submitted 30 April, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
Authors:
Pau de Jorge,
Riccardo Volpi,
Puneet K. Dokania,
Philip H. S. Torr,
Gregory Rogez
Abstract:
When deploying a semantic segmentation model into the real world, it will inevitably encounter semantic classes that were not seen during training. To ensure a safe deployment of such systems, it is crucial to accurately evaluate and improve their anomaly segmentation capabilities. However, acquiring and labelling semantic segmentation data is expensive and unanticipated conditions are long-tail a…
▽ More
When deploying a semantic segmentation model into the real world, it will inevitably encounter semantic classes that were not seen during training. To ensure a safe deployment of such systems, it is crucial to accurately evaluate and improve their anomaly segmentation capabilities. However, acquiring and labelling semantic segmentation data is expensive and unanticipated conditions are long-tail and potentially hazardous. Indeed, existing anomaly segmentation datasets capture a limited number of anomalies, lack realism or have strong domain shifts. In this paper, we propose the Placing Objects in Context (POC) pipeline to realistically add any object into any image via diffusion models. POC can be used to easily extend any dataset with an arbitrary number of objects. In our experiments, we present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods across several standardized benchmarks. POC is also effective for learning new classes. For example, we utilize it to augment Cityscapes samples by incorporating a subset of Pascal classes and demonstrate that models trained on such data achieve comparable performance to the Pascal-trained baseline. This corroborates the low synth2real gap of models trained on POC-generated images. Code: https://github.com/naver/poc
△ Less
Submitted 12 July, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
RaSP: Relation-aware Semantic Prior for Weakly Supervised Incremental Segmentation
Authors:
Subhankar Roy,
Riccardo Volpi,
Gabriela Csurka,
Diane Larlus
Abstract:
Class-incremental semantic image segmentation assumes multiple model updates, each enriching the model to segment new categories. This is typically carried out by providing expensive pixel-level annotations to the training algorithm for all new objects, limiting the adoption of such methods in practical applications. Approaches that solely require image-level labels offer an attractive alternative…
▽ More
Class-incremental semantic image segmentation assumes multiple model updates, each enriching the model to segment new categories. This is typically carried out by providing expensive pixel-level annotations to the training algorithm for all new objects, limiting the adoption of such methods in practical applications. Approaches that solely require image-level labels offer an attractive alternative, yet, such coarse annotations lack precise information about the location and boundary of the new objects. In this paper we argue that, since classes represent not just indices but semantic entities, the conceptual relationships between them can provide valuable information that should be leveraged. We propose a weakly supervised approach that exploits such semantic relations to transfer objectness prior from the previously learned classes into the new ones, complementing the supervisory signal from image-level labels. We validate our approach on a number of continual learning tasks, and show how even a simple pairwise interaction between classes can significantly improve the segmentation mask quality of both old and new classes. We show these conclusions still hold for longer and, hence, more realistic sequences of tasks and for a challenging few-shot scenario.
△ Less
Submitted 31 May, 2023;
originally announced May 2023.
-
Reliability in Semantic Segmentation: Are We on the Right Track?
Authors:
Pau de Jorge,
Riccardo Volpi,
Philip Torr,
Gregory Rogez
Abstract:
Motivated by the increasing popularity of transformers in computer vision, in recent times there has been a rapid development of novel architectures. While in-domain performance follows a constant, upward trend, properties like robustness or uncertainty estimation are less explored -leaving doubts about advances in model reliability. Studies along these axes exist, but they are mainly limited to c…
▽ More
Motivated by the increasing popularity of transformers in computer vision, in recent times there has been a rapid development of novel architectures. While in-domain performance follows a constant, upward trend, properties like robustness or uncertainty estimation are less explored -leaving doubts about advances in model reliability. Studies along these axes exist, but they are mainly limited to classification models. In contrast, we carry out a study on semantic segmentation, a relevant task for many real-world applications where model reliability is paramount. We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers and assess their reliability based on four metrics: robustness, calibration, misclassification detection and out-of-distribution (OOD) detection. We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation. We further explore methods that can come to the rescue and show that improving calibration can also help with other uncertainty metrics such as misclassification or OOD detection. This is the first study on modern segmentation models focused on both robustness and uncertainty estimation and we hope it will help practitioners and researchers interested in this fundamental vision task. Code available at https://github.com/naver/relis.
△ Less
Submitted 20 March, 2023;
originally announced March 2023.
-
Semantic Image Segmentation: Two Decades of Research
Authors:
Gabriela Csurka,
Riccardo Volpi,
Boris Chidlovskii
Abstract:
Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey is an effort to summarize two decades of research in the field of SiS, where we propose a literature review of solutions starting from early historical methods followed by an overview of more recent deep learn…
▽ More
Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey is an effort to summarize two decades of research in the field of SiS, where we propose a literature review of solutions starting from early historical methods followed by an overview of more recent deep learning methods including the latest trend of using transformers. We complement the review by discussing particular cases of the weak supervision and side machine learning techniques that can be used to improve the semantic segmentation such as curriculum, incremental or self-supervised learning.
State-of-the-art SiS models rely on a large amount of annotated samples, which are more expensive to obtain than labels for tasks such as image classification. Since unlabeled data is instead significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation (UDA) reached a broad success within the semantic segmentation community. Therefore, a second core contribution of this book is to summarize five years of a rapidly growing field, Domain Adaptation for Semantic Image Segmentation (DASiS) which embraces the importance of semantic segmentation itself and a critical need of adapting segmentation models to new environments. In addition to providing a comprehensive survey on DASiS techniques, we unveil also newer trends such as multi-domain learning, domain generalization, domain incremental learning, test-time adaptation and source-free domain adaptation. Finally, we conclude this survey by describing datasets and benchmarks most widely used in SiS and DASiS and briefly discuss related tasks such as instance and panoptic image segmentation, as well as applications such as medical image segmentation.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
On the Road to Online Adaptation for Semantic Image Segmentation
Authors:
Riccardo Volpi,
Pau de Jorge,
Diane Larlus,
Gabriela Csurka
Abstract:
We propose a new problem formulation and a corresponding evaluation framework to advance research on unsupervised domain adaptation for semantic image segmentation. The overall goal is fostering the development of adaptive learning systems that will continuously learn, without supervision, in ever-changing environments. Typical protocols that study adaptation algorithms for segmentation models are…
▽ More
We propose a new problem formulation and a corresponding evaluation framework to advance research on unsupervised domain adaptation for semantic image segmentation. The overall goal is fostering the development of adaptive learning systems that will continuously learn, without supervision, in ever-changing environments. Typical protocols that study adaptation algorithms for segmentation models are limited to few domains, adaptation happens offline, and human intervention is generally required, at least to annotate data for hyper-parameter tuning. We argue that such constraints are incompatible with algorithms that can continuously adapt to different real-world situations. To address this, we propose a protocol where models need to learn online, from sequences of temporally correlated images, requiring continuous, frame-by-frame adaptation. We accompany this new protocol with a variety of baselines to tackle the proposed formulation, as well as an extensive analysis of their behaviors, which can serve as a starting point for future research.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
Make Some Noise: Reliable and Efficient Single-Step Adversarial Training
Authors:
Pau de Jorge,
Adel Bibi,
Riccardo Volpi,
Amartya Sanyal,
Philip H. S. Torr,
Grégory Rogez,
Puneet K. Dokania
Abstract:
Recently, Wong et al. showed that adversarial training with single-step FGSM leads to a characteristic failure mode named Catastrophic Overfitting (CO), in which a model becomes suddenly vulnerable to multi-step attacks. Experimentally they showed that simply adding a random perturbation prior to FGSM (RS-FGSM) could prevent CO. However, Andriushchenko and Flammarion observed that RS-FGSM still le…
▽ More
Recently, Wong et al. showed that adversarial training with single-step FGSM leads to a characteristic failure mode named Catastrophic Overfitting (CO), in which a model becomes suddenly vulnerable to multi-step attacks. Experimentally they showed that simply adding a random perturbation prior to FGSM (RS-FGSM) could prevent CO. However, Andriushchenko and Flammarion observed that RS-FGSM still leads to CO for larger perturbations, and proposed a computationally expensive regularizer (GradAlign) to avoid it. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample combined with \textit{not clipping} is highly effective in avoiding CO for large perturbation radii. We then propose Noise-FGSM (N-FGSM) that, while providing the benefits of single-step adversarial training, does not suffer from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of previous state-of-the-art GradAlign, while achieving 3x speed-up. Code can be found in https://github.com/pdejorge/N-FGSM
△ Less
Submitted 17 October, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey
Authors:
Gabriela Csurka,
Riccardo Volpi,
Boris Chidlovskii
Abstract:
Semantic segmentation plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. Yet, the state-of-the-art models rely on large amount of annotated samples, which are more expensive to obtain than in tasks such as image classification. Since unlabelled data is instead significantly cheaper to obtain, it is not su…
▽ More
Semantic segmentation plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. Yet, the state-of-the-art models rely on large amount of annotated samples, which are more expensive to obtain than in tasks such as image classification. Since unlabelled data is instead significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation reached a broad success within the semantic segmentation community.
This survey is an effort to summarize five years of this incredibly rapidly growing field, which embraces the importance of semantic segmentation itself and a critical need of adapting segmentation models to new environments. We present the most important semantic segmentation methods; we provide a comprehensive survey on domain adaptation techniques for semantic segmentation; we unveil newer trends such as multi-domain learning, domain generalization, test-time adaptation or source-free domain adaptation; we conclude this survey by describing datasets and benchmarks most widely used in semantic segmentation research. We hope that this survey will provide researchers across academia and industry with a comprehensive reference guide and will help them in fostering new research directions in the field.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Automatic Feature Extraction for Heartbeat Anomaly Detection
Authors:
Robert-George Colt,
Csongor-Huba Várady,
Riccardo Volpi,
Luigi Malagò
Abstract:
We focus on automatic feature extraction for raw audio heartbeat sounds, aimed at anomaly detection applications in healthcare. We learn features with the help of an autoencoder composed by a 1D non-causal convolutional encoder and a WaveNet decoder trained with a modified objective based on variational inference, employing the Maximum Mean Discrepancy (MMD). Moreover we model the latent distribut…
▽ More
We focus on automatic feature extraction for raw audio heartbeat sounds, aimed at anomaly detection applications in healthcare. We learn features with the help of an autoencoder composed by a 1D non-causal convolutional encoder and a WaveNet decoder trained with a modified objective based on variational inference, employing the Maximum Mean Discrepancy (MMD). Moreover we model the latent distribution using a Gaussian chain graphical model to capture temporal correlations which characterize the encoded signals. After training the autoencoder on the reconstruction task in a unsupervised manner, we test the significance of the learned latent representations by training an SVM to predict anomalies. We evaluate the methods on a problem proposed by the PASCAL Classifying Heart Sounds Challenge and we compare with results in the literature.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
Authors:
Riccardo Volpi,
Diane Larlus,
Grégory Rogez
Abstract:
Most standard learning approaches lead to fragile models which are prone to drift when sequentially trained on samples of a different nature - the well-known "catastrophic forgetting" issue. In particular, when a model consecutively learns from different visual domains, it tends to forget the past domains in favor of the most recent ones. In this context, we show that one way to learn models that…
▽ More
Most standard learning approaches lead to fragile models which are prone to drift when sequentially trained on samples of a different nature - the well-known "catastrophic forgetting" issue. In particular, when a model consecutively learns from different visual domains, it tends to forget the past domains in favor of the most recent ones. In this context, we show that one way to learn models that are inherently more robust against forgetting is domain randomization - for vision tasks, randomizing the current domain's distribution with heavy image manipulations. Building on this result, we devise a meta-learning strategy where a regularizer explicitly penalizes any loss associated with transferring the model from the current domain to different "auxiliary" meta-domains, while also easing adaptation to them. Such meta-domains are also generated through randomized image manipulations. We empirically demonstrate in a variety of experiments - spanning from classification to semantic segmentation - that our approach results in models that are less prone to catastrophic forgetting when transferred to new domains.
△ Less
Submitted 8 April, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Accelerating MCMC algorithms through Bayesian Deep Networks
Authors:
Hector J. Hortua,
Riccardo Volpi,
Dimitri Marinelli,
Luigi Malago
Abstract:
Markov Chain Monte Carlo (MCMC) algorithms are commonly used for their versatility in sampling from complicated probability distributions. However, as the dimension of the distribution gets larger, the computational costs for a satisfactory exploration of the sampling space become challenging. Adaptive MCMC methods employing a choice of proposal distribution can address this issue speeding up the…
▽ More
Markov Chain Monte Carlo (MCMC) algorithms are commonly used for their versatility in sampling from complicated probability distributions. However, as the dimension of the distribution gets larger, the computational costs for a satisfactory exploration of the sampling space become challenging. Adaptive MCMC methods employing a choice of proposal distribution can address this issue speeding up the convergence. In this paper we show an alternative way of performing adaptive MCMC, by using the outcome of Bayesian Neural Networks as the initial proposal for the Markov Chain. This combined approach increases the acceptance rate in the Metropolis-Hasting algorithm and accelerate the convergence of the MCMC while reaching the same final accuracy. Finally, we demonstrate the main advantages of this approach by constraining the cosmological parameters directly from Cosmic Microwave Background maps.
△ Less
Submitted 28 November, 2020;
originally announced November 2020.
-
Reliable Uncertainties for Bayesian Neural Networks using Alpha-divergences
Authors:
Hector J. Hortua,
Luigi Malago,
Riccardo Volpi
Abstract:
Bayesian Neural Networks (BNNs) often result uncalibrated after training, usually tending towards overconfidence. Devising effective calibration methods with low impact in terms of computational complexity is thus of central interest. In this paper we present calibration methods for BNNs based on the alpha divergences from Information Geometry. We compare the use of alpha divergence in training an…
▽ More
Bayesian Neural Networks (BNNs) often result uncalibrated after training, usually tending towards overconfidence. Devising effective calibration methods with low impact in terms of computational complexity is thus of central interest. In this paper we present calibration methods for BNNs based on the alpha divergences from Information Geometry. We compare the use of alpha divergence in training and in calibration, and we show how the use in calibration provides better calibrated uncertainty estimates for specific choices of alpha and is more efficient especially for complex network architectures. We empirically demonstrate the advantages of alpha calibration in regression problems involving parameter estimation and inferred correlations between output uncertainties.
△ Less
Submitted 15 August, 2020;
originally announced August 2020.
-
Natural Reweighted Wake-Sleep
Authors:
Csongor Várady,
Riccardo Volpi,
Luigi Malagò,
Nihat Ay
Abstract:
Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS) and more recently by improved versions, such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connect…
▽ More
Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS) and more recently by improved versions, such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connections in an SBN induces sparsity in the Fisher Information Matrices associated to the probabilistic models, in the form of a finely-grained block-diagonal structure. In this paper we exploit this property to efficiently train SBNs and HMs using the natural gradient. We present a novel algorithm, called Natural Reweighted Wake-Sleep (NRWS), that corresponds to the geometric adaptation of its standard version. In a similar manner, we also introduce Natural Bidirectional Helmholtz Machine (NBiHM). Differently from previous work, we will show how for HMs the natural gradient can be efficiently computed without the need of introducing any approximation in the structure of the Fisher information matrix. The experiments performed on standard datasets from the literature show a consistent improvement of NRWS and NBiHM not only with respect to their non-geometric baselines but also with respect to state-of-the-art training algorithms for HMs. The improvement is quantified both in terms of speed of convergence as well as value of the log-likelihood reached after training.
△ Less
Submitted 14 September, 2022; v1 submitted 15 August, 2020;
originally announced August 2020.
-
Constraining the Reionization History using Bayesian Normalizing Flows
Authors:
Héctor J. Hortúa,
Luigi Malago,
Riccardo Volpi
Abstract:
The next generation 21 cm surveys open a new window onto the early stages of cosmic structure formation and provide new insights about the Epoch of Reionization (EoR). However, the non-Gaussian nature of the 21 cm signal along with the huge amount of data generated from these surveys will require more advanced techniques capable to efficiently extract the necessary information to constrain the Rei…
▽ More
The next generation 21 cm surveys open a new window onto the early stages of cosmic structure formation and provide new insights about the Epoch of Reionization (EoR). However, the non-Gaussian nature of the 21 cm signal along with the huge amount of data generated from these surveys will require more advanced techniques capable to efficiently extract the necessary information to constrain the Reionization History of the Universe. In this paper we present the use of Bayesian Neural Networks (BNNs) to predict the posterior distribution for four astrophysical and cosmological parameters. Besides achieving state-of-the-art prediction performances, the proposed methods provide accurate estimation of parameters uncertainties and infer correlations among them. Additionally, we demonstrate the advantages of Normalizing Flows (NF) combined with BNNs, being able to model more complex output distributions and thus capture key information as non-Gaussianities in the parameter conditional density distribution for astrophysical and cosmological dataset. Finally, we propose novel calibration methods employing Normalizing Flows after training, to produce reliable predictions, and we demonstrate the advantages of this approach both in terms of computational cost and prediction performances.
△ Less
Submitted 14 May, 2020;
originally announced May 2020.
-
Parameters Estimation from the 21 cm signal using Variational Inference
Authors:
Héctor J. Hortúa,
Riccardo Volpi,
Luigi Malagò
Abstract:
Upcoming experiments such as Hydrogen Epoch of Reionization Array (HERA) and Square Kilometre Array (SKA) are intended to measure the 21cm signal over a wide range of redshifts, representing an incredible opportunity in advancing our understanding about the nature of cosmic Reionization. At the same time these kind of experiments will present new challenges in processing the extensive amount of da…
▽ More
Upcoming experiments such as Hydrogen Epoch of Reionization Array (HERA) and Square Kilometre Array (SKA) are intended to measure the 21cm signal over a wide range of redshifts, representing an incredible opportunity in advancing our understanding about the nature of cosmic Reionization. At the same time these kind of experiments will present new challenges in processing the extensive amount of data generated, calling for the development of automated methods capable of precisely estimating physical parameters and their uncertainties. In this paper we employ Variational Inference, and in particular Bayesian Neural Networks, as an alternative to MCMC in 21 cm observations to report credible estimations for cosmological and astrophysical parameters and assess the correlations among them.
△ Less
Submitted 4 May, 2020;
originally announced May 2020.
-
Explainable Deep Classification Models for Domain Generalization
Authors:
Andrea Zunino,
Sarah Adel Bargal,
Riccardo Volpi,
Mehrnoosh Sameki,
Jianming Zhang,
Stan Sclaroff,
Vittorio Murino,
Kate Saenko
Abstract:
Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible accuracy degradation. Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. This is represented in…
▽ More
Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible accuracy degradation. Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. This is represented in the form of a saliency map conveying how much each pixel contributed to the network's decision. Our training strategy enforces a periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object. We quantify explainability using an automated metric, and using human judgement. We propose explainability as a means for bridging the visual-semantic gap between different domains where model explanations are used as a means of disentagling domain specific information from otherwise relevant features. We demonstrate that this leads to improved generalization to new domains without hindering performance on the original domain.
△ Less
Submitted 13 March, 2020;
originally announced March 2020.
-
Learning Unbiased Representations via Mutual Information Backpropagation
Authors:
Ruggero Ragonesi,
Riccardo Volpi,
Jacopo Cavazza,
Vittorio Murino
Abstract:
We are interested in learning data-driven representations that can generalize well, even when trained on inherently biased data. In particular, we face the case where some attributes (bias) of the data, if learned by the model, can severely compromise its generalization properties. We tackle this problem through the lens of information theory, leveraging recent findings for a differentiable estima…
▽ More
We are interested in learning data-driven representations that can generalize well, even when trained on inherently biased data. In particular, we face the case where some attributes (bias) of the data, if learned by the model, can severely compromise its generalization properties. We tackle this problem through the lens of information theory, leveraging recent findings for a differentiable estimation of mutual information. We propose a novel end-to-end optimization strategy, which simultaneously estimates and minimizes the mutual information between the learned representation and the data attributes. When applied on standard benchmarks, our model shows comparable or superior classification performance with respect to state-of-the-art approaches. Moreover, our method is general enough to be applicable to the problem of ``algorithmic fairness'', with competitive results.
△ Less
Submitted 13 March, 2020;
originally announced March 2020.
-
Generative Pseudo-label Refinement for Unsupervised Domain Adaptation
Authors:
Pietro Morerio,
Riccardo Volpi,
Ruggero Ragonesi,
Vittorio Murino
Abstract:
We investigate and characterize the inherent resilience of conditional Generative Adversarial Networks (cGANs) against noise in their conditioning labels, and exploit this fact in the context of Unsupervised Domain Adaptation (UDA). In UDA, a classifier trained on the labelled source set can be used to infer pseudo-labels on the unlabelled target set. However, this will result in a significant amo…
▽ More
We investigate and characterize the inherent resilience of conditional Generative Adversarial Networks (cGANs) against noise in their conditioning labels, and exploit this fact in the context of Unsupervised Domain Adaptation (UDA). In UDA, a classifier trained on the labelled source set can be used to infer pseudo-labels on the unlabelled target set. However, this will result in a significant amount of misclassified examples (due to the well-known domain shift issue), which can be interpreted as noise injection in the ground-truth labels for the target set. We show that cGANs are, to some extent, robust against such "shift noise". Indeed, cGANs trained with noisy pseudo-labels, are able to filter such noise and generate cleaner target samples. We exploit this finding in an iterative procedure where a generative model and a classifier are jointly trained: in turn, the generator allows to sample cleaner data from the target distribution, and the classifier allows to associate better labels to target samples, progressively refining target pseudo-labels. Results on common benchmarks show that our method performs better or comparably with the unsupervised domain adaptation state of the art.
△ Less
Submitted 9 January, 2020;
originally announced January 2020.
-
Natural Alpha Embeddings
Authors:
Riccardo Volpi,
Luigi Malagò
Abstract:
Learning an embedding for a large collection of items is a popular approach to overcome the computational limitations associated to one-hot encodings. The aim of item embedding is to learn a low dimensional space for the representations, able to capture with its geometry relevant features or relationships for the data at hand. This can be achieved for example by exploiting adjacencies among items…
▽ More
Learning an embedding for a large collection of items is a popular approach to overcome the computational limitations associated to one-hot encodings. The aim of item embedding is to learn a low dimensional space for the representations, able to capture with its geometry relevant features or relationships for the data at hand. This can be achieved for example by exploiting adjacencies among items in large sets of unlabelled data. In this paper we interpret in an Information Geometric framework the item embeddings obtained from conditional models. By exploiting the $α$-geometry of the exponential family, first introduced by Amari, we introduce a family of natural $α$-embeddings represented by vectors in the tangent space of the probability simplex, which includes as a special case standard approaches available in the literature. A typical example is given by word embeddings, commonly used in natural language processing, such as Word2Vec and GloVe. In our analysis, we show how the $α$-deformation parameter can impact on standard evaluation tasks.
△ Less
Submitted 24 December, 2019; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Parameters Estimation for the Cosmic Microwave Background with Bayesian Neural Networks
Authors:
Hector J. Hortua,
Riccardo Volpi,
Dimitri Marinelli,
Luigi Malagò
Abstract:
In this paper, we present the first study that compares different models of Bayesian Neural Networks (BNNs) to predict the posterior distribution of the cosmological parameters directly from the Cosmic Microwave Background temperature and polarization maps. We focus our analysis on four different methods to sample the weights of the network during training: Dropout, DropConnect, Reparameterization…
▽ More
In this paper, we present the first study that compares different models of Bayesian Neural Networks (BNNs) to predict the posterior distribution of the cosmological parameters directly from the Cosmic Microwave Background temperature and polarization maps. We focus our analysis on four different methods to sample the weights of the network during training: Dropout, DropConnect, Reparameterization Trick (RT), and Flipout. We find out that Flipout outperforms all other methods regardless of the architecture used, and provides tighter constraints for the cosmological parameters. Moreover we compare with MCMC posterior analysis obtaining comparable error correlation among parameters, with BNNs being orders of magnitude faster in inference, although less accurate. Thanks to the speed of the inference process with BNNs, the posterior distribution, outcome of the neural network, can be used as the initial proposal for the Markov Chain. We show that this combined approach increases the acceptance rate in the Metropolis-Hasting algorithm and accelerates the convergence of the MCMC, while reaching the same final accuracy. In the second part of the paper, we present a guide to the training and calibration of a successful multi-channel BNN for the CMB temperature and polarization map. We show how tuning the regularization parameter for the standard deviation of the approximate posterior on the weights in Flipout and RT we can produce unbiased and reliable uncertainty estimates, i.e., the regularizer acts like a hyperparameter analogous to the dropout rate in Dropout. Finally, we show how polarization, when combined with the temperature in a unique multi-channel tensor fed to a single BNN, helps to break degeneracies among parameters and provides stringent constraints.
△ Less
Submitted 30 October, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Addressing Model Vulnerability to Distributional Shifts over Image Transformation Sets
Authors:
Riccardo Volpi,
Vittorio Murino
Abstract:
We are concerned with the vulnerability of computer vision models to distributional shifts. We formulate a combinatorial optimization problem that allows evaluating the regions in the image space where a given model is more vulnerable, in terms of image transformations applied to the input, and face it with standard search algorithms. We further embed this idea in a training procedure, where we de…
▽ More
We are concerned with the vulnerability of computer vision models to distributional shifts. We formulate a combinatorial optimization problem that allows evaluating the regions in the image space where a given model is more vulnerable, in terms of image transformations applied to the input, and face it with standard search algorithms. We further embed this idea in a training procedure, where we define new data augmentation rules according to the image transformations that the current model is most vulnerable to, over iterations. An empirical evaluation on classification and semantic segmentation problems suggests that the devised algorithm allows to train models that are more robust against content-preserving image manipulations and, in general, against distributional shifts.
△ Less
Submitted 20 August, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds
Authors:
Septimia Sârbu,
Riccardo Volpi,
Alexandra Peşte,
Luigi Malagò
Abstract:
In this paper we propose two novel bounds for the log-likelihood based on Kullback-Leibler and the Rényi divergences, which can be used for variational inference and in particular for the training of Variational AutoEncoders. Our proposal is motivated by the difficulties encountered in training VAEs on continuous datasets with high contrast images, such as those with handwritten digits and charact…
▽ More
In this paper we propose two novel bounds for the log-likelihood based on Kullback-Leibler and the Rényi divergences, which can be used for variational inference and in particular for the training of Variational AutoEncoders. Our proposal is motivated by the difficulties encountered in training VAEs on continuous datasets with high contrast images, such as those with handwritten digits and characters, where numerical issues often appear unless noise is added, either to the dataset during training or to the generative model given by the decoder. The new bounds we propose, which are obtained from the maximization of the likelihood of an interval for the observations, allow numerically stable training procedures without the necessity of adding any extra source of noise to the data.
△ Less
Submitted 5 July, 2018;
originally announced July 2018.
-
Generalizing to Unseen Domains via Adversarial Data Augmentation
Authors:
Riccardo Volpi,
Hongseok Namkoong,
Ozan Sener,
John Duchi,
Vittorio Murino,
Silvio Savarese
Abstract:
We are concerned with learning models that generalize well to different \emph{unseen} domains. We consider a worst-case formulation over data distributions that are near the source domain in the feature space. Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the…
▽ More
We are concerned with learning models that generalize well to different \emph{unseen} domains. We consider a worst-case formulation over data distributions that are near the source domain in the feature space. Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the current model. We show that our iterative scheme is an adaptive data augmentation method where we append adversarial examples at each iteration. For softmax losses, we show that our method is a data-dependent regularization scheme that behaves differently from classical regularizers that regularize towards zero (e.g., ridge or lasso). On digit recognition and semantic segmentation tasks, our method learns models improve performance across a range of a priori unknown target domains.
△ Less
Submitted 6 November, 2018; v1 submitted 30 May, 2018;
originally announced May 2018.
-
Adversarial Feature Augmentation for Unsupervised Domain Adaptation
Authors:
Riccardo Volpi,
Pietro Morerio,
Silvio Savarese,
Vittorio Murino
Abstract:
Recent works showed that Generative Adversarial Networks (GANs) can be successfully applied in unsupervised domain adaptation, where, given a labeled source dataset and an unlabeled target dataset, the goal is to train powerful classifiers for the target samples. In particular, it was shown that a GAN objective function can be used to learn target features indistinguishable from the source ones. I…
▽ More
Recent works showed that Generative Adversarial Networks (GANs) can be successfully applied in unsupervised domain adaptation, where, given a labeled source dataset and an unlabeled target dataset, the goal is to train powerful classifiers for the target samples. In particular, it was shown that a GAN objective function can be used to learn target features indistinguishable from the source ones. In this work, we extend this framework by (i) forcing the learned feature extractor to be domain-invariant, and (ii) training it through data augmentation in the feature space, namely performing feature augmentation. While data augmentation in the image space is a well established technique in deep learning, feature augmentation has not yet received the same level of attention. We accomplish it by means of a feature generator trained by playing the GAN minimax game against source features. Results show that both enforcing domain-invariance and performing feature augmentation lead to superior or comparable performance to state-of-the-art results in several unsupervised domain adaptation benchmarks.
△ Less
Submitted 4 May, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
Certifying Some Distributional Robustness with Principled Adversarial Training
Authors:
Aman Sinha,
Hongseok Namkoong,
Riccardo Volpi,
John Duchi
Abstract:
Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserst…
▽ More
Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserstein ball, we provide a training procedure that augments model parameter updates with worst-case perturbations of training data. For smooth losses, our procedure provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization. Furthermore, our statistical guarantees allow us to efficiently certify robustness for the population loss. For imperceptible perturbations, our method matches or outperforms heuristic approaches.
△ Less
Submitted 1 May, 2020; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Curriculum Dropout
Authors:
Pietro Morerio,
Jacopo Cavazza,
Riccardo Volpi,
Rene Vidal,
Vittorio Murino
Abstract:
Dropout is a very effective way of regularizing neural networks. Stochastically "dropping out" units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network generalization. Besides, Dropout can be interpreted as an approximate model aggregation technique, where an exponential number of smaller networks are averaged in o…
▽ More
Dropout is a very effective way of regularizing neural networks. Stochastically "dropping out" units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network generalization. Besides, Dropout can be interpreted as an approximate model aggregation technique, where an exponential number of smaller networks are averaged in order to get a more powerful ensemble. In this paper, we show that using a fixed dropout probability during training is a suboptimal choice. We thus propose a time scheduling for the probability of retaining neurons in the network. This induces an adaptive regularization scheme that smoothly increases the difficulty of the optimization problem. This idea of "starting easy" and adaptively increasing the difficulty of the learning problem has its roots in curriculum learning and allows one to train better models. Indeed, we prove that our optimization strategy implements a very general curriculum scheme, by gradually adding noise to both the input and intermediate feature representations within the network architecture. Experiments on seven image classification datasets and different network architectures show that our method, named Curriculum Dropout, frequently yields to better generalization and, at worst, performs just as well as the standard Dropout method.
△ Less
Submitted 3 August, 2017; v1 submitted 17 March, 2017;
originally announced March 2017.
-
Modeling Retinal Ganglion Cell Population Activity with Restricted Boltzmann Machines
Authors:
Matteo Zanotto,
Riccardo Volpi,
Alessandro Maccione,
Luca Berdondini,
Diego Sona,
Vittorio Murino
Abstract:
The retina is a complex nervous system which encodes visual stimuli before higher order processing occurs in the visual cortex. In this study we evaluated whether information about the stimuli received by the retina can be retrieved from the firing rate distribution of Retinal Ganglion Cells (RGCs), exploiting High-Density 64x64 MEA technology. To this end, we modeled the RGC population activity u…
▽ More
The retina is a complex nervous system which encodes visual stimuli before higher order processing occurs in the visual cortex. In this study we evaluated whether information about the stimuli received by the retina can be retrieved from the firing rate distribution of Retinal Ganglion Cells (RGCs), exploiting High-Density 64x64 MEA technology. To this end, we modeled the RGC population activity using mean-covariance Restricted Boltzmann Machines, latent variable models capable of learning the joint distribution of a set of continuous observed random variables and a set of binary unobserved random units. The idea was to figure out if binary latent states encode the regularities associated to different visual stimuli, as modes in the joint distribution. We measured the goodness of mcRBM encoding by calculating the Mutual Information between the latent states and the stimuli shown to the retina. Results show that binary states can encode the regularities associated to different stimuli, using both gratings and natural scenes as stimuli. We also discovered that hidden variables encode interesting properties of retinal activity, interpreted as population receptive fields. We further investigated the ability of the model to learn different modes in population activity by comparing results associated to a retina in normal conditions and after pharmacologically blocking GABA receptors (GABAC at first, and then also GABAA and GABAB). As expected, Mutual Information tends to decrease if we pharmacologically block receptors. We finally stress that the computational method described in this work could potentially be applied to any kind of neural data obtained through MEA technology, though different techniques should be applied to interpret the results.
△ Less
Submitted 17 January, 2017; v1 submitted 11 January, 2017;
originally announced January 2017.