-
Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces
Authors:
Jeremy Chopin,
Rozenn Dahyot
Abstract:
Data embeddings with CLIP and ImageBind provide powerful features for the analysis of multimedia and/or multimodal data. We assess their performance here for classification using a Gaussian Mixture models (GMMs) based layer as an alternative to the standard Softmax layer. GMMs based classifiers have recently been shown to have interesting performances as part of deep learning pipelines trained end…
▽ More
Data embeddings with CLIP and ImageBind provide powerful features for the analysis of multimedia and/or multimodal data. We assess their performance here for classification using a Gaussian Mixture models (GMMs) based layer as an alternative to the standard Softmax layer. GMMs based classifiers have recently been shown to have interesting performances as part of deep learning pipelines trained end-to-end. Our first contribution is to investigate GMM based classification performance taking advantage of the embedded spaces CLIP and ImageBind. Our second contribution is in proposing our own GMM based classifier with a lower parameters count than previously proposed. Our findings are, that in most cases, on these tested embedded spaces, one gaussian component in the GMMs is often enough for capturing each class, and we hypothesize that this may be due to the contrastive loss used for training these embedded spaces that naturally concentrates features together for each class. We also observed that ImageBind often provides better performance than CLIP for classification of image datasets even when these embedded spaces are compressed using PCA.
△ Less
Submitted 17 October, 2024;
originally announced October 2024.
-
Combining geolocation and height estimation of objects from street level imagery
Authors:
Matej Ulicny,
Vladimir A. Krylov,
Julie Connelly,
Rozenn Dahyot
Abstract:
We propose a pipeline for combined multi-class object geolocation and height estimation from street level RGB imagery, which is considered as a single available input data modality. Our solution is formulated via Markov Random Field optimization with deterministic output. The proposed technique uses image metadata along with coordinates of objects detected in the image plane as found by a custom-t…
▽ More
We propose a pipeline for combined multi-class object geolocation and height estimation from street level RGB imagery, which is considered as a single available input data modality. Our solution is formulated via Markov Random Field optimization with deterministic output. The proposed technique uses image metadata along with coordinates of objects detected in the image plane as found by a custom-trained Convolutional Neural Network. Computing the object height using our methodology, in addition to object geolocation, has negligible effect on the overall computational cost. Accuracy is demonstrated experimentally for water drains and road signs on which we achieve average elevation estimation error lower than 20cm.
△ Less
Submitted 14 May, 2023;
originally announced May 2023.
-
Model-based inexact graph matching on top of CNNs for semantic scene understanding
Authors:
Jérémy Chopin,
Jean-Baptiste Fasquel,
Harold Mouchère,
Rozenn Dahyot,
Isabelle Bloch
Abstract:
Deep learning based pipelines for semantic segmentation often ignore structural information available on annotated images used for training. We propose a novel post-processing module enforcing structural knowledge about the objects of interest to improve segmentation results provided by deep learning. This module corresponds to a "many-to-one-or-none" inexact graph matching approach, and is formul…
▽ More
Deep learning based pipelines for semantic segmentation often ignore structural information available on annotated images used for training. We propose a novel post-processing module enforcing structural knowledge about the objects of interest to improve segmentation results provided by deep learning. This module corresponds to a "many-to-one-or-none" inexact graph matching approach, and is formulated as a quadratic assignment problem. Our approach is compared to a CNN-based segmentation (for various CNN backbones) on two public datasets, one for face segmentation from 2D RGB images (FASSEG), and the other for brain segmentation from 3D MRIs (IBSR). Evaluations are performed using two types of structural information (distances and directional relations, , this choice being a hyper-parameter of our generic framework). On FASSEG data, results show that our module improves accuracy of the CNN by about 6.3% (the Hausdorff distance decreases from 22.11 to 20.71). On IBSR data, the improvement is of 51% (the Hausdorff distance decreases from 11.01 to 5.4). In addition, our approach is shown to be resilient to small training datasets that often limit the performance of deep learning methods: the improvement increases as the size of the training dataset decreases.
△ Less
Submitted 1 August, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.
-
Principal Component Classification
Authors:
Rozenn Dahyot
Abstract:
We propose to directly compute classification estimates by learning features encoded with their class scores using PCA. Our resulting model has a encoder-decoder structure suitable for supervised learning, it is computationally efficient and performs well for classification on several datasets.
We propose to directly compute classification estimates by learning features encoded with their class scores using PCA. Our resulting model has a encoder-decoder structure suitable for supervised learning, it is computationally efficient and performs well for classification on several datasets.
△ Less
Submitted 26 October, 2022; v1 submitted 23 October, 2022;
originally announced October 2022.
-
DR-VNet: Retinal Vessel Segmentation via Dense Residual UNet
Authors:
Ali Karaali,
Rozenn Dahyot,
Donal J. Sexton
Abstract:
Accurate retinal vessel segmentation is an important task for many computer-aided diagnosis systems. Yet, it is still a challenging problem due to the complex vessel structures of an eye. Numerous vessel segmentation methods have been proposed recently, however more research is needed to deal with poor segmentation of thin and tiny vessels. To address this, we propose a new deep learning pipeline…
▽ More
Accurate retinal vessel segmentation is an important task for many computer-aided diagnosis systems. Yet, it is still a challenging problem due to the complex vessel structures of an eye. Numerous vessel segmentation methods have been proposed recently, however more research is needed to deal with poor segmentation of thin and tiny vessels. To address this, we propose a new deep learning pipeline combining the efficiency of residual dense net blocks and, residual squeeze and excitation blocks. We validate experimentally our approach on three datasets and show that our pipeline outperforms current state of the art techniques on the sensitivity metric relevant to assess capture of small vessels.
△ Less
Submitted 22 March, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
3D point cloud segmentation using GIS
Authors:
Chao-Jung Liu,
Vladimir Krylov,
Rozenn Dahyot
Abstract:
In this paper we propose an approach to perform semantic segmentation of 3D point cloud data by importing the geographic information from a 2D GIS layer (OpenStreetMap). The proposed automatic procedure identifies meaningful units such as buildings and adjusts their locations to achieve best fit between the GIS polygonal perimeters and the point cloud. Our processing pipeline is presented and illu…
▽ More
In this paper we propose an approach to perform semantic segmentation of 3D point cloud data by importing the geographic information from a 2D GIS layer (OpenStreetMap). The proposed automatic procedure identifies meaningful units such as buildings and adjusts their locations to achieve best fit between the GIS polygonal perimeters and the point cloud. Our processing pipeline is presented and illustrated by segmenting point cloud data of Trinity College Dublin (Ireland) campus constructed from optical imagery collected by a drone.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
Context Aware Object Geotagging
Authors:
Chao-Jung Liu,
Matej Ulicny,
Michael Manzke,
Rozenn Dahyot
Abstract:
Localization of street objects from images has gained a lot of attention in recent years. We propose an approach to improve asset geolocation from street view imagery by enhancing the quality of the metadata associated with the images using Structure from Motion. The predicted object geolocation is further refined by imposing contextual geographic information extracted from OpenStreetMap. Our pipe…
▽ More
Localization of street objects from images has gained a lot of attention in recent years. We propose an approach to improve asset geolocation from street view imagery by enhancing the quality of the metadata associated with the images using Structure from Motion. The predicted object geolocation is further refined by imposing contextual geographic information extracted from OpenStreetMap. Our pipeline is validated experimentally against the state of the art approaches for geotagging traffic lights.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
Sliced $\mathcal{L}_2$ Distance for Colour Grading
Authors:
Hana Alghamdi,
Rozenn Dahyot
Abstract:
We propose a new method with $\mathcal{L}_2$ distance that maps one $N$-dimensional distribution to another, taking into account available information about correspondences. We solve the high-dimensional problem in 1D space using an iterative projection approach. To show the potentials of this mapping, we apply it to colour transfer between two images that exhibit overlapped scenes. Experiments sh…
▽ More
We propose a new method with $\mathcal{L}_2$ distance that maps one $N$-dimensional distribution to another, taking into account available information about correspondences. We solve the high-dimensional problem in 1D space using an iterative projection approach. To show the potentials of this mapping, we apply it to colour transfer between two images that exhibit overlapped scenes. Experiments show quantitative and qualitative competitive results as compared with the state of the art colour transfer methods.
△ Less
Submitted 18 February, 2021;
originally announced February 2021.
-
Tensor Reordering for CNN Compression
Authors:
Matej Ulicny,
Vladimir A. Krylov,
Rozenn Dahyot
Abstract:
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in spectral domain. Specifically, the representation extracted via Discrete Cosine Transform (DCT) is more conducive for pruning than the original space. By relying on a combination of weight tensor reshaping and reordering we achieve high levels of layer compression with just minor…
▽ More
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in spectral domain. Specifically, the representation extracted via Discrete Cosine Transform (DCT) is more conducive for pruning than the original space. By relying on a combination of weight tensor reshaping and reordering we achieve high levels of layer compression with just minor accuracy loss. Our approach is applied to compress pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance after a significant parameter reduction. We validate our approach on ResNet-50 and MobileNet-V2 architectures for ImageNet classification task.
△ Less
Submitted 22 October, 2020;
originally announced October 2020.
-
Iterative Nadaraya-Watson Distribution Transfer for Colour Grading
Authors:
Hana Alghamdi,
Rozenn Dahyot
Abstract:
We propose a new method with Nadaraya-Watson that maps one N-dimensional distribution to another taking into account available information about correspondences. We extend the 2D/3D problem to higher dimensions by encoding overlapping neighborhoods of data points and solve the high dimensional problem in 1D space using an iterative projection approach. To show potentials of this mapping, we apply…
▽ More
We propose a new method with Nadaraya-Watson that maps one N-dimensional distribution to another taking into account available information about correspondences. We extend the 2D/3D problem to higher dimensions by encoding overlapping neighborhoods of data points and solve the high dimensional problem in 1D space using an iterative projection approach. To show potentials of this mapping, we apply it to colour transfer between two images that exhibit overlapped scene. Experiments show quantitative and qualitative improvements over previous state of the art colour transfer methods.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Patch based Colour Transfer using SIFT Flow
Authors:
Hana Alghamdi,
Rozenn Dahyot
Abstract:
We propose a new colour transfer method with Optimal Transport (OT) to transfer the colour of a sourceimage to match the colour of a target image of the same scene that may exhibit large motion changes betweenimages. By definition OT does not take into account any available information about correspondences whencomputing the optimal solution. To tackle this problem we propose to encode overlapping…
▽ More
We propose a new colour transfer method with Optimal Transport (OT) to transfer the colour of a sourceimage to match the colour of a target image of the same scene that may exhibit large motion changes betweenimages. By definition OT does not take into account any available information about correspondences whencomputing the optimal solution. To tackle this problem we propose to encode overlapping neighborhoodsof pixels using both their colour and spatial correspondences estimated using motion estimation. We solvethe high dimensional problem in 1D space using an iterative projection approach. We further introducesmoothing as part of the iterative algorithms for solving optimal transport namely Iterative DistributionTransport (IDT) and its variant the Sliced Wasserstein Distance (SWD). Experiments show quantitative andqualitative improvements over previous state of the art colour transfer methods.
△ Less
Submitted 18 May, 2020;
originally announced May 2020.
-
Harmonic Convolutional Networks based on Discrete Cosine Transform
Authors:
Matej Ulicny,
Vladimir A. Krylov,
Rozenn Dahyot
Abstract:
Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. We propose to learn these filters as combinations of preset spectral filters defined by the Discrete Cosine Transform (DCT). Our proposed DCT-based harmonic blocks replace conventional convolutional layers to produce partially or fully harmonic versions of new or existing CNN archite…
▽ More
Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. We propose to learn these filters as combinations of preset spectral filters defined by the Discrete Cosine Transform (DCT). Our proposed DCT-based harmonic blocks replace conventional convolutional layers to produce partially or fully harmonic versions of new or existing CNN architectures. Using DCT energy compaction properties, we demonstrate how the harmonic networks can be efficiently compressed by truncating high-frequency information in harmonic blocks thanks to the redundancies in the spectral domain. We report extensive experimental validation demonstrating benefits of the introduction of harmonic blocks into state-of-the-art CNN models in image classification, object detection and semantic segmentation applications.
△ Less
Submitted 9 April, 2022; v1 submitted 17 January, 2020;
originally announced January 2020.
-
Performance-Oriented Neural Architecture Search
Authors:
Andrew Anderson,
Jing Su,
Rozenn Dahyot,
David Gregg
Abstract:
Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around a…
▽ More
Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around accuracy. Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architecture that is not only highly accurate, but also efficiently mapped to the computing platform which will perform the inference. Using our modified neural architecture search, we demonstrate $0.88\%$ increase in TOP-1 accuracy with $1.85\times$ reduction in latency for keyword spotting in audio on an embedded SoC, and $1.59\times$ on a high-end GPU.
△ Less
Submitted 9 January, 2020;
originally announced January 2020.
-
Entropic Regularisation of Robust Optimal Transport
Authors:
Rozenn Dahyot,
Hana Alghamdi,
Mairead Grogan
Abstract:
Grogan et al [11,12] have recently proposed a solution to colour transfer by minimising the Euclidean distance L2 between two probability density functions capturing the colour distributions of two images (palette and target). It was shown to be very competitive to alternative solutions based on Optimal Transport for colour transfer. We show that in fact Grogan et al's formulation can also be unde…
▽ More
Grogan et al [11,12] have recently proposed a solution to colour transfer by minimising the Euclidean distance L2 between two probability density functions capturing the colour distributions of two images (palette and target). It was shown to be very competitive to alternative solutions based on Optimal Transport for colour transfer. We show that in fact Grogan et al's formulation can also be understood as a new robust Optimal Transport based framework with entropy regularisation over marginals.
△ Less
Submitted 29 May, 2019;
originally announced May 2019.
-
Harmonic Networks with Limited Training Samples
Authors:
Matej Ulicny,
Vladimir A. Krylov,
Rozenn Dahyot
Abstract:
Convolutional neural networks (CNNs) are very popular nowadays for image processing. CNNs allow one to learn optimal filters in a (mostly) supervised machine learning context. However this typically requires abundant labelled training data to estimate the filter parameters. Alternative strategies have been deployed for reducing the number of parameters and / or filters to be learned and thus decre…
▽ More
Convolutional neural networks (CNNs) are very popular nowadays for image processing. CNNs allow one to learn optimal filters in a (mostly) supervised machine learning context. However this typically requires abundant labelled training data to estimate the filter parameters. Alternative strategies have been deployed for reducing the number of parameters and / or filters to be learned and thus decrease overfitting. In the context of reverting to preset filters, we propose here a computationally efficient harmonic block that uses Discrete Cosine Transform (DCT) filters in CNNs. In this work we examine the performance of harmonic networks in limited training data scenario. We validate experimentally that its performance compares well against scattering networks that use wavelets as preset filters.
△ Less
Submitted 30 April, 2019;
originally announced May 2019.
-
Automatic detection of passable roads after floods in remote sensed and social media data
Authors:
Kashif Ahmad,
Konstantin Pogorelov,
Michael Riegler,
Olga Ostroukhova,
Paal Halvorsen,
Nicola Conci,
Rozenn Dahyot
Abstract:
This paper addresses the problem of floods classification and floods aftermath detection utilizing both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed for two corresponding tasks at the MediaEval 2018 be…
▽ More
This paper addresses the problem of floods classification and floods aftermath detection utilizing both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed for two corresponding tasks at the MediaEval 2018 benchmarking challenge. The tasks are (i) identification of images providing evidence for road passability and (ii) differentiation and detection of passable and non-passable roads in images from two complementary sources of information. For the first challenge, we mainly rely on object and scene-level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object and scene-level features are then combined using early, late and double fusion techniques. To identify whether or not it is possible for a vehicle to pass a road in satellite images, we rely on Convolutional Neural Networks and a transfer learning-based classification approach. The evaluation of the proposed methods are carried out on the large-scale datasets provided for the benchmark competition. The results demonstrate significant improvement in the performance over the recent state-of-art approaches.
△ Less
Submitted 10 January, 2019;
originally announced January 2019.
-
Harmonic Networks: Integrating Spectral Information into CNNs
Authors:
Matej Ulicny,
Vladimir A. Krylov,
Rozenn Dahyot
Abstract:
Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. In contrast, in this paper we propose harmonic blocks that produce features by learning optimal combinations of spectral filters defined by the Discrete Cosine Transform. The harmonic blocks are used to replace conventional convolutional layers to construct partial or fully harmonic…
▽ More
Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. In contrast, in this paper we propose harmonic blocks that produce features by learning optimal combinations of spectral filters defined by the Discrete Cosine Transform. The harmonic blocks are used to replace conventional convolutional layers to construct partial or fully harmonic CNNs. We extensively validate our approach and show that the introduction of harmonic blocks into state-of-the-art CNN baseline architectures results in comparable or better performance in classification tasks on small NORB, CIFAR10 and CIFAR100 datasets.
△ Less
Submitted 7 December, 2018;
originally announced December 2018.
-
Automatic Discovery and Geotagging of Objects from Street View Imagery
Authors:
Vladimir A. Krylov,
Eamonn Kenny,
Rozenn Dahyot
Abstract:
Many applications such as autonomous navigation, urban planning and asset monitoring, rely on the availability of accurate information about objects and their geolocations. In this paper we propose to automatically detect and compute the GPS coordinates of recurring stationary objects of interest using street view imagery. Our processing pipeline relies on two fully convolutional neural networks:…
▽ More
Many applications such as autonomous navigation, urban planning and asset monitoring, rely on the availability of accurate information about objects and their geolocations. In this paper we propose to automatically detect and compute the GPS coordinates of recurring stationary objects of interest using street view imagery. Our processing pipeline relies on two fully convolutional neural networks: the first segments objects in the images while the second estimates their distance from the camera. To geolocate all the detected objects coherently we propose a novel custom Markov Random Field model to perform objects triangulation. The novelty of the resulting pipeline is the combined use of monocular depth estimation and triangulation to enable automatic mapping of complex scenes with multiple visually similar objects of interest. We validate experimentally the effectiveness of our approach on two object classes: traffic lights and telegraph poles. The experiments report high object recall rates and GPS accuracy within 2 meters, which is comparable with the precision of single-frequency GPS receivers.
△ Less
Submitted 1 December, 2017; v1 submitted 28 August, 2017;
originally announced August 2017.
-
Shape Registration with Directional Data
Authors:
Mairéad Grogan,
Rozenn Dahyot
Abstract:
We propose several cost functions for registration of shapes encoded with Euclidean and/or non-Euclidean information (unit vectors). Our framework is assessed for estimation of both rigid and non-rigid transformations between the target and model shapes corresponding to 2D contours and 3D surfaces. The experimental results obtained confirm that using the combination of a point's position and unit…
▽ More
We propose several cost functions for registration of shapes encoded with Euclidean and/or non-Euclidean information (unit vectors). Our framework is assessed for estimation of both rigid and non-rigid transformations between the target and model shapes corresponding to 2D contours and 3D surfaces. The experimental results obtained confirm that using the combination of a point's position and unit normal vector in a cost function can enhance the registration results compared to state of the art methods.
△ Less
Submitted 29 August, 2017; v1 submitted 25 August, 2017;
originally announced August 2017.
-
Robust Registration of Gaussian Mixtures for Colour Transfer
Authors:
Mairéad Grogan,
Rozenn Dahyot
Abstract:
We present a flexible approach to colour transfer inspired by techniques recently proposed for shape registration. Colour distributions of the palette and target images are modelled with Gaussian Mixture Models (GMMs) that are robustly registered to infer a non linear parametric transfer function. We show experimentally that our approach compares well to current techniques both quantitatively and…
▽ More
We present a flexible approach to colour transfer inspired by techniques recently proposed for shape registration. Colour distributions of the palette and target images are modelled with Gaussian Mixture Models (GMMs) that are robustly registered to infer a non linear parametric transfer function. We show experimentally that our approach compares well to current techniques both quantitatively and qualitatively. Moreover, our technique is computationally the fastest and can take efficient advantage of parallel processing architectures for recolouring images and videos. Our transfer function is parametric and hence can be stored in memory for later usage and also combined with other computed transfer functions to create interesting visual effects. Overall this paper provides a fast user friendly approach to recolouring of image and video materials.
△ Less
Submitted 17 May, 2017;
originally announced May 2017.