-
Parameter-Free Bio-Inspired Channel Attention for Enhanced Cardiac MRI Reconstruction
Authors:
Anam Hashmi,
Julia Dietlmeier,
Kathleen M. Curran,
Noel E. O'Connor
Abstract:
Attention is a fundamental component of the human visual recognition system. The inclusion of attention in a convolutional neural network amplifies relevant visual features and suppresses the less important ones. Integrating attention mechanisms into convolutional neural networks enhances model performance and interpretability. Spatial and channel attention mechanisms have shown significant advantages across many downstream tasks in medical imaging. While existing attention modules have proven to be effective, their design often lacks a robust theoretical underpinning. In this study, we address this gap by proposing a non-linear attention architecture for cardiac MRI reconstruction and hypothesize that insights from ecological principles can guide the development of effective and efficient attention mechanisms. Specifically, we investigate a non-linear ecological difference equation that describes single-species population growth to devise a parameter-free attention module surpassing current state-of-the-art parameter-free methods.
Submitted 29 May, 2025;
originally announced May 2025.
-
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Authors:
Ayush K. Rai,
Kyle Min,
Tarun Krishna,
Feiyan Hu,
Alan F. Smeaton,
Noel E. O'Connor
Abstract:
Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in such approaches lies in selecting an appropriate masking strategy. Previous studies have explored predefined masking techniques, including random and tube-based masking, as well as approaches that leverage key motion priors, optical flow and semantic cues from externally pre-trained models. In this work, we introduce a novel and generalizable Trajectory-Aware Adaptive Token Sampler (TATS), which models the motion dynamics of tokens and can be seamlessly integrated into the masked autoencoder (MAE) framework to select motion-centric tokens in videos. Additionally, we propose a unified training strategy that enables joint optimization of both MAE and TATS from scratch using Proximal Policy Optimization (PPO). We show that our model allows for aggressive masking without compromising performance on the downstream task of action recognition while also ensuring that the pre-training remains memory efficient. Extensive experiments of the proposed approach across four benchmarks, including Something-Something v2, Kinetics-400, UCF101, and HMDB51, demonstrate the effectiveness, transferability, generalization, and efficiency of our work compared to other state-of-the-art methods.
Submitted 13 May, 2025;
originally announced May 2025.
-
Comparative Analysis of Machine Learning-Based Imputation Techniques for Air Quality Datasets with High Missing Data Rates
Authors:
Sen Yan,
David J. O'Connor,
Xiaojun Wang,
Noel E. O'Connor,
Alan F. Smeaton,
Mingming Liu
Abstract:
Urban pollution poses serious health risks, particularly in relation to traffic-related air pollution, which remains a major concern in many cities. Vehicle emissions contribute to respiratory and cardiovascular issues, especially for vulnerable and exposed road users like pedestrians and cyclists. Therefore, accurate air quality monitoring with high spatial resolution is vital for good urban environmental management. This study aims to provide insights for processing spatiotemporal datasets with high missing data rates. In this study, the challenge of high missing data rates is a result of the limited data available and the fine granularity required for precise classification of PM2.5 levels. The data used for analysis and imputation were collected from both mobile sensors and fixed stations by Dynamic Parcel Distribution, the Environmental Protection Agency, and Google in Dublin, Ireland, where the missing data rate was approximately 82.42%, making accurate Particulate Matter 2.5 level predictions particularly difficult. Various imputation and prediction approaches were evaluated and compared, including ensemble methods, deep learning models, and diffusion models. External features such as traffic flow, weather conditions, and data from the nearest stations were incorporated to enhance model performance. The results indicate that diffusion methods with external features achieved the highest F1 score, reaching 0.9486 (Accuracy: 94.26%, Precision: 94.42%, Recall: 94.82%), with ensemble models achieving the highest accuracy of 94.82%, illustrating that good performance can be obtained despite a high missing data rate.
Submitted 25 December, 2024; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation
Authors:
Kirill Sirotkin,
Marcos Escudero-Viñolo,
Pablo Carballeira,
Mayug Maniparambil,
Catarina Barata,
Noel E. O'Connor
Abstract:
Foundation models trained on web-scraped datasets propagate societal biases to downstream tasks. While counterfactual generation enables bias analysis, existing methods introduce artifacts by modifying contextual elements like clothing and background. We present a localized counterfactual generation method that preserves image context by constraining counterfactual modifications to specific attribute-relevant regions through automated masking and guided inpainting. When applied to the Conceptual Captions dataset for creating gender counterfactuals, our method results in higher visual and semantic fidelity than state-of-the-art alternatives, while maintaining the performance of models trained using only real data on non-human-centric tasks. Models fine-tuned with our counterfactuals demonstrate measurable bias reduction across multiple metrics, including a decrease in gender classification disparity and balanced person preference scores, while preserving ImageNet zero-shot performance. The results establish a framework for creating balanced datasets that enable both accurate bias profiling and effective mitigation.
Submitted 12 December, 2024;
originally announced December 2024.
-
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
Authors:
Mayug Maniparambil,
Raiymbek Akshulakov,
Yasser Abdelaziz Dahou Djilali,
Sanath Narayan,
Ankit Singh,
Noel E. O'Connor
Abstract:
Recent contrastive multimodal vision-language models like CLIP have demonstrated robust open-world semantic understanding, becoming the standard image backbones for vision-language applications. However, recent findings suggest high semantic similarity between well-trained unimodal encoders, which raises a key question: Is there a plausible way to connect unimodal backbones for vision-language tasks? To this end, we propose a novel framework that aligns vision and language using frozen unimodal encoders. It involves selecting semantically similar encoders in the latent space, curating a concept-rich dataset of image-caption pairs, and training simple MLP projectors. We evaluated our approach on 12 zero-shot classification datasets and 2 image-text retrieval datasets. Our best model, utilizing DINOv2 and All-Roberta-Large text encoder, achieves 76% accuracy on ImageNet with a 20-fold reduction in data and 65-fold reduction in compute requirements compared to multi-modal alignment where models are trained from scratch. The proposed framework enhances the accessibility of multimodal model development while enabling flexible adaptation across diverse scenarios. Code and curated datasets are available at github.com/mayug/freeze-align.
Submitted 23 March, 2025; v1 submitted 28 September, 2024;
originally announced September 2024.
-
Synthetic Time Series for Anomaly Detection in Cloud Microservices
Authors:
Mohamed Allam,
Noureddine Boujnah,
Noel E. O'Connor,
Mingming Liu
Abstract:
This paper proposes a framework for time series generation built to investigate anomaly detection in cloud microservices. In the field of cloud computing, ensuring the reliability of microservices is of paramount concern and yet a remarkably challenging task. Despite the large amount of research in this area, validation of anomaly detection algorithms in realistic environments is difficult to achieve. To address this challenge, we propose a framework to mimic the complex time series patterns representative of both normal and anomalous cloud microservices behaviors. We detail the pipeline implementation that allows deployment and management of microservices as well as the theoretical approach required to generate anomalies. Two datasets generated using the proposed framework have been made publicly available through GitHub.
Submitted 21 July, 2024;
originally announced August 2024.
-
An accurate detection is not all you need to combat label noise in web-noisy datasets
Authors:
Paul Albert,
Jack Valmadre,
Eric Arazo,
Tarun Krishna,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Training a classifier on web-crawled data demands learning algorithms that are robust to annotation errors and irrelevant examples. This paper builds upon the recent empirical observation that applying unsupervised contrastive learning to noisy, web-crawled datasets yields a feature representation under which the in-distribution (ID) and out-of-distribution (OOD) samples are linearly separable. We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples, and yet, surprisingly, this detection does not translate into gains in classification accuracy. Digging deeper into this phenomenon, we discover that the near-perfect detection misses a type of clean example that is valuable for supervised learning. These examples often represent visually simple images, which are relatively easy to identify as clean examples using standard loss- or distance-based methods despite being poorly separated from the OOD distribution using unsupervised learning. Because we further observe a low correlation with SOTA metrics, we propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach. When combined with the SOTA algorithm PLS, we substantially improve SOTA results for real-world image classification in the presence of web noise: github.com/PaulAlbert31/LSA
Submitted 7 July, 2024;
originally announced July 2024.
-
Accelerating Cardiac MRI Reconstruction with CMRatt: An Attention-Driven Approach
Authors:
Anam Hashmi,
Julia Dietlmeier,
Kathleen M. Curran,
Noel E. O'Connor
Abstract:
Cine cardiac magnetic resonance (CMR) imaging is recognised as the benchmark modality for the comprehensive assessment of cardiac function. Nevertheless, the acquisition process of cine CMR is considered an impediment due to its prolonged scanning time. One commonly used strategy to expedite the acquisition process is k-space undersampling, though it has the drawback of introducing aliasing effects in the reconstructed image. Lately, deep learning-based methods have shown remarkable results over traditional approaches in rapidly achieving precise CMR reconstructed images. This study aims to explore the untapped potential of attention mechanisms incorporated with a deep learning model within the context of the CMR reconstruction problem. We are motivated by the fact that attention has proven beneficial in downstream tasks such as image classification and segmentation, but has not been systematically analysed in the context of CMR reconstruction. Our primary goal is to identify the strengths and potential limitations of attention algorithms when integrated with a convolutional backbone model such as a U-Net. To achieve this, we benchmark different state-of-the-art spatial and channel attention mechanisms on the CMRxRecon dataset and quantitatively evaluate the quality of reconstruction using objective metrics. Furthermore, inspired by the best performing attention mechanism, we propose a new, simple yet effective, attention pipeline specifically optimised for the task of cardiac image reconstruction that outperforms other state-of-the-art attention methods. The layer and model code will be made publicly available.
Submitted 10 April, 2024;
originally announced April 2024.
-
Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation
Authors:
Sidra Aleem,
Fangyijie Wang,
Mayug Maniparambil,
Eric Arazo,
Julia Dietlmeier,
Guenole Silvestre,
Kathleen Curran,
Noel E. O'Connor,
Suzanne Little
Abstract:
The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt-driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. However, their unified potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. This work presents an in-depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose a simple unified framework, SaLIP, for organ segmentation. Initially, SAM is used for part-based segmentation within the image, followed by CLIP to retrieve the mask corresponding to the region of interest (ROI) from the pool of SAM-generated masks. Finally, SAM is prompted by the retrieved ROI to segment a specific organ. Thus, SaLIP is training- and fine-tuning-free and does not rely on domain expertise or labeled data for prompt engineering. Our method shows substantial enhancements in zero-shot segmentation, showcasing notable improvements in DICE scores across diverse segmentation tasks like brain (63.46%), lung (50.11%), and fetal head (30.82%), when compared to unprompted SAM. Code and text prompts are available at: https://github.com/aleemsidra/SaLIP.
Submitted 30 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
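The SaLIP abstract above reports its gains as DICE scores. As a reminder of what that metric measures, here is a minimal sketch of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name `dice_score`, the flat-list mask representation, and the treatment of two empty masks as a perfect match are assumptions made for illustration; they are not taken from the SaLIP code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 6-pixel masks: 2 shared foreground pixels, 3 foreground pixels each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))
```

For these toy masks the score is 2·2 / (3 + 3), about 0.667.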
-
Do Vision and Language Encoders Represent the World Similarly?
Authors:
Mayug Maniparambil,
Raiymbek Akshulakov,
Yasser Abdelaziz Dahou Djilali,
Sanath Narayan,
Mohamed El Amine Seddik,
Karttikeya Mangalam,
Noel E. O'Connor
Abstract:
Aligned text-image encoders such as CLIP have become the de facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performance in their respective domains. This raises a central question: does an alignment exist between uni-modal vision and language encoders since they fundamentally represent the same physical world? Analyzing the latent space structure of vision and language models on image-caption benchmarks using the Centered Kernel Alignment (CKA), we find that the representation spaces of unaligned and aligned encoders are semantically similar. In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training. We frame this as a seeded graph-matching problem exploiting the semantic similarity between graphs and propose two methods: a Fast Quadratic Assignment Problem optimization, and a novel localized CKA metric-based matching/retrieval. We demonstrate the effectiveness of this on several downstream tasks including cross-lingual, cross-domain caption matching and image classification. Code available at github.com/mayug/0-shot-llm-vision.
Submitted 22 March, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
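The abstract above compares encoder representations with Centered Kernel Alignment (CKA). A minimal sketch of the linear form of CKA follows, computed as ||Yᵀ X||_F² / (||Xᵀ X||_F · ||Yᵀ Y||_F) on column-centred feature matrices. This is an assumption for illustration: the abstract does not say which kernel the authors use, and their localized CKA variant is not reproduced here. All function names are hypothetical, and the code uses plain Python lists so it runs without third-party packages.

```python
import math


def center_columns(rows):
    """Subtract each feature's mean over the samples (rows)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two representations of the same n samples.

    Raises ZeroDivisionError if either representation is constant across samples.
    """
    x, y = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(x, y)
    denominator = math.sqrt(cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y))
    return numerator / denominator


# Hypothetical features for 4 samples; y is x rotated by 90 degrees and scaled by 3.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
y = [[-3.0 * b, 3.0 * a] for a, b in x]
print(linear_cka(x, y))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so the rotated and rescaled copy scores 1 up to floating-point error.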
-
Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach
Authors:
Ayush K. Rai,
Tarun Krishna,
Feiyan Hu,
Alexandru Drimbarean,
Kevin McGuinness,
Alan F. Smeaton,
Noel E. O'Connor
Abstract:
Video Anomaly Detection (VAD) is an open-set recognition task, which is usually formulated as a one-class classification (OCC) problem, where training data comprises videos with normal instances while test data contains both normal and anomalous instances. Recent works have investigated the creation of pseudo-anomalies (PAs) using only the normal data and making strong assumptions about real-world anomalies with regard to abnormality of objects and speed of motion to inject prior information about anomalies in an autoencoder (AE) based reconstruction model during training. This work proposes a novel method for generating generic spatio-temporal PAs by inpainting a masked-out region of an image using a pre-trained Latent Diffusion Model and further perturbing the optical flow using mixup to emulate spatio-temporal distortions in the data. In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting by learning three types of anomaly indicators, namely reconstruction quality, temporal irregularity and semantic inconsistency. Extensive experiments on four VAD benchmark datasets, namely Ped2, Avenue, ShanghaiTech and UBnormal, demonstrate that our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting. Our analysis also examines the transferability and generalisation of PAs across these datasets, offering valuable insights by identifying real-world anomalies through PAs.
Submitted 7 April, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
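The abstract above perturbs optical flow with mixup. The general mixup operation is a convex combination of two inputs, λ·a + (1 − λ)·b, with λ commonly drawn from a Beta(α, α) distribution. The sketch below shows only that generic operation on flat vectors standing in for flow fields. How the authors pair samples, choose α, or apply the blend to optical flow is not stated in the abstract, so the names and values here are hypothetical.

```python
import random


def sample_lambda(alpha, rng):
    """Draw a mixing coefficient in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


# Hypothetical flattened flow vectors; alpha = 0.4 is an illustrative choice.
rng = random.Random(0)
flow_a = [0.0, 2.0, -1.0]
flow_b = [4.0, 2.0, 1.0]
lam = sample_lambda(0.4, rng)
print(lam, mixup(flow_a, flow_b, lam))
```

With λ = 1 the output equals the first input and with λ = 0 the second, so the coefficient controls how strongly the original flow is distorted.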
-
Breathing Green: Maximising Health and Environmental Benefits for Active Transportation Users Leveraging Large Scale Air Quality Data
Authors:
Sen Yan,
Shaoshu Zhu,
Jaime B. Fernandez,
Eric Arazo Sánchez,
Yingqi Gu,
Noel E. O'Connor,
David O'Connor,
Mingming Liu
Abstract:
Pollution in urban areas can have significant adverse effects on the health and well-being of citizens, with traffic-related air pollution being a major concern in many cities. Pollutants emitted by vehicles, such as nitrogen oxides, carbon monoxide, and particulate matter, can cause respiratory and cardiovascular problems, particularly for vulnerable road users like pedestrians and cyclists. Furthermore, recent research has indicated that individuals living in more polluted areas are at a greater risk of developing chronic illnesses such as asthma, allergies, and cancer. Addressing these problems is crucial to protecting public health and maximising environmental benefits. In this project, we explore the feasibility of tackling this challenge by leveraging big data analysis and data-driven methods. Specifically, we investigate the recently released Google Air Quality dataset and devise an optimisation strategy to suggest green travel routes for different types of active transportation users in Dublin. To demonstrate our achievement, we have developed a prototype and have shown that citizens who use our model to plan their outdoor activities can benefit notably, with a significant decrease of 17.87% on average in pollutant intake, from the environmental advantages it offers.
Submitted 18 July, 2024; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Self-Supervised and Semi-Supervised Polyp Segmentation using Synthetic Data
Authors:
Enric Moreu,
Eric Arazo,
Kevin McGuinness,
Noel E. O'Connor
Abstract:
Early detection of colorectal polyps is of utmost importance for their treatment and for colorectal cancer prevention. Computer vision techniques have the potential to aid professionals in the diagnosis stage, where colonoscopies are manually carried out to examine the entirety of the patient's colon. The main challenge in medical imaging is the lack of data, and a further challenge specific to polyp segmentation approaches is the difficulty of manually labeling the available data: the annotation process for segmentation tasks is very time-consuming. While most recent approaches address the data availability challenge with sophisticated techniques to better exploit the available labeled data, few of them explore the self-supervised or semi-supervised paradigm, where the amount of labeling required is greatly reduced. To address both challenges, we leverage synthetic data and propose an end-to-end model for polyp segmentation that integrates real and synthetic data to artificially increase the size of the datasets and aid the training when unlabeled samples are available. Concretely, our model, Pl-CUT-Seg, transforms synthetic images with an image-to-image translation module and combines the resulting images with real images to train a segmentation model, where we use model predictions as pseudo-labels to better leverage unlabeled samples. Additionally, we propose PL-CUT-Seg+, an improved version of the model that incorporates targeted regularization to address the domain gap between real and synthetic images. The models are evaluated on standard benchmarks for polyp segmentation and reach state-of-the-art results in the self- and semi-supervised setups.
Submitted 22 July, 2023;
originally announced July 2023.
-
Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts
Authors:
Mayug Maniparambil,
Chris Vorster,
Derek Molloy,
Noel Murphy,
Kevin McGuinness,
Noel E. O'Connor
Abstract:
Contrastive pretrained large Vision-Language Models (VLMs) like CLIP have revolutionized visual representation learning by providing good performance on downstream datasets. VLMs are 0-shot adapted to a downstream dataset by designing prompts that are relevant to the dataset. Such prompt engineering makes use of domain expertise and a validation dataset. Meanwhile, recent developments in generative pretrained models like GPT-4 mean they can be used as advanced internet search tools. They can also be manipulated to provide visual information in any structure. In this work, we show that GPT-4 can be used to generate text that is visually descriptive and how this can be used to adapt CLIP to downstream tasks. We show considerable improvements in 0-shot transfer accuracy on specialized fine-grained datasets like EuroSAT (~7%), DTD (~7%), SUN397 (~4.6%), and CUB (~3.3%) when compared to CLIP's default prompt. We also design a simple few-shot adapter that learns to choose the best possible sentences to construct generalizable classifiers that outperform the recently proposed CoCoOP by ~2% on average and by over 4% on 4 specialized fine-grained datasets. The code, prompts, and auxiliary text dataset are available at https://github.com/mayug/VDT-Adapter.
Submitted 8 August, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Joint one-sided synthetic unpaired image translation and segmentation for colorectal cancer prevention
Authors:
Enric Moreu,
Eric Arazo,
Kevin McGuinness,
Noel E. O'Connor
Abstract:
Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due to privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We propose CUT-seg, a joint training where a segmentation model and a generative model are jointly trained to produce realistic images while learning to segment polyps. We take advantage of recent one-sided translation models because they use significantly less memory, allowing us to add a segmentation model in the training loop. CUT-seg performs better, is computationally less expensive, and requires fewer real images than other memory-intensive image translation approaches that require two-stage training. Promising results are achieved on five real polyp segmentation datasets using only one real image and zero real annotations. As a part of this study we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colon
Submitted 20 July, 2023;
originally announced July 2023.
-
Fashion CUT: Unsupervised domain adaptation for visual pattern classification in clothes using synthetic data and pseudo-labels
Authors:
Enric Moreu,
Alex Martinelli,
Martina Naughton,
Philip Kelly,
Noel E. O'Connor
Abstract:
Accurate product information is critical for e-commerce stores to allow customers to browse, filter, and search for products. Product data quality is affected by missing or incorrect information resulting in poor customer experience. While machine learning can be used to correct inaccurate or missing information, achieving high performance on fashion image classification tasks requires large amounts of annotated data, but it is expensive to generate due to labeling costs. One solution can be to generate synthetic data which requires no manual labeling. However, training a model with a dataset of solely synthetic images can lead to poor generalization when performing inference on real-world data because of the domain shift. We introduce a new unsupervised domain adaptation technique that converts images from the synthetic domain into the real-world domain. Our approach combines a generative neural network and a classifier that are jointly trained to produce realistic images while preserving the synthetic label information. We found that using real-world pseudo-labels during training helps the classifier to generalize in the real-world domain, reducing the synthetic bias. We successfully train a visual pattern classification model in the fashion domain without real-world annotations. Experiments show that our method outperforms other unsupervised domain adaptation algorithms.
Submitted 9 May, 2023;
originally announced May 2023.
-
U-Park: A User-Centric Smart Parking Recommendation System for Electric Shared Micromobility Services
Authors:
Sen Yan,
Noel E. O'Connor,
Mingming Liu
Abstract:
Electric Shared Micromobility Services (ESMS) have become a vital element within the Mobility as a Service framework, contributing to sustainable transportation systems. However, existing ESMS face notable design challenges such as shortcomings in integration, transparency, and user-centred approaches, resulting in increased operational costs and decreased service quality. A key operational issue for ESMS revolves around parking, particularly ensuring the availability of parking spaces as users approach their destinations. For instance, a recent study illustrated that nearly 13% of shared E-Bike users in Dublin, Ireland, encounter difficulties parking their E-Bikes due to inadequate planning and guidance. In response, we introduce U-Park, a user-centric smart parking recommendation system designed for ESMS, providing tailored recommendations to users by analysing their historical mobility data, trip trajectory, and parking space availability. We present the system architecture, implement it, and evaluate its performance using real-world data from an Irish-based shared E-Bike provider, MOBY Bikes. Our results illustrate U-Park's ability to predict a user's destination within a shared E-Bike system with an accuracy of over 97.60%, all without requiring direct user input. Experiments show that this predictive capability enables U-Park to suggest the optimal parking station to users based on the predicted availability of parking spaces, improving the probability of obtaining a parking spot by 24.91% on average and by up to 29.66% when parking availability is limited.
Submitted 18 July, 2024; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies
Authors:
Qiang Wang,
Robert McCarthy,
David Cordova Bulens,
Francisco Roldan Sanchez,
Kevin McGuinness,
Noel E. O'Connor,
Stephen J. Redmond
Abstract:
This paper presents our solution for the Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track, aimed at addressing dexterous robotic manipulation tasks through learning from pre-collected offline data. Participants were provided with two types of datasets for each task: expert and mixed datasets with varying skill levels. The simplest offline policy learning algorithm, Behavioral Cloning (BC), performed remarkably well when trained on expert datasets, outperforming even the most advanced offline reinforcement learning (RL) algorithms. However, BC's performance deteriorated when applied to mixed datasets, and the performance of offline RL algorithms was also unsatisfactory. Upon examining the mixed datasets, we observed that they contained a significant amount of expert data, although this data was unlabeled. To address this issue, we proposed a semi-supervised learning-based classifier to identify the underlying expert behavior within mixed datasets, effectively isolating the expert data. To further enhance BC's performance, we leveraged the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations. In the end, our submission surpassed that of all other participants, even those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.
Submitted 21 September, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Improving Behavioural Cloning with Positive Unlabeled Learning
Authors:
Qiang Wang,
Robert McCarthy,
David Cordova Bulens,
Kevin McGuinness,
Noel E. O'Connor,
Nico Gürtler,
Felix Widmaier,
Francisco Roldan Sanchez,
Stephen J. Redmond
Abstract:
Learning control policies offline from pre-recorded datasets is a promising avenue for solving challenging real-world problems. However, available datasets are typically of mixed quality, with a limited number of the trajectories that we would consider as positive examples; i.e., high-quality demonstrations. Therefore, we propose a novel iterative learning algorithm for identifying expert trajectories in unlabeled mixed-quality robotics datasets given a minimal set of positive examples, surpassing existing algorithms in terms of accuracy. We show that applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines. We perform experiments on a range of simulated locomotion tasks and on two challenging manipulation tasks on a real robotic system; in these experiments, our method showcases state-of-the-art performance. Our website: https://sites.google.com/view/offline-policy-learning-pubc
Submitted 21 September, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Unifying Synergies between Self-supervised Learning and Dynamic Computation
Authors:
Tarun Krishna,
Ayush K Rai,
Alexandru Drimbarean,
Eric Arazo,
Paul Albert,
Alan F Smeaton,
Kevin McGuinness,
Noel E O'Connor
Abstract:
Computationally expensive training strategies make self-supervised learning (SSL) impractical for resource-constrained industrial settings. Techniques like knowledge distillation (KD), dynamic computation (DC), and pruning are often used to obtain a lightweight model, which usually involves multiple epochs of fine-tuning (or distilling steps) of a large pre-trained model, making it more computationally challenging. In this work we present a novel perspective on the interplay between SSL and DC paradigms. In particular, we show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting without any additional fine-tuning or pruning steps. The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off and therefore yields a generic and multi-purpose architecture for application-specific industrial settings. Extensive experiments on several image classification benchmarks, including CIFAR-10/100, STL-10 and ImageNet-100, demonstrate that the proposed training strategy provides a dense and corresponding gated sub-network that achieves on-par performance compared with the vanilla self-supervised setting, but at a significant reduction in computation in terms of FLOPs, under a range of target budgets ($t_{d}$).
Submitted 9 September, 2023; v1 submitted 22 January, 2023;
originally announced January 2023.
-
Motion Aware Self-Supervision for Generic Event Boundary Detection
Authors:
Ayush K. Rai,
Tarun Krishna,
Julia Dietlmeier,
Kevin McGuinness,
Alan F. Smeaton,
Noel E. O'Connor
Abstract:
The task of Generic Event Boundary Detection (GEBD) aims to detect moments in videos that are naturally perceived by humans as generic and taxonomy-free event boundaries. Modeling the dynamically evolving temporal and spatial changes in a video makes GEBD a difficult problem to solve. Existing approaches involve very complex and sophisticated pipelines in terms of architectural design choices, hence creating a need for more straightforward and simplified approaches. In this work, we address this issue by revisiting a simple and effective self-supervised method and augment it with a differentiable motion feature learning module to tackle the spatial and temporal diversities in the GEBD task. We perform extensive experiments on the challenging Kinetics-GEBD and TAPOS datasets to demonstrate the efficacy of the proposed approach compared to the other self-supervised state-of-the-art methods. We also show that this simple self-supervised approach learns motion features without any explicit motion-specific pretext task.
Submitted 12 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Is your noise correction noisy? PLS: Robustness to label noise with two stage detection
Authors:
Paul Albert,
Eric Arazo,
Tarun Krishna,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Designing robust algorithms capable of training accurate neural networks on uncurated datasets from the web has been the subject of much research as it reduces the need for time-consuming human labor. The focus of many previous research contributions has been on the detection of different types of label noise; however, this paper proposes to improve the correction accuracy of noisy samples once they have been detected. In many state-of-the-art contributions, a two-phase approach is adopted where the noisy samples are detected before guessing a corrected pseudo-label in a semi-supervised fashion. The guessed pseudo-labels are then used in the supervised objective without ensuring that the label guess is likely to be correct. This can lead to confirmation bias, which reduces the noise robustness. Here we propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples. Using the pseudo-loss, we dynamically down-weight under-confident pseudo-labels throughout training to avoid confirmation bias and improve the network accuracy. We additionally propose to use a confidence-guided contrastive objective that learns robust representations on an interpolated objective between class-bound (supervised) for confidently corrected samples and unsupervised representation for under-confident label corrections. Experiments demonstrate the state-of-the-art performance of our Pseudo-Loss Selection (PLS) algorithm on a variety of benchmark datasets including curated data synthetically corrupted with in-distribution and out-of-distribution noise, and two real-world web noise datasets. Our experiments are fully reproducible: github.com/PaulAlbert31/SNCF
Submitted 15 October, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Cardiac Segmentation using Transfer Learning under Respiratory Motion Artifacts
Authors:
Carles Garcia-Cabrera,
Eric Arazo,
Kathleen M. Curran,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Methods that are resilient to artifacts in cardiac magnetic resonance imaging (MRI) while performing ventricle segmentation are crucial for ensuring quality in structural and functional analysis of those tissues. While there have been significant efforts on improving the quality of the algorithms, few works have tackled the harm that the artifacts generate in the predictions. In this work, we study fine-tuning of pretrained networks to improve the resilience of previous methods to these artifacts. In our proposed method, we make extensive use of data augmentations that mimic those artifacts. The results significantly improved the baseline segmentations (up to 0.06 Dice score, and 4 mm Hausdorff distance improvement).
Submitted 20 September, 2022;
originally announced September 2022.
-
Dynamic Channel Selection in Self-Supervised Learning
Authors:
Tarun Krishna,
Ayush K. Rai,
Yasser A. D. Djilali,
Alan F. Smeaton,
Kevin McGuinness,
Noel E. O'Connor
Abstract:
Whilst computer vision models built using self-supervised approaches are now commonplace, some important questions remain. Do self-supervised models learn highly redundant channel features? What if a self-supervised network could dynamically select the important channels and get rid of the unnecessary ones? Currently, convnets pre-trained with self-supervision have obtained comparable performance on downstream tasks in comparison to their supervised counterparts in computer vision. However, there are drawbacks to self-supervised models including their large numbers of parameters, computationally expensive training strategies and a clear need for faster inference on downstream tasks. In this work, our goal is to address the latter by studying how a standard channel selection method developed for supervised learning can be applied to networks trained with self-supervision. We validate our findings on a range of target budgets $t_{d}$ for channel computation on the image classification task across different datasets, specifically CIFAR-10, CIFAR-100, and ImageNet-100, obtaining comparable performance to that of the original network when selecting all channels but at a significant reduction in computation reported in terms of FLOPs.
Submitted 16 December, 2022; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets
Authors:
Paul Albert,
Eric Arazo,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset, but their main drawback remains the proportion of incorrect (noisy) samples retrieved. These noisy samples have been evidenced by previous works to be a mixture of in-distribution (ID) samples, assigned to the incorrect category but presenting similar visual semantics to other classes in the dataset, and out-of-distribution (OOD) images, which share no semantic correlation with any category from the dataset. The latter are, in practice, the dominant type of noisy images retrieved. To tackle this noise duality, we propose a two-stage algorithm starting with a detection step where we use unsupervised contrastive feature learning to represent images in a feature space. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere. We then spectrally embed the unsupervised representations using a fixed neighborhood size and apply an outlier-sensitive clustering at the class level to detect the clean and OOD clusters as well as ID noisy outliers. We finally train a noise-robust neural network that corrects ID noise to the correct category and utilizes OOD samples in a guided contrastive objective, clustering them to improve low-level features. Our algorithm improves the state-of-the-art results on synthetic noise image datasets as well as real-world web-crawled data. Our work is fully reproducible: github.com/PaulAlbert31/SNCF
Submitted 18 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Utilizing unsupervised learning to improve sward content prediction and herbage mass estimation
Authors:
Paul Albert,
Mohamed Saadeldin,
Badri Narayanan,
Brian Mac Namee,
Deirdre Hennessy,
Aisling H. O'Connor,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Sward species composition estimation is a tedious task. Herbage must be collected in the field, manually separated into components, dried and weighed to estimate species composition. Deep learning approaches using neural networks have been used in previous work to propose faster and more cost-efficient alternatives to this process by estimating the biomass information from a picture of an area of pasture alone. Deep learning approaches have, however, struggled to generalize to distant geographical locations and necessitated further data collection to retrain and perform optimally in different climates. In this work, we enhance the deep learning solution by reducing the need for ground-truthed (GT) images when training the neural network. We demonstrate how unsupervised contrastive learning can be used in the sward composition prediction problem and compare with the state-of-the-art on the publicly available GrassClover dataset collected in Denmark as well as a more recent dataset from Ireland where we tackle herbage mass and height estimation.
Submitted 20 April, 2022;
originally announced April 2022.
-
Unsupervised domain adaptation and super resolution on drone images for autonomous dry herbage biomass estimation
Authors:
Paul Albert,
Mohamed Saadeldin,
Badri Narayanan,
Jaime Fernandez,
Brian Mac Namee,
Deirdre Hennessey,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Herbage mass yield and composition estimation is an important tool for dairy farmers to ensure an adequate supply of high quality herbage for grazing and subsequently milk production. By accurately estimating herbage mass and composition, targeted nitrogen fertiliser application strategies can be deployed to improve localised regions in a herbage field, effectively reducing the negative impacts of over-fertilization on biodiversity and the environment. In this context, deep learning algorithms offer a tempting alternative to the usual means of sward composition estimation, which involves the destructive process of cutting a sample from the herbage field and sorting by hand all plant species in the herbage. The process is labour intensive and time consuming and so is not utilised by farmers. Deep learning has been successfully applied in this context on images collected by high-resolution cameras on the ground. Moving the deep learning solution to drone imaging, however, has the potential to further improve the herbage mass yield and composition estimation task by extending the ground-level estimation to the large surfaces occupied by fields/paddocks. Drone images come at the cost of lower resolution views of the fields taken from a high altitude and require further herbage ground-truth collection from the large surfaces covered by drone images. This paper proposes to transfer knowledge learned on ground-level images to raw drone images in an unsupervised manner. To do so, we use unpaired image style translation to enhance the resolution of drone images by a factor of eight and modify them to appear closer to their ground-level counterparts. We then ... www.github.com/PaulAlbert31/Clover_SSL
Submitted 18 April, 2022;
originally announced April 2022.
-
Parking Behaviour Analysis of Shared E-Bike Users Based on a Real-World Dataset -- A Case Study in Dublin, Ireland
Authors:
Sen Yan,
Mingming Liu,
Noel E. O'Connor
Abstract:
In recent years, an increasing number of shared E-bikes have been rolling out rapidly in our cities. It therefore becomes important to understand new behaviour patterns of the cyclists using these E-bikes as a foundation for the novel design of shared micromobility services as part of the realisation of next-generation intelligent transportation systems. In this paper, we investigate in depth the behaviour of shared E-bike users in a case study using a real-world dataset collected from the shared E-bike company, MOBY, which currently operates in Dublin, Ireland. More specifically, we look into the parking behaviours of users, as inappropriate parking of these bikes can not only increase the management costs of the company but also inconvenience other users, especially in situations of battery shortage, which inevitably reduces the overall operational efficacy of these shared E-bikes. We conducted the analysis at both bike station and individual level in a fully anonymous and GDPR-compliant manner, and our results show that up to 12.9% of shared E-bike users did not park their bikes properly at the designated stands. Different visualisation tools have been applied to better illustrate our results.
Submitted 16 March, 2022;
originally announced March 2022.
-
Synthetic data for unsupervised polyp segmentation
Authors:
Enric Moreu,
Kevin McGuinness,
Noel E. O'Connor
Abstract:
Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due to privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We use zero annotations from medical professionals in our pipeline. Our fully unsupervised method achieves promising results on five real polyp segmentation datasets. As a part of this study we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colon
Submitted 17 February, 2022;
originally announced February 2022.
-
Domain Randomization for Object Counting
Authors:
Enric Moreu,
Kevin McGuinness,
Diego Ortego,
Noel E. O'Connor
Abstract:
Recently, the use of synthetic datasets based on game engines has been shown to improve the performance of several tasks in computer vision. However, these datasets are typically only appropriate for the specific domains depicted in computer games, such as urban scenes involving vehicles and people. In this paper, we present an approach to generate synthetic datasets for object counting for any domain without the need for photo-realistic techniques manually generated by expensive teams of 3D artists. We introduce a domain randomization approach for object counting based on synthetic datasets that are quick and inexpensive to generate. We deliberately avoid photorealism and drastically increase the variability of the dataset, producing images with random textures and 3D transformations, which improves generalization. Experiments show that our method facilitates good performance on various real-world object counting datasets for multiple domains: people, vehicles, penguins, and fruit. The source code is available at: https://github.com/enric1994/dr4oc
Submitted 17 February, 2022;
originally announced February 2022.
-
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment
Authors:
Luis Lebron,
Yvette Graham,
Kevin McGuinness,
Konstantinos Kouramas,
Noel E. O'Connor
Abstract:
Evaluating video captioning systems is a challenging task as there are multiple factors to consider; for instance: the fluency of the caption, multiple actions happening in a single scene, and the human bias of what is considered important. Most metrics try to measure how similar the system-generated captions are to a single or a set of human-annotated captions. This paper presents a new method based on a deep learning model to evaluate these systems. The model is based on BERT, which is a language model that has been shown to work well in multiple NLP tasks. The aim is for the model to learn to perform an evaluation similar to that of a human. To do so, we use a dataset that contains human evaluations of system-generated captions. The dataset consists of the human judgments of the captions produced by the systems participating in various years of the TRECVid video to text task. These annotations will be made publicly available. BERTHA obtains favourable results, outperforming the commonly used metrics in some setups.
Submitted 16 May, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Improving Person Re-Identification with Temporal Constraints
Authors:
Julia Dietlmeier,
Feiyan Hu,
Frances Ryan,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
In this paper we introduce an image-based person re-identification dataset collected across five non-overlapping camera views in the large and busy airport in Dublin, Ireland. Unlike all publicly available image-based datasets, our dataset contains timestamp information in addition to frame number, and camera and person IDs. Our dataset has also been fully anonymized to comply with modern data privacy regulations. We apply state-of-the-art person re-identification models to our dataset and show that by leveraging the available timestamp information we are able to achieve a significant gain of 37.43% in mAP and a gain of 30.22% in Rank1 accuracy. We also propose a Bayesian temporal re-ranking post-processing step, which further adds a 10.03% gain in mAP and 9.95% gain in Rank1 accuracy metrics. This work on combining visual and temporal information is not possible on other image-based person re-identification datasets. We believe that the proposed new dataset will enable further development of person re-identification research for challenging real-world applications. The DAA dataset can be downloaded from https://bit.ly/3AtXTd6
Submitted 17 November, 2021;
originally announced November 2021.
-
How Important is Importance Sampling for Deep Budgeted Training?
Authors:
Eric Arazo,
Diego Ortego,
Paul Albert,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Long iterative training processes for Deep Neural Networks (DNNs) are commonly required to achieve state-of-the-art performance in many computer vision tasks. Importance sampling approaches might play a key role in budgeted training regimes, i.e. when limiting the number of training iterations. These approaches aim at dynamically estimating the importance of each sample to focus on the most relevant samples and speed up convergence. This work explores this paradigm and how a budget constraint interacts with importance sampling approaches and data augmentation techniques. We show that under budget restrictions, importance sampling approaches do not provide a consistent improvement over uniform sampling. We suggest that, given a specific budget, the best course of action is to disregard the importance and introduce adequate data augmentation; e.g. when reducing the budget to 30% in CIFAR-10/100, RICAP data augmentation maintains accuracy, while importance sampling does not. We conclude from our work that DNNs under budget restrictions benefit greatly from variety in the training set and that finding the right samples to train on is not the most effective strategy when balancing high performance with low computational requirements. Source code available at https://git.io/JKHa3
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
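As a rough illustration of the two sampling regimes contrasted in the abstract above, the following sketch draws a training batch either uniformly or with probability proportional to each sample's current loss. Loss-proportional weighting is only one common form of importance sampling and is an assumption here, not necessarily the scheme studied in the paper; the function names `sampling_probabilities` and `draw_batch` are hypothetical.

```python
import random


def sampling_probabilities(losses):
    """Convert per-sample losses into a loss-proportional sampling distribution."""
    total = sum(losses)
    if total == 0:
        # All losses are zero: fall back to uniform sampling.
        return [1.0 / len(losses)] * len(losses)
    return [loss / total for loss in losses]


def draw_batch(losses, batch_size, rng, use_importance=True):
    """Draw sample indices (with replacement) for one training batch.

    use_importance=True  -> probability proportional to current loss (assumed scheme)
    use_importance=False -> uniform sampling baseline
    """
    n = len(losses)
    probs = sampling_probabilities(losses) if use_importance else [1.0 / n] * n
    return rng.choices(range(n), weights=probs, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [0.1, 0.1, 2.0, 0.1]  # hypothetical per-sample losses
    print("importance:", draw_batch(losses, 8, rng, use_importance=True))
    print("uniform:   ", draw_batch(losses, 8, rng, use_importance=False))
```

In this toy setting the high-loss sample dominates the importance-sampled batches, which hints at why such schemes can reduce the variety of samples seen under a tight iteration budget.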
-
Discerning Generic Event Boundaries in Long-Form Wild Videos
Authors:
Ayush K Rai,
Tarun Krishna,
Julia Dietlmeier,
Kevin McGuinness,
Alan F Smeaton,
Noel E O'Connor
Abstract:
Detecting generic, taxonomy-free event boundaries in videos represents a major stride forward towards holistic video understanding. In this paper we present a technique for generic event boundary detection based on a two-stream inflated 3D convolutions architecture, which can learn spatio-temporal features from videos. Our work is inspired by the Generic Event Boundary Detection Challenge (part of the CVPR 2021 Long Form Video Understanding (LOVEU) Workshop). Throughout the paper we provide an in-depth analysis of the experiments performed along with an interpretation of the results obtained.
Submitted 18 June, 2021;
originally announced June 2021.
-
Optimal Distributed Bandwidth Allocation in NB-IoT Networks
Authors:
Hongde Wu,
Zhengyong Chen,
Noel E. O'Connor,
Mingming Liu
Abstract:
In this paper, we investigate a key problem of Narrowband-Internet of Things (NB-IoT) in the context of 5G with Mobile Edge Computing (MEC). We address the challenge that IoT devices may have different priorities when demanding bandwidth for data transmission in specific applications and services. Due to the scarcity of bandwidth in an MEC-enabled IoT network, our objective is to optimize bandwidth allocation for a group of NB-IoT devices such that the group can work collaboratively to maximize its overall utility. To this end, we design an optimal distributed algorithm and use simulations to demonstrate its efficacy in managing various IoT data streams in a fully distributed framework.
Submitted 5 March, 2021;
originally announced May 2021.
-
A Comparative Study of Using Spatial-Temporal Graph Convolutional Networks for Predicting Availability in Bike Sharing Schemes
Authors:
Zhengyong Chen,
Hongde Wu,
Noel E. O'Connor,
Mingming Liu
Abstract:
Accurately forecasting transportation demand is crucial for efficient urban traffic guidance, control and management. One solution to enhance prediction accuracy is to leverage graph convolutional networks (GCN), a neural-network-based modelling approach with the ability to process data contained in graph-based structures. As a powerful extension of GCN, a spatial-temporal graph convolutional network (ST-GCN) aims to capture the relationship of data contained in the graph nodes across both spatial and temporal dimensions, which presents a novel deep learning paradigm for the analysis of complex time-series data that also involves spatial information, as is present in transportation use cases. In this paper, we present an Attention-based ST-GCN (AST-GCN) for predicting the number of available bikes in bike-sharing systems in cities, where the attention-based mechanism is introduced to further improve the performance of an ST-GCN. Furthermore, we discuss the impact of different methods of modelling adjacency matrices on the proposed architecture. Our experimental results are presented using two real-world datasets, Dublinbikes and NYC-Citi Bike, to illustrate the efficacy of our proposed model, which outperforms the majority of existing approaches.
Submitted 6 July, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
An ADMM-based Optimal Transmission Frequency Management System for IoT Edge Intelligence
Authors:
Hongde Wu,
Noel E. O'Connor,
Jennifer Bruton,
Mingming Liu
Abstract:
In this paper, we investigate a key problem of Internet of Things (IoT) applications in practice. Our research objective is to optimize the transmission frequencies for a group of IoT edge devices under practical constraints. Our key assumption is that different IoT devices may have different priority levels when transmitting data in a resource-constrained environment and that, due to privacy concerns, those priority levels may only be locally defined and accessible by edge devices. To address this problem, we leverage the well-known Alternating Direction Method of Multipliers (ADMM) optimization method and demonstrate its applicability for effectively managing various IoT data streams in a decentralized framework. Our experimental results show that the transmission frequency of each edge device can converge to optimality with little delay using ADMM, and the proposed system can be adjusted dynamically when a new device connects to the system. In addition, we introduce an anomaly detection mechanism for cases where a device's transmission frequency may be compromised by external manipulation, showing that the proposed system is robust and secure for various IoT applications.
Submitted 15 April, 2021;
originally announced April 2021.
-
An Intelligent Multi-Speed Advisory System using Improved Whale Optimisation Algorithm
Authors:
Beiran Chen,
Mingming Liu,
Yi Zhang,
Zhengyong Chen,
Yingqi Gu,
Noel E. O'Connor
Abstract:
An intelligent speed advisory system can be used to recommend speeds for vehicles travelling in a given road network in cities. In this paper, we extend our previous work, in which a distributed speed advisory system was devised to recommend an optimal consensus speed for a fleet of Internal Combustion Engine Vehicles (ICEVs) in a highway scenario. In particular, we propose a novel optimisation framework where the exact form of each vehicle's cost function can be implicit, and our algorithm can be used to recommend multiple consensus speeds for vehicles travelling on different lanes in an urban highway scenario. Our studies show that the proposed scheme based on an improved whale optimisation algorithm can effectively reduce CO2 emissions generated by ICEVs while providing different recommended speed options for groups of vehicles.
Submitted 28 February, 2021;
originally announced March 2021.
-
Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding
Authors:
Marc Górriz,
Saverio Blasi,
Alan F. Smeaton,
Noel E. O'Connor,
Marta Mrak
Abstract:
Neural networks can be successfully used to improve several modules of advanced video coding schemes. In particular, compression of colour components was shown to greatly benefit from usage of machine learning models, thanks to the design of appropriate attention-based architectures that allow the prediction to exploit specific samples in the reference region. However, such architectures tend to be complex and computationally intensive, and may be difficult to deploy in a practical video coding pipeline. This work focuses on reducing the complexity of such methodologies, to design a set of simplified and cost-effective attention-based architectures for chroma intra-prediction. A novel size-agnostic multi-model approach is proposed to reduce the complexity of the inference process. The resulting simplified architecture is still capable of outperforming state-of-the-art methods. Moreover, a collection of simplifications is presented in this paper to further reduce the complexity overhead of the proposed prediction architecture. Thanks to these simplifications, a reduction in the number of parameters of around 90% is achieved with respect to the original attention-based methodologies. Simplifications include a framework for reducing the overhead of the convolutional operations, a simplified cross-component processing model integrated into the original architecture, and a methodology to perform integer-precision approximations with the aim of obtaining fast and hardware-aware implementations. The proposed schemes are integrated into the Versatile Video Coding (VVC) prediction pipeline, retaining the compression efficiency of state-of-the-art chroma intra-prediction methods based on neural networks, while offering different directions for significantly reducing coding complexity.
Submitted 9 February, 2021;
originally announced February 2021.
-
MPC-CSAS: Multi-Party Computation for Real-time Privacy-preserving Speed Advisory Systems
Authors:
Mingming Liu,
Long Cheng,
Yingqi Gu,
Ying Wang,
Qingzhi Liu,
Noel E. O'Connor
Abstract:
As a part of Advanced Driver Assistance Systems (ADASs), Consensus-based Speed Advisory Systems (CSAS) have been proposed to recommend a common speed to a group of vehicles for specific application purposes, such as emission control and energy management. With Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) technologies and advanced control theories in place, state-of-the-art CSAS can be designed to obtain an optimal speed in a privacy-preserving and decentralized manner. However, the current method only works for specific cost functions of vehicles, and its execution usually involves many algorithm iterations, leading to long convergence times. Therefore, the state-of-the-art design method is not applicable to a CSAS design that requires real-time decision making. In this paper, we address the problem by introducing MPC-CSAS, a Multi-Party Computation (MPC) based design approach for privacy-preserving CSAS. Our proposed method is simple to implement and applicable to all types of cost functions of vehicles. Moreover, our simulation results show that the proposed MPC-CSAS can achieve very promising system performance in just one algorithm iteration without using extra infrastructure for a typical CSAS.
Submitted 16 January, 2021;
originally announced January 2021.
-
Investigating Memorability of Dynamic Media
Authors:
Phuc H. Le-Khac,
Ayush K. Rai,
Graham Healy,
Alan F. Smeaton,
Noel E. O'Connor
Abstract:
The Predicting Media Memorability task in MediaEval'20 has some challenging aspects compared to previous years. In this paper we identify the high-dynamic content in videos and the limited size of the dataset as the core challenges for the task, propose directions to overcome some of these challenges, and present our initial results in these directions.
Submitted 31 December, 2020;
originally announced December 2020.
-
Multi-Objective Interpolation Training for Robustness to Label Noise
Authors:
Diego Ortego,
Eric Arazo,
Paul Albert,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Deep neural networks trained with standard cross-entropy loss memorize noisy labels, which degrades their performance. Most research to mitigate this memorization proposes new robust classification loss functions. Conversely, we propose a Multi-Objective Interpolation Training (MOIT) approach that jointly exploits contrastive learning and classification to mutually help each other and boost performance against label noise. We show that standard supervised contrastive learning degrades in the presence of label noise and propose an interpolation training strategy to mitigate this behavior. We further propose a novel label noise detection method that exploits the robust feature representations learned via contrastive learning to estimate per-sample soft-labels whose disagreements with the original labels accurately identify noisy samples. This detection allows treating noisy samples as unlabeled and training a classifier in a semi-supervised manner to prevent noise memorization and improve representation learning. We further propose MOIT+, a refinement of MOIT by fine-tuning on detected clean samples. Hyperparameter and ablation studies verify the key components of our method. Experiments on synthetic and real-world noise benchmarks demonstrate that MOIT/MOIT+ achieves state-of-the-art results. Code is available at https://git.io/JI40X.
Submitted 18 March, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Unsupervised Contrastive Learning of Sound Event Representations
Authors:
Eduardo Fonseca,
Diego Ortego,
Kevin McGuinness,
Noel E. O'Connor,
Xavier Serra
Abstract:
Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily via mixing of training examples with unrelated backgrounds, followed by other data augmentations. We analyze the main components of our method via ablation experiments. We evaluate the learned representations using linear evaluation, and in two in-domain downstream sound event classification tasks, namely, using limited manually labeled data, and using noisy labeled data. Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels, outperforming supervised baselines.
Submitted 15 November, 2020;
originally announced November 2020.
-
How important are faces for person re-identification?
Authors:
Julia Dietlmeier,
Joseph Antony,
Kevin McGuinness,
Noel E. O'Connor
Abstract:
This paper investigates the dependence of existing state-of-the-art person re-identification models on the presence and visibility of human faces. We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets including Market1501, DukeMTMC-reID, CUHK03, Viper, and Airport. Using a cross-section of existing state-of-the-art models that range in accuracy and computational efficiency, we evaluate the effect of this anonymization on re-identification performance using standard metrics. Perhaps surprisingly, the effect on mAP is very small, and accuracy is recovered by simply training on the anonymized versions of the data rather than the original data. These findings are consistent across multiple models and datasets. These results indicate that datasets can be safely anonymized by blurring faces without significantly impacting the performance of person re-identification systems, and may allow for the release of new, richer re-identification datasets where previously there were privacy or data protection concerns.
Submitted 13 October, 2020;
originally announced October 2020.
-
Utilising Visual Attention Cues for Vehicle Detection and Tracking
Authors:
Feiyan Hu,
Venkatesh G M,
Noel E. O'Connor,
Alan F. Smeaton,
Suzanne Little
Abstract:
Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) how a visual attention map such as a subjectness attention or saliency map and an objectness attention map can facilitate region proposal generation in a 2-stage object detector; 2) how a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on the KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of approximately 8% in object detection, which effectively increased tracking performance by approximately 4% on the KITTI dataset.
Submitted 31 July, 2020;
originally announced August 2020.
-
Reliable Label Bootstrapping for Semi-Supervised Learning
Authors:
Paul Albert,
Diego Ortego,
Eric Arazo,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Reducing the number of labels required to train convolutional neural networks without performance degradation is key to effectively reducing human annotation efforts. We propose Reliable Label Bootstrapping (ReLaB), an unsupervised preprocessing algorithm which improves the performance of semi-supervised algorithms in extremely low supervision settings. Given a dataset with few labeled samples, we first learn meaningful self-supervised, latent features for the data. Second, a label propagation algorithm propagates the known labels on the unsupervised features, effectively labeling the full dataset in an automatic fashion. Third, we select a subset of correctly labeled (reliable) samples using a label noise detection algorithm. Finally, we train a semi-supervised algorithm on the extended subset. We show that the selection of the network architecture and the self-supervised algorithm are important factors to achieve successful label propagation and demonstrate that ReLaB substantially improves semi-supervised learning in scenarios of very limited supervision on CIFAR-10, CIFAR-100 and mini-ImageNet. We reach average error rates of 22.34 with 1 random labeled sample per class on CIFAR-10 and lower this error to 8.46 when the labeled sample in each class is highly representative. Our work is fully reproducible: https://github.com/PaulAlbert31/ReLaB.
Submitted 25 February, 2021; v1 submitted 23 July, 2020;
originally announced July 2020.
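The second step of the pipeline in the abstract above (spreading a few known labels over learned features) can be illustrated with a deliberately simplified stand-in: each unlabeled sample takes the label of its nearest labeled sample in feature space. This is plain nearest-neighbour assignment, not the label propagation algorithm used by ReLaB, and the function name `propagate_nearest_label` and the toy features are hypothetical.

```python
import math


def propagate_nearest_label(features, labels):
    """Assign every unlabeled sample the label of its closest labeled sample.

    features: list of equal-length float vectors (e.g. self-supervised embeddings)
    labels:   list of class ids, with None marking unlabeled samples
    Simplified stand-in for graph-based label propagation (an assumption).
    """
    labeled = [i for i, y in enumerate(labels) if y is not None]
    if not labeled:
        raise ValueError("at least one labeled sample is required")

    result = []
    for x, y in zip(features, labels):
        if y is not None:
            result.append(y)  # keep the known label unchanged
            continue
        # Euclidean distance to every labeled sample; pick the closest one.
        nearest = min(labeled, key=lambda j: math.dist(x, features[j]))
        result.append(labels[nearest])
    return result


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [4.9, 5.2]]
    labs = [0, None, 1, None]  # one labeled sample per class
    print(propagate_nearest_label(feats, labs))
```

Labels assigned this way can be wrong wherever the features do not separate the classes, which is why the pipeline described above follows propagation with a step that keeps only reliable samples.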
-
Chroma Intra Prediction with attention-based CNN architectures
Authors:
Marc Górriz,
Saverio Blasi,
Alan F. Smeaton,
Noel E. O'Connor,
Marta Mrak
Abstract:
Neural networks can be used in video coding to improve chroma intra-prediction. In particular, usage of fully-connected networks has enabled better cross-component prediction with respect to traditional linear models. Nonetheless, state-of-the-art architectures tend to disregard the location of individual reference samples in the prediction process. This paper proposes a new neural network architecture for cross-component intra-prediction. The network uses a novel attention module to model spatial relations between reference and predicted samples. The proposed approach is integrated into the Versatile Video Coding (VVC) prediction pipeline. Experimental results demonstrate compression gains over the latest VVC anchor compared with state-of-the-art chroma intra-prediction methods based on neural networks.
Submitted 27 June, 2020;
originally announced June 2020.
-
Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding
Authors:
Luka Murn,
Saverio Blasi,
Alan F. Smeaton,
Noel E. O'Connor,
Marta Mrak
Abstract:
Deep learning has shown great potential in image and video compression tasks. However, it brings bit savings at the cost of significant increases in coding complexity, which limits its potential for implementation within practical applications. In this paper, a novel neural network-based tool is presented which improves the interpolation of reference samples needed for fractional precision motion compensation. Contrary to previous efforts, the proposed approach focuses on complexity reduction achieved by interpreting the interpolation filters learned by the networks. When the approach is implemented in the Versatile Video Coding (VVC) test model, up to 4.5% BD-rate saving for individual sequences is achieved compared with the baseline VVC, while the complexity of the learned interpolation is significantly reduced compared to the application of the full neural network.
Submitted 11 June, 2020;
originally announced June 2020.
-
Investigating Class-level Difficulty Factors in Multi-label Classification Problems
Authors:
Mark Marsden,
Kevin McGuinness,
Joseph Antony,
Haolin Wei,
Milan Redzic,
Jian Tang,
Zhilan Hu,
Alan Smeaton,
Noel E O'Connor
Abstract:
This work investigates the use of class-level difficulty factors in multi-label classification problems for the first time. Four class-level difficulty factors are proposed: frequency, visual variation, semantic abstraction, and class co-occurrence. Once computed for a given multi-label classification dataset, these difficulty factors are shown to have several potential applications, including the prediction of class-level performance across datasets and the improvement of predictive performance through difficulty-weighted optimisation. Significant improvements to mAP and AUC performance are observed for two challenging multi-label datasets (WWW Crowd and Visual Genome) with the inclusion of difficulty-weighted optimisation. The proposed technique does not add any computational complexity during training or inference and can be extended over time with the inclusion of other class-level difficulty factors.
Submitted 1 May, 2020;
originally announced May 2020.
-
Towards Robust Learning with Different Label Noise Distributions
Authors:
Diego Ortego,
Eric Arazo,
Paul Albert,
Noel E. O'Connor,
Kevin McGuinness
Abstract:
Noisy labels are an unavoidable consequence of labeling processes and detecting them is an important step towards preventing performance degradation in Convolutional Neural Networks. Discarding noisy labels avoids harmful memorization, while the associated image content can still be exploited in a semi-supervised learning (SSL) setup. Clean samples are usually identified using the small loss trick, i.e. they exhibit a low loss. However, we show that different noise distributions make the application of this trick less straightforward and propose to continuously relabel all images to reveal a discriminative loss against multiple distributions. SSL is then applied twice, once to improve the clean-noisy detection and again for training the final model. We design an experimental setup based on ImageNet32/64 for better understanding the consequences of representation learning with differing label noise distributions and find that non-uniform out-of-distribution noise better resembles real-world noise and that in most cases intermediate features are not affected by label noise corruption. Experiments on CIFAR-10/100, ImageNet32/64 and WebVision (real-world noise) demonstrate that the proposed label noise Distribution Robust Pseudo-Labeling (DRPL) approach gives substantial improvements over the recent state of the art. Code is available at https://git.io/JJ0PV.
Submitted 27 July, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
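The small loss trick mentioned in the abstract above can be sketched in a few lines: samples are ranked by their current training loss and the lowest-loss fraction is treated as clean. The fixed `keep_fraction` threshold and the function name `small_loss_split` are assumptions for illustration; as the abstract notes, this simple criterion becomes less reliable under some noise distributions, and the sketch does not implement the DRPL relabeling strategy.

```python
def small_loss_split(losses, keep_fraction):
    """Split sample indices into presumed-clean and presumed-noisy sets.

    losses:        per-sample training losses
    keep_fraction: fraction of lowest-loss samples treated as clean (assumed fixed)
    Returns (clean_indices, noisy_indices), each sorted by index.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    # Rank sample indices from lowest to highest loss.
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    n_keep = int(round(keep_fraction * len(losses)))
    return sorted(order[:n_keep]), sorted(order[n_keep:])


if __name__ == "__main__":
    losses = [0.1, 2.3, 0.2, 1.9]  # hypothetical per-sample losses
    clean, noisy = small_loss_split(losses, keep_fraction=0.5)
    print("clean:", clean)
    print("noisy:", noisy)
```

The presumed-noisy indices need not be discarded: in a semi-supervised setup of the kind the abstract describes, their images can still be used as unlabeled data.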