-
AI-GenBench: A New Ongoing Benchmark for AI-Generated Image Detection
Authors:
Lorenzo Pellegrini,
Davide Cozzolino,
Serafino Pandolfini,
Davide Maltoni,
Matteo Ferrara,
Luisa Verdoliva,
Marco Prati,
Marco Ramilli
Abstract:
The rapid advancement of generative AI has revolutionized image creation, enabling high-quality synthesis from text prompts while raising critical challenges for media authenticity. We present Ai-GenBench, a novel benchmark designed to address the urgent need for robust detection of AI-generated images in real-world scenarios. Unlike existing solutions that evaluate models on static datasets, Ai-G…
▽ More
The rapid advancement of generative AI has revolutionized image creation, enabling high-quality synthesis from text prompts while raising critical challenges for media authenticity. We present Ai-GenBench, a novel benchmark designed to address the urgent need for robust detection of AI-generated images in real-world scenarios. Unlike existing solutions that evaluate models on static datasets, Ai-GenBench introduces a temporal evaluation framework where detection methods are incrementally trained on synthetic images, historically ordered by their generative models, to test their ability to generalize to new generative models, such as the transition from GANs to diffusion models. Our benchmark focuses on high-quality, diverse visual content and overcomes key limitations of current approaches, including arbitrary dataset splits, unfair comparisons, and excessive computational demands. Ai-GenBench provides a comprehensive dataset, a standardized evaluation protocol, and accessible tools for both researchers and non-experts (e.g., journalists, fact-checkers), ensuring reproducibility while maintaining practical training requirements. By establishing clear evaluation rules and controlled augmentation strategies, Ai-GenBench enables meaningful comparison of detection methods and scalable solutions. Code and data are publicly available to ensure reproducibility and to support the development of robust forensic detectors to keep pace with the rise of new synthetic generators.
△ Less
Submitted 29 April, 2025;
originally announced April 2025.
-
From Gaming to Research: GTA V for Synthetic Data Generation for Robotics and Navigations
Authors:
Matteo Scucchia,
Matteo Ferrara,
Davide Maltoni
Abstract:
In computer vision, the development of robust algorithms capable of generalizing effectively in real-world scenarios more and more often requires large-scale datasets collected under diverse environmental conditions. However, acquiring such datasets is time-consuming, costly, and sometimes unfeasible. To address these limitations, the use of synthetic data has gained attention as a viable alternat…
▽ More
In computer vision, the development of robust algorithms capable of generalizing effectively in real-world scenarios more and more often requires large-scale datasets collected under diverse environmental conditions. However, acquiring such datasets is time-consuming, costly, and sometimes unfeasible. To address these limitations, the use of synthetic data has gained attention as a viable alternative, allowing researchers to generate vast amounts of data while simulating various environmental contexts in a controlled setting. In this study, we investigate the use of synthetic data in robotics and navigation, specifically focusing on Simultaneous Localization and Mapping (SLAM) and Visual Place Recognition (VPR). In particular, we introduce a synthetic dataset created using the virtual environment of the video game Grand Theft Auto V (GTA V), along with an algorithm designed to generate a VPR dataset, without human supervision. Through a series of experiments centered on SLAM and VPR, we demonstrate that synthetic data derived from GTA V are qualitatively comparable to real-world data. Furthermore, these synthetic data can complement or even substitute real-world data in these applications. This study sets the stage for the creation of large-scale synthetic datasets, offering a cost-effective and scalable solution for future research and development.
△ Less
Submitted 17 February, 2025;
originally announced February 2025.
-
ONOT: a High-Quality ICAO-compliant Synthetic Mugshot Dataset
Authors:
Nicolò Di Domenico,
Guido Borghi,
Annalisa Franco,
Davide Maltoni
Abstract:
Nowadays, state-of-the-art AI-based generative models represent a viable solution to overcome privacy issues and biases in the collection of datasets containing personal information, such as faces. Following this intuition, in this paper we introduce ONOT, a synthetic dataset specifically focused on the generation of high-quality faces in adherence to the requirements of the ISO/IEC 39794-5 standa…
▽ More
Nowadays, state-of-the-art AI-based generative models represent a viable solution to overcome privacy issues and biases in the collection of datasets containing personal information, such as faces. Following this intuition, in this paper we introduce ONOT, a synthetic dataset specifically focused on the generation of high-quality faces in adherence to the requirements of the ISO/IEC 39794-5 standards that, following the guidelines of the International Civil Aviation Organization (ICAO), defines the interchange formats of face images in electronic Machine-Readable Travel Documents (eMRTD). The strictly controlled and varied mugshot images included in ONOT are useful in research fields related to the analysis of face images in eMRTD, such as Morphing Attack Detection and Face Quality Assessment. The dataset is publicly released, in combination with the generation procedure details in order to improve the reproducibility and enable future extensions.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
Dealing with Subject Similarity in Differential Morphing Attack Detection
Authors:
Nicolò Di Domenico,
Guido Borghi,
Annalisa Franco,
Davide Maltoni
Abstract:
The advent of morphing attacks has posed significant security concerns for automated Face Recognition systems, raising the pressing need for robust and effective Morphing Attack Detection (MAD) methods able to effectively address this issue. In this paper, we focus on Differential MAD (D-MAD), where a trusted live capture, usually representing the criminal, is compared with the document image to c…
▽ More
The advent of morphing attacks has posed significant security concerns for automated Face Recognition systems, raising the pressing need for robust and effective Morphing Attack Detection (MAD) methods able to effectively address this issue. In this paper, we focus on Differential MAD (D-MAD), where a trusted live capture, usually representing the criminal, is compared with the document image to classify it as morphed or bona fide. We show these approaches based on identity features are effective when the morphed image and the live one are sufficiently diverse; unfortunately, the effectiveness is significantly reduced when the same approaches are applied to look-alike subjects or in all those cases when the similarity between the two compared images is high (e.g. comparison between the morphed image and the accomplice). Therefore, in this paper, we propose ACIdA, a modular D-MAD system, consisting of a module for the attempt type classification, and two modules for the identity and artifacts analysis on input images. Successfully addressing this task would allow broadening the D-MAD applications including, for instance, the document enrollment stage, which currently relies entirely on human evaluation, thus limiting the possibility of releasing ID documents with manipulated images, as well as the automated gates to detect both accomplices and criminals. An extensive cross-dataset experimental evaluation conducted on the introduced scenario shows that ACIdA achieves state-of-the-art results, outperforming literature competitors, while maintaining good performance in traditional D-MAD benchmarks.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
V-MAD: Video-based Morphing Attack Detection in Operational Scenarios
Authors:
Guido Borghi,
Annalisa Franco,
Nicolò Di Domenico,
Matteo Ferrara,
Davide Maltoni
Abstract:
In response to the rising threat of the face morphing attack, this paper introduces and explores the potential of Video-based Morphing Attack Detection (V-MAD) systems in real-world operational scenarios. While current morphing attack detection methods primarily focus on a single or a pair of images, V-MAD is based on video sequences, exploiting the video streams often acquired by face verificatio…
▽ More
In response to the rising threat of the face morphing attack, this paper introduces and explores the potential of Video-based Morphing Attack Detection (V-MAD) systems in real-world operational scenarios. While current morphing attack detection methods primarily focus on a single or a pair of images, V-MAD is based on video sequences, exploiting the video streams often acquired by face verification tools available, for instance, at airport gates. Through this study, we show for the first time the advantages that the availability of multiple probe frames can bring to the morphing attack detection task, especially in scenarios where the quality of probe images is varied and might be affected, for instance, by pose or illumination variations. Experimental results on a real operational database demonstrate that video sequences represent valuable information for increasing the robustness and performance of morphing attack detection systems.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
SDFR: Synthetic Data for Face Recognition Competition
Authors:
Hatef Otroshi Shahreza,
Christophe Ecabert,
Anjith George,
Alexander Unnervik,
Sébastien Marcel,
Nicolò Di Domenico,
Guido Borghi,
Davide Maltoni,
Fadi Boutros,
Julia Vogel,
Naser Damer,
Ángela Sánchez-Pérez,
EnriqueMas-Candela,
Jorge Calvo-Zaragoza,
Bernardo Biesseck,
Pedro Vidal,
Roger Granada,
David Menotti,
Ivan DeAndres-Tame,
Simone Maurizio La Cava,
Sara Concas,
Pietro Melzi,
Ruben Tolosana,
Ruben Vera-Rodriguez,
Gianpaolo Perelli
, et al. (3 additional authors not shown)
Abstract:
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns. With the recent advances in generative models, recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets. This paper presents the summary of the Synthetic Data…
▽ More
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns. With the recent advances in generative models, recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets. This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) and established to investigate the use of synthetic data for training face recognition models. The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones. In the first task, the face recognition backbone was fixed and the dataset size was limited, while the second task provided almost complete freedom on the model backbone, the dataset, and the training pipeline. The submitted models were trained on existing and also new synthetic datasets and used clever methods to improve training with synthetic data. The submissions were evaluated and ranked on a diverse set of seven benchmarking datasets. The paper gives an overview of the submitted face recognition models and reports achieved performance compared to baseline models trained on real and synthetic datasets. Furthermore, the evaluation of submissions is extended to bias assessment across different demography groups. Lastly, an outlook on the current state of the research in training face recognition models using synthetic data is presented, and existing problems as well as potential future directions are also discussed.
△ Less
Submitted 9 April, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Continual Learning by Three-Phase Consolidation
Authors:
Davide Maltoni,
Lorenzo Pellegrini
Abstract:
TPC (Three-Phase Consolidation) is here introduced as a simple but effective approach to continually learn new classes (and/or instances of known classes) while controlling forgetting of previous knowledge. Each experience (a.k.a. task) is learned in three phases characterized by different rules and learning dynamics, aimed at removing the class-bias problem (due to class unbalancing) and limiting…
▽ More
TPC (Three-Phase Consolidation) is here introduced as a simple but effective approach to continually learn new classes (and/or instances of known classes) while controlling forgetting of previous knowledge. Each experience (a.k.a. task) is learned in three phases characterized by different rules and learning dynamics, aimed at removing the class-bias problem (due to class unbalancing) and limiting gradient-based corrections to prevent forgetting of underrepresented classes. Several experiments on complex datasets demonstrate its accuracy and efficiency advantages over competitive existing approaches. The algorithm and all the results presented in this paper are fully reproducible thanks to its publication on the Avalanche open framework for continual learning.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Enabling On-device Continual Learning with Binary Neural Networks
Authors:
Lorenzo Vorabbi,
Davide Maltoni,
Guido Borghi,
Stefano Santi
Abstract:
On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is primarily rooted in two key issues: first, the memory available on embedded devices is typically insufficient to accommodate the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Secon…
▽ More
On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is primarily rooted in two key issues: first, the memory available on embedded devices is typically insufficient to accommodate the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Second, the development of learning algorithms on models with extreme quantization levels, such as Binary Neural Networks (BNNs), is critical due to the drastic reduction in bit representation. In this study, we propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks to enable on-device training while maintaining competitive performance. Specifically, our approach leverages binary latent replay (LR) activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation. The experimental validation demonstrates a significant accuracy improvement in combination with a noticeable reduction in memory requirement, confirming the suitability of our approach in expanding the practical applications of deep learning in real-world scenarios.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
On-Device Learning with Binary Neural Networks
Authors:
Lorenzo Vorabbi,
Davide Maltoni,
Stefano Santi
Abstract:
Existing Continual Learning (CL) solutions only partially address the constraints on power, memory and computation of the deep learning models when deployed on low-power embedded CPUs. In this paper, we propose a CL solution that embraces the recent advancements in CL field and the efficiency of the Binary Neural Networks (BNN), that use 1-bit for weights and activations to efficiently execute dee…
▽ More
Existing Continual Learning (CL) solutions only partially address the constraints on power, memory and computation of the deep learning models when deployed on low-power embedded CPUs. In this paper, we propose a CL solution that embraces the recent advancements in CL field and the efficiency of the Binary Neural Networks (BNN), that use 1-bit for weights and activations to efficiently execute deep learning models. We propose a hybrid quantization of CWR* (an effective CL approach) that considers differently forward and backward pass in order to retain more precision during gradient update step and at the same time minimizing the latency overhead. The choice of a binary network as backbone is essential to meet the constraints of low power devices and, to the best of authors' knowledge, this is the first attempt to prove on-device learning with BNN. The experimental validation carried out confirms the validity and the suitability of the proposed method.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Arithmetic with Language Models: from Memorization to Computation
Authors:
Davide Maltoni,
Matteo Ferrara
Abstract:
A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a g…
▽ More
A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
△ Less
Submitted 2 August, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Detecting Morphing Attacks via Continual Incremental Training
Authors:
Lorenzo Pellegrini,
Guido Borghi,
Annalisa Franco,
Davide Maltoni
Abstract:
Scenarios in which restrictions in data transfer and storage limit the possibility to compose a single dataset -- also exploiting different data sources -- to perform a batch-based training procedure, make the development of robust models particularly challenging. We hypothesize that the recent Continual Learning (CL) paradigm may represent an effective solution to enable incremental training, eve…
▽ More
Scenarios in which restrictions in data transfer and storage limit the possibility to compose a single dataset -- also exploiting different data sources -- to perform a batch-based training procedure, make the development of robust models particularly challenging. We hypothesize that the recent Continual Learning (CL) paradigm may represent an effective solution to enable incremental training, even through multiple sites. Indeed, a basic assumption of CL is that once a model has been trained, old data can no longer be used in successive training iterations and in principle can be deleted. Therefore, in this paper, we investigate the performance of different Continual Learning methods in this scenario, simulating a learning model that is updated every time a new chunk of data, even of variable size, is available. Experimental results reveal that a particular CL method, namely Learning without Forgetting (LwF), is one of the best-performing algorithms. Then, we investigate its usage and parametrization in Morphing Attack Detection and Object Classification tasks, specifically with respect to the amount of new training data that became available.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Input Layer Binarization with Bit-Plane Encoding
Authors:
Lorenzo Vorabbi,
Davide Maltoni,
Stefano Santi
Abstract:
Binary Neural Networks (BNNs) use 1-bit weights and activations to efficiently execute deep convolutional neural networks on edge devices. Nevertheless, the binarization of the first layer is conventionally excluded, as it leads to a large accuracy loss. The few works addressing the first layer binarization, typically increase the number of input channels to enhance data representation; such data…
▽ More
Binary Neural Networks (BNNs) use 1-bit weights and activations to efficiently execute deep convolutional neural networks on edge devices. Nevertheless, the binarization of the first layer is conventionally excluded, as it leads to a large accuracy loss. The few works addressing the first layer binarization, typically increase the number of input channels to enhance data representation; such data expansion raises the amount of operations needed and it is feasible only on systems with enough computational resources. In this work, we present a new method to binarize the first layer using directly the 8-bit representation of input data; we exploit the standard bit-planes encoding to extract features bit-wise (using depth-wise convolutions); after a re-weighting stage, features are fused again. The resulting model is fully binarized and our first layer binarization approach is model independent. The concept is evaluated on three classification datasets (CIFAR10, SVHN and CIFAR100) for different model architectures (VGG and ResNet) and, the proposed technique outperforms state of the art methods both in accuracy and BMACs reduction.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
Optimizing data-flow in Binary Neural Networks
Authors:
L. Vorabbi,
D. Maltoni,
S. Santi
Abstract:
Binary Neural Networks (BNNs) can significantly accelerate the inference time of a neural network by replacing its expensive floating-point arithmetic with bitwise operations. Most existing solutions, however, do not fully optimize data flow through the BNN layers, and intermediate conversions from 1 to 16/32 bits often further hinder efficiency. We propose a novel training scheme that can increas…
▽ More
Binary Neural Networks (BNNs) can significantly accelerate the inference time of a neural network by replacing its expensive floating-point arithmetic with bitwise operations. Most existing solutions, however, do not fully optimize data flow through the BNN layers, and intermediate conversions from 1 to 16/32 bits often further hinder efficiency. We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline; specifically, we introduce a clipping block that decreases the data-width from 32 bits to 8. Furthermore, we reduce the internal accumulator size of a binary layer, usually kept using 32-bit to prevent data overflow without losing accuracy. Additionally, we provide an optimization of the Batch Normalization layer that both reduces latency and simplifies deployment. Finally, we present an optimized implementation of the Binary Direct Convolution for ARM instruction sets. Our experiments show a consistent improvement of the inference speed (up to 1.91 and 2.73x compared to two state-of-the-art BNNs frameworks) with no drop in accuracy for at least one full-precision model.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
Region Prediction for Efficient Robot Localization on Large Maps
Authors:
Matteo Scucchia,
Davide Maltoni
Abstract:
Recognizing already explored places (a.k.a. place recognition) is a fundamental task in Simultaneous Localization and Mapping (SLAM) to enable robot relocalization and loop closure detection. In topological SLAM the recognition takes place by comparing a signature (or feature vector) associated to the current node with the signatures of the nodes in the known map. However, as the number of nodes i…
▽ More
Recognizing already explored places (a.k.a. place recognition) is a fundamental task in Simultaneous Localization and Mapping (SLAM) to enable robot relocalization and loop closure detection. In topological SLAM the recognition takes place by comparing a signature (or feature vector) associated to the current node with the signatures of the nodes in the known map. However, as the number of nodes increases, matching the current node signature against all the existing ones becomes inefficient and thwarts real-time navigation. In this paper we propose a novel approach to pre-select a subset of map nodes for place recognition. The map nodes are clustered during exploration and each cluster is associated with a region. The region labels become the prediction targets of a deep neural network and, during navigation, only the nodes associated with the regions predicted with high probability are considered for matching. While the proposed technique can be integrated in different SLAM approaches, in this work we describe an effective integration with RTAB-Map (a popular framework for real-time topological SLAM) which allowed us to design and run several experiments to demonstrate its effectiveness. All the code and material from the experiments will be available online at https://github.com/MI-BioLab/region-learner.
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
On the challenges to learn from Natural Data Streams
Authors:
Guido Borghi,
Gabriele Graffieti,
Davide Maltoni
Abstract:
In real-world contexts, sometimes data are available in form of Natural Data Streams, i.e. data characterized by a streaming nature, unbalanced distribution, data drift over a long time frame and strong correlation of samples in short time ranges. Moreover, a clear separation between the traditional training and deployment phases is usually lacking. This data organization and fruition represents a…
▽ More
In real-world contexts, sometimes data are available in form of Natural Data Streams, i.e. data characterized by a streaming nature, unbalanced distribution, data drift over a long time frame and strong correlation of samples in short time ranges. Moreover, a clear separation between the traditional training and deployment phases is usually lacking. This data organization and fruition represents an interesting and challenging scenario for both traditional Machine and Deep Learning algorithms and incremental learning agents, i.e. agents that have the ability to incrementally improve their knowledge through the past experience. In this paper, we investigate the classification performance of a variety of algorithms that belong to various research field, i.e. Continual, Streaming and Online Learning, that receives as training input Natural Data Streams. The experimental validation is carried out on three different datasets, expressly organized to replicate this challenging setting.
△ Less
Submitted 9 January, 2023;
originally announced January 2023.
-
Architect, Regularize and Replay (ARR): a Flexible Hybrid Approach for Continual Learning
Authors:
Vincenzo Lomonaco,
Lorenzo Pellegrini,
Gabriele Graffieti,
Davide Maltoni
Abstract:
In recent years we have witnessed a renewed interest in machine learning methodologies, especially for deep representation learning, that could overcome basic i.i.d. assumptions and tackle non-stationary environments subject to various distributional shifts or sample selection biases. Within this context, several computational approaches based on architectural priors, regularizers and replay polic…
▽ More
In recent years we have witnessed a renewed interest in machine learning methodologies, especially for deep representation learning, that could overcome basic i.i.d. assumptions and tackle non-stationary environments subject to various distributional shifts or sample selection biases. Within this context, several computational approaches based on architectural priors, regularizers and replay policies have been proposed with different degrees of success depending on the specific scenario in which they were developed and assessed. However, designing comprehensive hybrid solutions that can flexibly and generally be applied with tunable efficiency-effectiveness trade-offs still seems a distant goal. In this paper, we propose "Architect, Regularize and Replay" (ARR), an hybrid generalization of the renowned AR1 algorithm and its variants, that can achieve state-of-the-art results in classic scenarios (e.g. class-incremental learning) but also generalize to arbitrary data streams generated from real-world datasets such as CIFAR-100, CORe50 and ImageNet-1000.
△ Less
Submitted 6 January, 2023;
originally announced January 2023.
-
Morphing Attack Potential
Authors:
Matteo Ferrara,
Annalisa Franco,
Davide Maltoni,
Christoph Busch
Abstract:
In security systems the risk assessment in the sense of common criteria testing is a very relevant topic; this requires quantifying the attack potential in terms of the expertise of the attacker, his knowledge about the target and access to equipment. Contrary to those attacks, the recently revealed morphing attacks against Face Recognition Systems (FRSs) can not be assessed by any of the above cr…
▽ More
In security systems the risk assessment in the sense of common criteria testing is a very relevant topic; this requires quantifying the attack potential in terms of the expertise of the attacker, his knowledge about the target and access to equipment. Contrary to those attacks, the recently revealed morphing attacks against Face Recognition Systems (FRSs) can not be assessed by any of the above criteria. But not all morphing techniques pose the same risk for an operational face recognition system. This paper introduces with the Morphing Attack Potential (MAP) a consistent methodology, that can quantify the risk, which a certain morphing attack creates.
△ Less
Submitted 28 April, 2022;
originally announced April 2022.
-
Generative Negative Replay for Continual Learning
Authors:
Gabriele Graffieti,
Davide Maltoni,
Lorenzo Pellegrini,
Vincenzo Lomonaco
Abstract:
Learning continually is a key aspect of intelligence and a necessary ability to solve many real-life problems. One of the most effective strategies to control catastrophic forgetting, the Achilles' heel of continual learning, is storing part of the old data and replaying them interleaved with new experiences (also known as the replay approach). Generative replay, which is using generative models t…
▽ More
Learning continually is a key aspect of intelligence and a necessary ability to solve many real-life problems. One of the most effective strategies to control catastrophic forgetting, the Achilles' heel of continual learning, is storing part of the old data and replaying them interleaved with new experiences (also known as the replay approach). Generative replay, which is using generative models to provide replay patterns on demand, is particularly intriguing, however, it was shown to be effective mainly under simplified assumptions, such as simple scenarios and low-dimensional data. In this paper, we show that, while the generated data are usually not able to improve the classification accuracy for the old classes, they can be effective as negative examples (or antagonists) to better learn the new classes, especially when the learning experiences are small and contain examples of just one or few classes. The proposed approach is validated on complex class-incremental and data-incremental continual learning scenarios (CORe50 and ImageNet-1000) composed of high-dimensional data and a large number of training experiences: a setup where existing generative replay approaches usually fail.
△ Less
Submitted 12 April, 2022;
originally announced April 2022.
-
Is Class-Incremental Enough for Continual Learning?
Authors:
Andrea Cossu,
Gabriele Graffieti,
Lorenzo Pellegrini,
Davide Maltoni,
Davide Bacciu,
Antonio Carta,
Vincenzo Lomonaco
Abstract:
The ability of a model to learn continually can be empirically assessed in different continual learning scenarios. Each scenario defines the constraints and the opportunities of the learning environment. Here, we challenge the current trend in the continual learning literature to experiment mainly on class-incremental scenarios, where classes present in one experience are never revisited. We posit…
▽ More
The ability of a model to learn continually can be empirically assessed in different continual learning scenarios. Each scenario defines the constraints and the opportunities of the learning environment. Here, we challenge the current trend in the continual learning literature to experiment mainly on class-incremental scenarios, where classes present in one experience are never revisited. We posit that an excessive focus on this setting may be limiting for future research on continual learning, since class-incremental scenarios artificially exacerbate catastrophic forgetting, at the expense of other important objectives like forward transfer and computational efficiency. In many real-world environments, in fact, repetition of previously encountered concepts occurs naturally and contributes to softening the disruption of previous knowledge. We advocate for a more in-depth study of alternative continual learning scenarios, in which repetition is integrated by design in the stream of incoming information. Starting from already existing proposals, we describe the advantages such class-incremental with repetition scenarios could offer for a more comprehensive assessment of continual learning models.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Continual Learning at the Edge: Real-Time Training on Smartphone Devices
Authors:
Lorenzo Pellegrini,
Vincenzo Lomonaco,
Gabriele Graffieti,
Davide Maltoni
Abstract:
On-device training for personalized learning is a challenging research problem. Being able to quickly adapt deep prediction models at the edge is necessary to better suit personal user needs. However, adaptation on the edge poses some questions on both the efficiency and sustainability of the learning process and on the ability to work under shifting data distributions. Indeed, naively fine-tuning…
▽ More
On-device training for personalized learning is a challenging research problem. Being able to quickly adapt deep prediction models at the edge is necessary to better suit personal user needs. However, adaptation on the edge poses some questions on both the efficiency and sustainability of the learning process and on the ability to work under shifting data distributions. Indeed, naively fine-tuning a prediction model only on the newly available data results in catastrophic forgetting, a sudden erasure of previously acquired knowledge. In this paper, we detail the implementation and deployment of a hybrid continual learning strategy (AR1*) on a native Android application for real-time on-device personalization without forgetting. Our benchmark, based on an extension of the CORe50 dataset, shows the efficiency and effectiveness of our solution.
△ Less
Submitted 24 May, 2021;
originally announced May 2021.
-
Avalanche: an End-to-End Library for Continual Learning
Authors:
Vincenzo Lomonaco,
Lorenzo Pellegrini,
Andrea Cossu,
Antonio Carta,
Gabriele Graffieti,
Tyler L. Hayes,
Matthias De Lange,
Marc Masana,
Jary Pomponi,
Gido van de Ven,
Martin Mundt,
Qi She,
Keiland Cooper,
Jeremy Forest,
Eden Belouadah,
Simone Calderara,
German I. Parisi,
Fabio Cuzzolin,
Andreas Tolias,
Simone Scardapane,
Luca Antiga,
Subutai Amhad,
Adrian Popescu,
Christopher Kanan,
Joost van de Weijer
, et al. (3 additional authors not shown)
Abstract:
Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standa…
▽ More
Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standard benchmarks are hard to reproduce. In this work, we propose Avalanche, an open-source end-to-end library for continual learning research based on PyTorch. Avalanche is designed to provide a shared and collaborative codebase for fast prototyping, training, and reproducible evaluation of continual learning algorithms.
△ Less
Submitted 1 April, 2021;
originally announced April 2021.
-
CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions
Authors:
Vincenzo Lomonaco,
Lorenzo Pellegrini,
Pau Rodriguez,
Massimo Caccia,
Qi She,
Yu Chen,
Quentin Jodelet,
Ruiping Wang,
Zheda Mai,
David Vazquez,
German I. Parisi,
Nikhil Churamani,
Marc Pickett,
Issam Laradji,
Davide Maltoni
Abstract:
In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous. However, despite the significant and undoubted progress of the field in addressing the issue of catastrophic forgetting, benchmarking different continual learning approaches is a…
▽ More
In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous. However, despite the significant and undoubted progress of the field in addressing the issue of catastrophic forgetting, benchmarking different continual learning approaches is a difficult task by itself. In fact, given the proliferation of different settings, training and evaluation protocols, metrics and nomenclature, it is often tricky to properly characterize a continual learning algorithm, relate it to other solutions and gauge its real-world applicability. The first Continual Learning in Computer Vision challenge held at CVPR in 2020 has been one of the first opportunities to evaluate different continual learning algorithms on a common hardware with a large set of shared evaluation metrics and 3 different settings based on the realistic CORe50 video benchmark. In this paper, we report the main results of the competition, which counted more than 79 teams registered, 11 finalists and 2300$ in prizes. We also summarize the winning approaches, current challenges and future research directions.
△ Less
Submitted 14 September, 2020;
originally announced September 2020.
-
Memory-Latency-Accuracy Trade-offs for Continual Learning on a RISC-V Extreme-Edge Node
Authors:
Leonardo Ravaglia,
Manuele Rusci,
Alessandro Capotondi,
Francesco Conti,
Lorenzo Pellegrini,
Vincenzo Lomonaco,
Davide Maltoni,
Luca Benini
Abstract:
AI-powered edge devices currently lack the ability to adapt their embedded inference models to the ever-changing environment. To tackle this issue, Continual Learning (CL) strategies aim at incrementally improving the decision capabilities based on newly acquired data. In this work, after quantifying memory and computational requirements of CL algorithms, we define a novel HW/SW extreme-edge platf…
▽ More
AI-powered edge devices currently lack the ability to adapt their embedded inference models to the ever-changing environment. To tackle this issue, Continual Learning (CL) strategies aim at incrementally improving the decision capabilities based on newly acquired data. In this work, after quantifying memory and computational requirements of CL algorithms, we define a novel HW/SW extreme-edge platform featuring a low power RISC-V octa-core cluster tailored for on-demand incremental learning over locally sensed data. The presented multi-core HW/SW architecture achieves a peak performance of 2.21 and 1.70 MAC/cycle, respectively, when running forward and backward steps of the gradient descent. We report the trade-off between memory footprint, latency, and accuracy for learning a new class with Latent Replay CL when targeting an image classification task on the CORe50 dataset. For a CL setting that retrains all the layers, taking 5h to learn a new class and achieving up to 77.3% of precision, a more efficient solution retrains only part of the network, reaching an accuracy of 72.5% with a memory requirement of 300 MB and a computation latency of 1.5 hours. On the other side, retraining only the last layer results in the fastest (867 ms) and less memory hungry (20 MB) solution but scoring 58% on the CORe50 dataset. Thanks to the parallelism of the low-power cluster engine, our HW/SW platform results 25x faster than typical MCU device, on which CL is still impractical, and demonstrates an 11x gain in terms of energy consumption with respect to mobile-class solutions.
△ Less
Submitted 22 July, 2020;
originally announced July 2020.
-
Morphing Attack Detection -- Database, Evaluation Platform and Benchmarking
Authors:
Kiran Raja,
Matteo Ferrara,
Annalisa Franco,
Luuk Spreeuwers,
Illias Batskos,
Florens de Wit Marta Gomez-Barrero,
Ulrich Scherhag,
Daniel Fischer,
Sushma Venkatesh,
Jag Mohan Singh,
Guoqiang Li,
Loïc Bergeron,
Sergey Isadskiy,
Raghavendra Ramachandra,
Christian Rathgeb,
Dinusha Frings,
Uwe Seidel,
Fons Knopjes,
Raymond Veldhuis,
Davide Maltoni,
Christoph Busch
Abstract:
Morphing attacks have posed a severe threat to Face Recognition System (FRS). Despite the number of advancements reported in recent works, we note serious open issues such as independent benchmarking, generalizability challenges and considerations to age, gender, ethnicity that are inadequately addressed. Morphing Attack Detection (MAD) algorithms often are prone to generalization challenges as th…
▽ More
Morphing attacks have posed a severe threat to Face Recognition System (FRS). Despite the number of advancements reported in recent works, we note serious open issues such as independent benchmarking, generalizability challenges and considerations to age, gender, ethnicity that are inadequately addressed. Morphing Attack Detection (MAD) algorithms often are prone to generalization challenges as they are database dependent. The existing databases, mostly of semi-public nature, lack in diversity in terms of ethnicity, various morphing process and post-processing pipelines. Further, they do not reflect a realistic operational scenario for Automated Border Control (ABC) and do not provide a basis to test MAD on unseen data, in order to benchmark the robustness of algorithms. In this work, we present a new sequestered dataset for facilitating the advancements of MAD where the algorithms can be tested on unseen data in an effort to better generalize. The newly constructed dataset consists of facial images from 150 subjects from various ethnicities, age-groups and both genders. In order to challenge the existing MAD algorithms, the morphed images are with careful subject pre-selection created from the contributing images, and further post-processed to remove morphing artifacts. The images are also printed and scanned to remove all digital cues and to simulate a realistic challenge for MAD algorithms. Further, we present a new online evaluation platform to test algorithms on sequestered data. With the platform we can benchmark the morph detection performance and study the generalization ability. This work also presents a detailed analysis on various subsets of sequestered data and outlines open challenges for future directions in MAD research.
△ Less
Submitted 28 September, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
Authors:
Qi She,
Fan Feng,
Qi Liu,
Rosa H. M. Chan,
Xinyue Hao,
Chuanlin Lan,
Qihan Yang,
Vincenzo Lomonaco,
German I. Parisi,
Heechul Bae,
Eoin Brophy,
Baoquan Chen,
Gabriele Graffieti,
Vidit Goel,
Hyonyoung Han,
Sathursan Kanagarajah,
Somesh Kumar,
Siew-Kei Lam,
Tin Lun Lam,
Liang Ma,
Davide Maltoni,
Lorenzo Pellegrini,
Duvindu Piyasena,
Shiliang Pu,
Debdoot Sheet
, et al. (11 additional authors not shown)
Abstract:
This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top $8$ finalists (out of over~$150$ teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…
▽ More
This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top $8$ finalists (out of over~$150$ teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".
△ Less
Submitted 26 April, 2020;
originally announced April 2020.
-
Latent Replay for Real-Time Continual Learning
Authors:
Lorenzo Pellegrini,
Gabriele Graffieti,
Vincenzo Lomonaco,
Davide Maltoni
Abstract:
Training deep neural networks at the edge on light computational devices, embedded systems and robotic platforms is nowadays very challenging. Continual learning techniques, where complex models are incrementally trained on small batches of new data, can make the learning problem tractable even for CPU-only embedded devices enabling remarkable levels of adaptiveness and autonomy. However, a number…
▽ More
Training deep neural networks at the edge on light computational devices, embedded systems and robotic platforms is nowadays very challenging. Continual learning techniques, where complex models are incrementally trained on small batches of new data, can make the learning problem tractable even for CPU-only embedded devices enabling remarkable levels of adaptiveness and autonomy. However, a number of practical problems need to be solved: catastrophic forgetting before anything else. In this paper we introduce an original technique named "Latent Replay" where, instead of storing a portion of past data in the input space, we store activations volumes at some intermediate layer. This can significantly reduce the computation and storage required by native rehearsal. To keep the representation stable and the stored activations valid we propose to slow-down learning at all the layers below the latent replay one, leaving the layers above free to learn at full pace. In our experiments we show that Latent Replay, combined with existing continual learning techniques, achieves state-of-the-art performance on complex video benchmarks such as CORe50 NICv2 (with nearly 400 small and highly non-i.i.d. batches) and OpenLORIS. Finally, we demonstrate the feasibility of nearly real-time continual learning on the edge through the deployment of the proposed technique on a smartphone device.
△ Less
Submitted 4 March, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
Rehearsal-Free Continual Learning over Small Non-I.I.D. Batches
Authors:
Vincenzo Lomonaco,
Davide Maltoni,
Lorenzo Pellegrini
Abstract:
Robotic vision is a field where continual learning can play a significant role. An embodied agent operating in a complex environment subject to frequent and unpredictable changes is required to learn and adapt continuously. In the context of object recognition, for example, a robot should be able to learn (without forgetting) objects of never before seen classes as well as improving its recognitio…
▽ More
Robotic vision is a field where continual learning can play a significant role. An embodied agent operating in a complex environment subject to frequent and unpredictable changes is required to learn and adapt continuously. In the context of object recognition, for example, a robot should be able to learn (without forgetting) objects of never before seen classes as well as improving its recognition capabilities as new instances of already known classes are discovered. Ideally, continual learning should be triggered by the availability of short videos of single objects and performed on-line on on-board hardware with fine-grained updates. In this paper, we introduce a novel continual learning protocol based on the CORe50 benchmark and propose two rehearsal-free continual learning techniques, CWR* and AR1*, that can learn effectively even in the challenging case of nearly 400 small non-i.i.d. incremental batches. In particular, our experiments show that AR1* can outperform other state-of-the-art rehearsal-free techniques by more than 15% accuracy in some cases, with a very light and constant computational and memory overhead across training batches.
△ Less
Submitted 21 April, 2020; v1 submitted 8 July, 2019;
originally announced July 2019.
-
Continual Learning for Robotics: Definition, Framework, Learning Strategies, Opportunities and Challenges
Authors:
Timothée Lesort,
Vincenzo Lomonaco,
Andrei Stoian,
Davide Maltoni,
David Filliat,
Natalia Díaz-Rodríguez
Abstract:
Continual learning (CL) is a particular machine learning paradigm where the data distribution and learning objective changes through time, or where all the training data and objective criteria are never available at once. The evolution of the learning process is modeled by a sequence of learning experiences where the goal is to be able to learn new skills all along the sequence without forgetting…
▽ More
Continual learning (CL) is a particular machine learning paradigm where the data distribution and learning objective changes through time, or where all the training data and objective criteria are never available at once. The evolution of the learning process is modeled by a sequence of learning experiences where the goal is to be able to learn new skills all along the sequence without forgetting what has been previously learned. Continual learning also aims at the same time at optimizing the memory, the computation power and the speed during the learning process.
An important challenge for machine learning is not necessarily finding solutions that work in the real world but rather finding stable algorithms that can learn in real world. Hence, the ideal approach would be tackling the real world in a embodied platform: an autonomous agent. Continual learning would then be effective in an autonomous agent or robot, which would learn autonomously through time about the external world, and incrementally develop a set of complex skills and knowledge.
Robotic agents have to learn to adapt and interact with their environment using a continuous stream of observations. Some recent approaches aim at tackling continual learning for robotics, but most recent papers on continual learning only experiment approaches in simulation or with static datasets. Unfortunately, the evaluation of those algorithms does not provide insights on whether their solutions may help continual learning in the context of robotics. This paper aims at reviewing the existing state of the art of continual learning, summarizing existing benchmarks and metrics, and proposing a framework for presenting and evaluating both robotics and non robotics approaches in a way that makes transfer between both fields easier.
△ Less
Submitted 22 November, 2019; v1 submitted 29 June, 2019;
originally announced July 2019.
-
Continual Reinforcement Learning in 3D Non-stationary Environments
Authors:
Vincenzo Lomonaco,
Karan Desai,
Eugenio Culurciello,
Davide Maltoni
Abstract:
High-dimensional always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents, nowadays, are often trained off-line in very static and controlled conditions in simulation such that training observations can be thought as sampled i.i.d. from the entire observations space. However, in real world settings, the environment is often non-stati…
▽ More
High-dimensional always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents, nowadays, are often trained off-line in very static and controlled conditions in simulation such that training observations can be thought as sampled i.i.d. from the entire observations space. However, in real world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. Then, we introduce an end-to-end model-free continual reinforcement learning strategy showing competitive results with respect to four different baselines and not requiring any access to additional supervised signals, previously encountered environmental conditions or observations.
△ Less
Submitted 21 April, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Face morphing detection in the presence of printing/scanning and heterogeneous image sources
Authors:
Matteo Ferrara,
Annalisa Franco,
Davide Maltoni
Abstract:
Face morphing represents nowadays a big security threat in the context of electronic identity documents as well as an interesting challenge for researchers in the field of face recognition. Despite of the good performance obtained by state-of-the-art approaches on digital images, no satisfactory solutions have been identified so far to deal with cross-database testing and printed-scanned images (t…
▽ More
Face morphing represents nowadays a big security threat in the context of electronic identity documents as well as an interesting challenge for researchers in the field of face recognition. Despite of the good performance obtained by state-of-the-art approaches on digital images, no satisfactory solutions have been identified so far to deal with cross-database testing and printed-scanned images (typically used in many countries for document issuing). In this work, novel approaches are proposed to train Deep Neural Networks for morphing detection: in particular generation of simulated printed-scanned images together with other data augmentation strategies and pre-training on large face recognition datasets, allowed to reach state-of-the-art accuracy on challenging datasets from heterogeneous image sources.
△ Less
Submitted 24 February, 2021; v1 submitted 25 January, 2019;
originally announced January 2019.
-
Don't forget, there is more than forgetting: new metrics for Continual Learning
Authors:
Natalia Díaz-Rodríguez,
Vincenzo Lomonaco,
David Filliat,
Davide Maltoni
Abstract:
Continual learning consists of algorithms that learn from a stream of data/tasks continuously and adaptively thought time, enabling the incremental development of ever more complex knowledge and skills. The lack of consensus in evaluating continual learning algorithms and the almost exclusive focus on forgetting motivate us to propose a more comprehensive set of implementation independent metrics…
▽ More
Continual learning consists of algorithms that learn from a stream of data/tasks continuously and adaptively thought time, enabling the incremental development of ever more complex knowledge and skills. The lack of consensus in evaluating continual learning algorithms and the almost exclusive focus on forgetting motivate us to propose a more comprehensive set of implementation independent metrics accounting for several factors we believe have practical implications worth considering in the deployment of real AI systems that learn continually: accuracy or performance over time, backward and forward knowledge transfer, memory overhead as well as computational efficiency. Drawing inspiration from the standard Multi-Attribute Value Theory (MAVT) we further propose to fuse these metrics into a single score for ranking purposes and we evaluate our proposal with five continual learning strategies on the iCIFAR-100 continual learning benchmark.
△ Less
Submitted 31 October, 2018;
originally announced October 2018.
-
Continuous Learning in Single-Incremental-Task Scenarios
Authors:
Davide Maltoni,
Vincenzo Lomonaco
Abstract:
It was recently shown that architectural, regularization and rehearsal strategies can be used to train deep models sequentially on a number of disjoint tasks without forgetting previously acquired knowledge. However, these strategies are still unsatisfactory if the tasks are not disjoint but constitute a single incremental task (e.g., class-incremental learning). In this paper we point out the dif…
▽ More
It was recently shown that architectural, regularization and rehearsal strategies can be used to train deep models sequentially on a number of disjoint tasks without forgetting previously acquired knowledge. However, these strategies are still unsatisfactory if the tasks are not disjoint but constitute a single incremental task (e.g., class-incremental learning). In this paper we point out the differences between multi-task and single-incremental-task scenarios and show that well-known approaches such as LWF, EWC and SI are not ideal for incremental task scenarios. A new approach, denoted as AR1, combining architectural and regularization strategies is then specifically proposed. AR1 overhead (in term of memory and computation) is very small thus making it suitable for online learning. When tested on CORe50 and iCIFAR-100, AR1 outperformed existing regularization strategies by a good margin.
△ Less
Submitted 22 January, 2019; v1 submitted 22 June, 2018;
originally announced June 2018.
-
CORe50: a New Dataset and Benchmark for Continuous Object Recognition
Authors:
Vincenzo Lomonaco,
Davide Maltoni
Abstract:
Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), wh…
▽ More
Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.
△ Less
Submitted 9 May, 2017;
originally announced May 2017.
-
Semi-supervised Tuning from Temporal Coherence
Authors:
Davide Maltoni,
Vincenzo Lomonaco
Abstract:
Recent works demonstrated the usefulness of temporal coherence to regularize supervised training or to learn invariant features with deep architectures. In particular, enforcing smooth output changes while presenting temporally-closed frames from video sequences, proved to be an effective strategy. In this paper we prove the efficacy of temporal coherence for semi-supervised incremental tuning. We…
▽ More
Recent works demonstrated the usefulness of temporal coherence to regularize supervised training or to learn invariant features with deep architectures. In particular, enforcing smooth output changes while presenting temporally-closed frames from video sequences, proved to be an effective strategy. In this paper we prove the efficacy of temporal coherence for semi-supervised incremental tuning. We show that a deep architecture, just mildly trained in a supervised manner, can progressively improve its classification accuracy, if exposed to video sequences of unlabeled data. The extent to which, in some cases, a semi-supervised tuning allows to improve classification accuracy (approaching the supervised one) is somewhat surprising. A number of control experiments pointed out the fundamental role of temporal coherence.
△ Less
Submitted 4 January, 2016; v1 submitted 10 November, 2015;
originally announced November 2015.